
Review Article
Hierarchical Ensemble Methods for Protein Function Prediction

Giorgio Valentini

AnacletoLab - Dipartimento di Informatica, Università degli Studi di Milano, Via Comelico 39, 20135 Milano, Italy

Correspondence should be addressed to Giorgio Valentini; valentini@di.unimi.it

Received 2 February 2014; Accepted 25 February 2014; Published 5 May 2014

Academic Editors: T. Can and T. R. Hvidsten

Copyright © 2014 Giorgio Valentini. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Protein function prediction is a complex multiclass, multilabel classification problem, characterized by multiple issues such as the incompleteness of the available annotations, the integration of multiple sources of high-dimensional biomolecular data, the unbalance of several functional classes, and the difficulty of univocally determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods, which have shown significantly better performances than hierarchy-unaware "flat" prediction methods. In this paper we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. According to this general approach, a separate learning machine is trained to learn a specific functional term, and then the resulting predictions are assembled in a "consensus" ensemble decision, taking into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Open problems of this exciting research area of computational biology are finally considered, outlining novel perspectives for future research.

1. Introduction

Exploiting the wealth of biomolecular data accumulated by novel high-throughput biotechnologies, "in silico" protein function prediction can generate hypotheses to drive the biological discovery and validation of protein functions [1]. Indeed, "in vitro" methods are costly in time and money, and automatic prediction methods can support the biologist in understanding the role of a protein or of a biological process, in annotating a new genome at a high level of accuracy, or, more in general, in solving problems of functional genomics [2].

Automated Function Prediction (AFP) is a multiclass, multilabel classification problem characterized by hundreds or thousands of functional classes structured according to a predefined hierarchy. Even if, in principle, unsupervised methods can also be applied to AFP, due to the inherent difficulty of extracting functional classes without exploiting any available a priori information [3, 4], usually supervised or semisupervised learning methods are applied, in order to exploit the available a priori information about gene annotations.

From a computational standpoint, AFP is a challenging problem for several reasons.

(i) The number of functional classes is usually large: hundreds for the Functional Catalogue (FunCat) [5] or thousands for the Gene Ontology (GO) [6].

(ii) Proteins may be annotated for multiple functional classes: since each protein may belong to more than one class at the same time, the classification problem is multilabel.

(iii) Multiple sources of data are available for each protein: high-throughput biotechnologies make an increasing number of sources of genomic and proteomic data available. Hence, in order to exploit all the information available for each protein, we need learning methods that are able to integrate different data sources [7].

(iv) Functional classes are hierarchically related: annotations are not independent, because functional classes are hierarchically organized; in general, known functional relationships (such as taxonomies) can be exploited to incorporate a priori knowledge in learning algorithms or to introduce explicit constraints between labels.



(v) Small number of annotations for each class: typically, functional classes are severely unbalanced, with a small number of available "positive" annotations.

(vi) Multiple possible definitions of negative examples: since we only have positive annotations (the total number of GO negative annotations is about 2500, considering all species (August 2013)), the notion of negative example is not uniquely determined, and different strategies for choosing negative examples can in principle be applied [8].

(vii) Different reliability of functional labels: functional annotations have different degrees of evidence; that is, each label is assigned to a gene with a specific level of reliability.

(viii) Complex and noisy data: data are usually complex (e.g., high-dimensional, large-scale, and graph-structured) and noisy.

Most of the computational methods for AFP have been applied to unicellular organisms (e.g., S. cerevisiae) [9–11], but recently several approaches have been applied to multicellular organisms (such as M. musculus or the A. thaliana plant model organisms [2, 12–16]).

Several computational approaches, and in particular machine learning methods, have been proposed to deal with the above issues, ranging from sequence-based methods [17] to network-based methods [18], structured output algorithms based on kernels [19], and hierarchical ensemble methods [20].

Other approaches focused primarily on the integration of multiple sources of data, since each type of genomic data captures only some aspects of the genes to be classified, and a specific source can be useful to learn a specific functional class while being irrelevant to others. In the literature many approaches have been proposed to deal with this topic, for example, functional linkage network integration [21], kernel fusion [11], vector space integration [22], and ensemble systems [23].

Extensive experimental studies showed that flat prediction, that is, predictions for each class made independently of the other classes, introduces significant inconsistencies in the classification, due to the violation of the true path rule that governs the functional annotations of genes both in the GO and in FunCat taxonomies [24]. According to this rule, positive predictions for a given term must be transferred to its "ancestor" terms and negative predictions to its descendants (see Appendix A and Section 7 for more details about the GO and the true path rule). Moreover, flat predictions are difficult to interpret, because they may be inconsistent with one another. A method that claims, for example, that a protein has homodimerization activity but does not have dimerization activity is clearly incorrect, and a biologist attempting to interpret these results would not likely trust either prediction [24].

It is worth noting that the results of the Critical Assessment of Functional Annotation (CAFA) challenge, a recent comprehensive critical assessment and comparison of different computational methods for AFP [16], showed that AFP is characterized by multiple complex issues and that one of the best performing CAFA methods corrected flat predictions by taking into account the hierarchical relationships between functional terms, with an approach similar to that adopted by hierarchical ensemble methods [25]. Indeed, hierarchical ensemble methods embed in the learning process the relationships between functional classes. Usually this is performed in a second "reconciliation" step, where the predictions are modified to make them consistent with the ontology [26–29]. More in general, these methods exploit the relationships between ontology terms, structured according to a forest of trees [5] or a directed acyclic graph [6], to significantly improve prediction performances with respect to "flat" prediction methods [30–32].

Hierarchical classification, and in particular ensemble methods for hierarchical classification, have been applied in several domains different from protein function prediction, ranging from text categorization [33–35] to music genre classification [36–38], hierarchical image classification [39, 40], video annotation [41], and automatic classification of worldwide web documents [42, 43]. The present review focuses on hierarchical ensemble methods for AFP. For a more general review on hierarchical classification methods and their applications in different domains, see [44].

The paper is structured as follows. In Section 2 we provide a synthetic picture of the main categories of protein function prediction methods, to properly position hierarchical ensemble methods in the context of computational methods for AFP. In Section 3 the main common characteristics of hierarchical ensemble algorithms, as well as a general taxonomy of these methods, are proposed. The following five sections focus on the main families of hierarchical methods for AFP and discuss their main characteristics: Section 4 introduces hierarchical top-down methods; Section 5, Bayesian ensemble approaches; Section 6, reconciliation methods; Section 7, true path rule ensemble methods; and the last one (Section 8), ensembles based on decision trees. Section 9 critically discusses the main issues and limitations of hierarchical ensemble methods and shows that this approach, like the other current approaches for AFP, cannot be successfully applied without considering the large set of complex learning issues that characterize the AFP problem. The last two sections discuss the open problems and possible future research lines in the context of hierarchical ensemble methods and summarize the main findings in this exciting research area. In the Appendix some basic information about FunCat and the GO, that is, the two main hierarchical ontologies widely used to annotate proteins in all organisms, is provided, as well as the characteristics of the hierarchy-aware performance measures proposed in the literature to assess the accuracy and the reliability of the predictions made by hierarchical computational methods.

2. A Taxonomy of Protein Function Prediction Methods

Several computational methods for the AFP problem have been proposed in the literature. Some methods provided predictions for a relatively small set of functional classes [11, 45, 46], while others considered predictions extended to larger sets, using support vector machines and semidefinite programming [11], artificial neural networks [47], functional linkage networks [21, 48], Bayesian networks [45], or methods that combine functional linkage networks with learning machines, using a logistic regression model [12] or simple algebraic operators [13].

Other research lines for AFP explicitly take into account the hierarchical nature of the multilabel classification problem. For instance, structured output methods are based on the joint kernelization of both input variables and output labels, using, for example, perceptron-like learning algorithms [49] or maximum-margin algorithms [50]. Other approaches improve the prediction of GO annotations by extracting implicit semantic relationships between genes and functions [51]. Finally, other methods adopted an ensemble approach [52] to take advantage of the intrinsic hierarchical nature of protein function prediction, explicitly considering the relationships between functional classes [24, 53–55].

Computational methods for AFP, mostly based on machine learning, can be schematically grouped into the following four families:

(1) sequence-based methods;
(2) network-based methods;
(3) kernel methods for structured output spaces;
(4) hierarchical ensemble methods.

This grouping is neither exhaustive nor strict, meaning that certain methods do not belong to any of these groups and others belong to more than one.

2.1. Sequence-Based Methods. Algorithms based on the alignment of sequences represent the first attempts to computationally predict the function of proteins [56, 57]: similar sequences are likely to share common functions, even if it is well known that secondary and tertiary structure conservation is usually more strictly related to protein function. However, algorithms able to infer similarities between sequences are today standard methods for assigning functions to proteins in newly sequenced organisms [17, 58]. Of course, global or local structure comparison algorithms between proteins can be applied to detect functional properties [59], and in this context the integration of different sequence- and structure-based prediction methods represents a major challenge [60].

Even if most of the research efforts for the design and development of AFP methods have concentrated on machine learning methods, it is worth noting that in the AFP 2011 challenge [16] one of the best performing methods was a sequence-based algorithm [61]. Indeed, when the only information available is a raw sequence of amino acids or nucleotides, sequence-based methods can be competitive with state-of-the-art machine learning methods by exploiting homology-based inference [62].

2.2. Network-Based Methods. These methods usually represent each dataset through an undirected graph $G = (V, E)$, where nodes $v \in V$ correspond to genes/gene products and edges $e \in E$ are weighted according to the evidence of cofunctionality implied by the data source [63, 64]. These algorithms are able to transfer annotations from previously annotated (labeled) nodes to unannotated (unlabeled) ones by exploiting "proximity relationships" between connected nodes. Basically, these methods are based on transductive label propagation algorithms that predict the labels of unannotated examples without using a global predictive model [14, 21, 45]. Several methods exploited the semantic similarity between GO terms [65, 66] to derive functional similarity measures between genes and construct functional networks, then using supervised or semisupervised learning algorithms to infer the GO annotations of genes [67–70].

Different strategies to label the unannotated nodes have been explored by "label propagation" algorithms, that is, methods able to "propagate" the labels of annotated proteins across the networks by exploiting the topology of the underlying graph: for instance, methods based on the evaluation of the functional flow in graphs [64, 71], methods based on Hopfield networks [48, 72, 73], methods based on Markov [74, 75] and Gaussian random fields [14, 46], and also simple "guilt-by-association" methods [76, 77], based on the assumption that connected nodes/proteins in the functional network are likely to share the same functions. Recently, methods based on kernelized score functions, able to exploit both local and global semisupervised learning strategies, have been successfully applied to AFP [78], as well as to disease gene prioritization [79] and drug repositioning problems [80, 81].

Reference [82] showed that different graph-based algorithms can be cast into a common framework where a quadratic cost objective function is minimized. In this framework, closed-form solutions can be derived by solving a linear system of size equal to the number of nodes (proteins), or by using fast iterative procedures such as the Jacobi method [83]. A network-based approach alternative to label propagation, exhibiting strong theoretical predictive guarantees in the so-called mistake bound model, has been recently proposed by [84].
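To make the quadratic-cost framework concrete, the following is a minimal sketch of label propagation on a functional network with Jacobi-style iterations; the toy graph, the weights, the parameter mu, and the seed labels are illustrative assumptions, not the exact formulation of [82].

```python
import numpy as np

def propagate_labels(W, y_seed, mu=1.0, n_iter=100):
    """Jacobi-style iteration for a quadratic cost of the form
    sum_ij W_ij (f_i - f_j)^2 + mu * sum_{labeled i} (f_i - y_i)^2.
    W: symmetric adjacency matrix; y_seed: +1/-1 for labeled genes, 0 otherwise."""
    labeled = (y_seed != 0).astype(float)
    f = y_seed.astype(float).copy()
    deg = W.sum(axis=1)
    for _ in range(n_iter):
        # each node moves toward the weighted average of its neighbors,
        # pulled back to its seed label if it has one
        f = (W @ f + mu * labeled * y_seed) / (deg + mu * labeled + 1e-12)
    return f  # continuous scores; threshold at 0 to predict the class

# toy functional network over 5 genes; genes 0 and 4 are annotated
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
y_seed = np.array([1, 0, 0, 0, -1])
print(propagate_labels(W, y_seed))
```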

2.3. Kernel Methods for Structured Output Spaces. By extending kernels to the output space, the multilabel hierarchical classification problem is solved globally: the multilabels are viewed as elements of a structured space, modeled by suitable kernel functions [85–87], and structured predictions are viewed as a maximum a posteriori prediction problem [88].

Given a feature space $\mathcal{X}$ and a space of structured labels $\mathcal{Y}$, the task is to learn a mapping $f : \mathcal{X} \rightarrow \mathcal{Y}$ through an induced joint kernel function $k$ that computes the "compatibility" of a given input-output pair $(x, y)$: for each test example $x \in \mathcal{X}$, we need to determine the label $\hat{y} \in \mathcal{Y}$ such that $\hat{y} = \arg\max_{y \in \mathcal{Y}} k(x, y)$. By modeling probabilities with a log-linear model and using a suitable feature map $\phi(x, y)$, we can define an induced joint kernel function that uses both inputs and outputs to compute the "compatibility" of a given input-output pair [88]:

$$k : (\mathcal{X} \times \mathcal{Y}) \times (\mathcal{X} \times \mathcal{Y}) \rightarrow \mathbb{R}. \quad (1)$$


Structured output methods infer a label $\hat{y}$ by finding the maximum of a function $g$ that uses the previously defined joint kernel (1):

$$\hat{y} = \underset{y \in \mathcal{Y}}{\arg\max}\; g(x, y). \quad (2)$$
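As an illustration of (1) and (2), a minimal sketch of inference with a joint kernel; the factorized kernel, the kernel-expansion form of $g$, and the toy data are illustrative assumptions, not the GOstruct formulation.

```python
import numpy as np
from itertools import product

def joint_kernel(x1, y1, x2, y2):
    # illustrative factorized joint kernel: input similarity times
    # (cosine-normalized) label similarity
    k_x = float(np.dot(x1, x2))
    ny1, ny2 = np.linalg.norm(y1), np.linalg.norm(y2)
    k_y = float(np.dot(y1, y2)) / (ny1 * ny2) if ny1 and ny2 else 0.0
    return k_x * k_y

def predict(x, train, alphas, label_space):
    # eq. (2): return the multilabel maximizing g(x, y), with g expressed
    # as a kernel expansion over the training input-output pairs
    def g(y):
        return sum(a * joint_kernel(xt, yt, x, y)
                   for a, (xt, yt) in zip(alphas, train))
    return max(label_space, key=g)

# toy data: 3-dimensional inputs, multilabels over 2 classes
train = [(np.array([1., 0., 0.]), np.array([1., 0.])),
         (np.array([0., 1., 0.]), np.array([0., 1.]))]
alphas = [1.0, 1.0]
label_space = [np.array(b, dtype=float) for b in product([0, 1], repeat=2)]
print(predict(np.array([0.9, 0.1, 0.]), train, alphas, label_space))  # [1. 0.]
```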

The GOstruct system implemented a structured perceptron and a variant of the structured support vector machine [85]. This approach has been successfully applied to the prediction of GO terms in mouse and other model organisms [19]. Structured output maximum-margin algorithms have also been applied to the tree-structured prediction of enzyme functions [50, 86].

2.4. Hierarchical Ensemble Methods. Other approaches explicitly take into account the hierarchical relationships between functional terms [26, 29, 53, 54, 89, 90]. Usually they modify the "flat" predictions (i.e., predictions made independently of the hierarchical structure of the classes) and correct them, improving the accuracy and consistency of the multilabel annotations of proteins [24].

The flat approach makes predictions for each term independently, and consequently the predictor may assign to a single protein a set of terms that are inconsistent with one another. A possible solution to this problem is to train a classifier for each term of the reference ontology, to produce a set of predictions at each term, and finally to reconcile the predictions by taking into account the relationships between the classes of the ontology. Different ensemble-based algorithms have been proposed, ranging from methods restricted to multilabels with single and no partial paths [91] to methods extended to multiple and also partial paths [92]. Many recently published works clearly demonstrated that this approach ensures an increment in precision, but this comes at the expense of the overall recall [2, 30].

In the next section we discuss hierarchical ensemble methods in detail, since they constitute the main topic of this review.

3. Hierarchical Ensemble Methods: Exploiting the Hierarchy to Improve Protein Function Prediction

Ensemble methods are one of the main research areas of machine learning [52, 93–95]. From a general standpoint, ensembles of classifiers are sets of learning machines that work together to solve a classification problem (Figure 1). Empirical studies showed that in both classification and regression problems ensembles improve on single learning machines; moreover, large experimental studies compared the effectiveness of different ensemble methods on benchmark data sets [96–99], and ensembles have been successfully applied to several computational biology problems [100–104]. Ensemble methods have also been successfully applied in an unsupervised setting [105, 106]. Several theories have been proposed to explain the characteristics and the successful application of ensembles to different application domains. For instance, Allwein, Schapire, and Singer interpreted the improved generalization capabilities of ensembles of learning machines in the framework of large margin classifiers [107, 108], Kleinberg in the context of stochastic discrimination theory [109], and Breiman and Friedman in the light of the bias-variance analysis borrowed from classical statistics [110, 111]. The interest in this research area is motivated also by the availability of very fast computers and networks of workstations at a relatively low cost, which allow the implementation and experimentation of complex ensemble methods using off-the-shelf computer platforms.

Figure 1: Ensemble of classifiers.

Constraints between labels, and more in general the issue of label dependence, have been recognized to play a central role in multilabel learning [112]. Protein function prediction can be regarded as a paradigmatic multilabel classification problem, where the exploitation of a priori knowledge about the hierarchical relationships between the labels can dramatically improve classification performance [24, 27, 113].

In the context of AFP problems, ensemble methods reflect the hierarchy of functional terms in the structure of the ensemble itself: each base learner is associated with a node of the graph representing the functional hierarchy and learns a specific GO term or FunCat category. The predictions provided by the trained classifiers are then combined by exploiting the hierarchical relationships of the taxonomy.

In their more general form, hierarchical ensemble methods adopt a two-step learning strategy:

(1) In the first step each base learner, separately or interacting with connected base learners, learns the protein functional category on a per-term basis. In most cases this yields a set of independent classification problems, where each base learning machine is trained to learn a specific functional term, independently of the other base learners.

(2) In the second step the predictions provided by the trained classifiers are combined by considering the hierarchical relationships between the base classifiers, modeled according to the hierarchy of the functional classes.

Figure 2 depicts the two learning steps of hierarchical ensemble methods. In the first step a learning algorithm (a square object in Figure 2(a)) is applied to train the base classifiers associated with each class (represented with numbers from 1 to 9). Then, in the prediction phase, the resulting base classifiers (circles) exploit the hierarchical relationships between classes to combine their predictions with those provided by the other base classifiers (Figure 2(b)). Note that a dummy node 0 is added to obtain a rooted hierarchy. Up and down arrows represent the possibility of combining predictions by exploiting those provided, respectively, by children and parent classifiers, according to a bottom-up or top-down learning strategy. Note that "local" combinations are possible (e.g., the prediction of node 5 may depend only on the prediction of node 1), but "global" combinations can also be considered, by taking into account the predictions across the overall structure of the graph (e.g., predictions for node 9 can depend on the predictions made by all the other base classifiers, from 1 to 8). Moreover, both top-down propagation of the predictions (down arrows, Figure 2(b)) and bottom-up propagation (up arrows) can be considered, depending on the specific design of the hierarchical ensemble algorithm.

Figure 2: Schematic representation of the two main learning steps of hierarchical ensemble methods: (a) training of base classifiers; (b) top-down and/or bottom-up propagation of the predictions.

This ensemble approach is highly modular: in principle, any learning algorithm can be used to train the classifiers in the first step, and annotation decisions, probabilities, or any other scores provided by each base learner can be combined, depending on the characteristics of the specific hierarchical ensemble method.

In this section we provide some basic notation and an ensemble taxonomy that will be used to introduce the different hierarchical ensemble methods for AFP.

3.1. Basic Notation. A gene/gene product $g$ can be represented through a vector $\mathbf{x} \in \mathbb{R}^d$ having $d$ different features (e.g., gene expression levels across $d$ different conditions, sequence similarities with other genes/proteins, presence or absence of a given domain in the corresponding protein, or genetic or physical interactions with other proteins). Note that, for the sake of simplicity and with a certain approximation, we refer in the same way to genes and proteins, even if it is well known that a given gene may correspond to multiple proteins. A gene $g$ is assigned to one or more functional classes in the set $C = \{c_1, c_2, \ldots, c_m\}$, structured according to a FunCat forest of trees $T$ or a directed acyclic graph $G$ of the Gene Ontology (usually a dummy root class $c_0$, to which every gene belongs, is added to $T$ or $G$ to facilitate the processing). The assignments are coded through a vector of multilabels $\mathbf{y} = (y_1, y_2, \ldots, y_m) \in \{0, 1\}^m$, where $g$ belongs to class $c_i$ if and only if $y_i = 1$.

In both the Gene Ontology (GO) and FunCat taxonomies the functional classes are structured according to a hierarchy and can be represented by a directed graph, where nodes correspond to classes and edges correspond to relationships between classes. Hence the node corresponding to the class $c_i$ can simply be denoted by $i$. We represent the set of children nodes of $i$ by child($i$) and the set of its parents by par($i$). Moreover, $y_{\mathrm{child}(i)}$ denotes the labels of the children classes of node $i$ and, analogously, $y_{\mathrm{par}(i)}$ denotes the labels of the parent classes of $i$. Note that in FunCat only one parent is permitted, since the overall hierarchy is a tree forest, while in the GO more parents are allowed, because the relationships are structured according to a directed acyclic graph.

Hierarchical ensemble methods train a set of calibrated classifiers, one for each node of the taxonomy $T$. These classifiers are used to derive estimates $\hat{p}_i(g)$ of the probabilities $p_i(g) = P(V_i = 1 \mid V_{\mathrm{par}(i)} = 1, g)$ for all $g$ and $i$, where $(V_1, \ldots, V_m) \in \{0, 1\}^m$ is the vector random variable modeling the unknown multilabel of a gene $g$ and $V_{\mathrm{par}(i)}$ denotes the random variables associated with the parents of node $i$. Note that the $p_i(g)$ are probabilities conditioned on $V_{\mathrm{par}(i)} = 1$, that is, the probability that a gene is annotated to a given term $i$, given that the gene is already annotated to its parent terms, thus respecting the true path rule. Ensemble methods infer a multilabel assignment $\hat{\mathbf{y}} = (\hat{y}_1, \ldots, \hat{y}_m) \in \{0, 1\}^m$ based on the estimates $\hat{p}_1(g), \ldots, \hat{p}_m(g)$.
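As a concrete counterpart of this notation, a minimal sketch of a parent-map representation of the taxonomy and of a true-path-rule consistency check on a multilabel; the toy tree and the labels are illustrative:

```python
# parent map for a toy FunCat-like tree; node 0 is the dummy root
par = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2}

def children(i):
    return [j for j, p in par.items() if p == i]

def is_consistent(y):
    """True path rule: y_i = 1 implies y_par(i) = 1 (dummy root excluded).
    y is a dict mapping node -> 0/1."""
    return all(y[par[i]] == 1 for i in par if par[i] != 0 and y[i] == 1)

y = {1: 1, 2: 0, 3: 1, 4: 0, 5: 0}
print(is_consistent(y))            # True: node 3 is positive and so is its parent 1
print(is_consistent({**y, 1: 0}))  # False: node 3 is positive but its parent 1 is not
```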

3.2. A Taxonomy of Hierarchical Ensemble Methods. Hierarchical ensemble methods for AFP share several characteristics, from the two-step learning approach to the exploitation of the hierarchical relationships between classes. For these reasons it is quite difficult to clearly and univocally individuate a taxonomy of hierarchical ensemble methods. Here we propose a taxonomy useful mainly to describe and discuss existing methods for AFP. For a recent review and taxonomy of hierarchical ensemble methods not specific to AFP problems, we refer the reader to the comprehensive review by Silla and others [44].

In the following sections we discuss the following groups of hierarchical ensemble methods:

(i) Top-down ensemble methods. These methods are characterized by a simple top-down approach: in the second step only the output of the parent node/base classifier influences the output of the children, thus resulting in a top-down propagation of the decisions.

(ii) Bayesian ensemble methods. These methods are theoretically well founded, and in some cases they are optimal from a Bayesian standpoint.

(iii) Reconciliation methods. This is a heterogeneous class of heuristics by which we can combine the predictions of the base learners by adopting different "local" or "global" combination strategies.

(iv) True path rule ensembles. These methods adopt a heuristic approach based on the "true path rule" that governs both the GO and FunCat ontologies.

(v) Decision tree-based ensembles. These methods are characterized by the application of decision trees as base learners or by the adoption of decision tree-like learning strategies to combine predictions of the base learners.

Despite this general characterization, several methods could be assigned to different groups, and for several hierarchical ensemble methods it is difficult to assign them to any of the introduced classes of methods.

For instance, in [114–116] the authors used the hierarchy only to construct training sets, different for each term of the Gene Ontology, by determining positive and negative examples on the basis of the relationships between functional terms. In [89], for each classifier associated with a node, a gene is labeled as positive (i.e., belonging to the term associated with that node) if it actually belongs to that node, or as negative if it does not belong to that node or to the ancestors or descendants of the node.

Other approaches exploited the correlation between nearby classes [32, 53, 117]. Shahbaba and Neal [53] take into account the hierarchy to introduce correlation between functional classes, using a multinomial logit model with Bayesian priors, in the context of E. coli functional classification with Riley's hierarchies [118]. Bogdanov and Singh incorporated functional interrelationships between terms during the extraction of features based on the annotations of neighboring genes and then applied a nearest-neighbor classifier to predict protein functions [117]. The HiBLADE method (hierarchical multilabel boosting with label dependency) [32] not only takes advantage of the preestablished hierarchical taxonomy of the classes but also effectively exploits the hidden correlation among the classes that is not shown through the class hierarchy, thereby improving the quality of the predictions. In particular, the dependencies of the children for each label in the hierarchy are captured and analyzed using the Bayes method and instance-based similarity. Experiments using the FunCat taxonomy and the yeast model organism show that the proposed method is competitive with the TPR-W (Section 7.2) and HBAYES-CS (Section 5.3) hierarchical ensemble methods.

A classical multiclass boosting algorithm [119] has been adapted to fit the hierarchical structure of the FunCat taxonomy [120]: the method is relatively simple and straightforward to implement and achieves competitive results for AFP in the yeast model organism.

Finally, other hierarchical approaches have been proposed in the context of the competitive network learning framework. Competitive networks are well-known unsupervised and supervised methods able to map the input space into a structured output space, where clusters or classes are usually arranged according to a grid topology and where learning adopts at the same time a competition, cooperation, and adaptation strategy [121]. Interestingly enough, in [122] the authors adopted this approach to predict the hierarchy of gene annotations in the yeast model organism by using a tree topology: according to the FunCat taxonomy, each neuron is connected with its parent or with its children. Moreover, each neuron in the tree-structured output layer is connected to all neurons of the input layer representing the instances, that is, the set of genomic features associated with each gene to be classified. Results obtained with the hierarchy of Enzyme Commission codes showed that this approach is competitive with those obtained with hierarchical decision tree ensembles [29] (Section 8).

To provide a general picture of the methods discussed in the following sections, Table 1 summarizes their main characteristics. The first two columns report the name and a reference to the method; the third, whether multiple paths across the taxonomy are allowed; and the next, whether partial paths are considered (i.e., paths that do not end at a leaf). The successive columns refer to the class structure (a tree or a DAG), to the adoption or not of cost-sensitive (i.e., unbalance-aware) classification approaches, and to the adoption of strategies to properly select negative examples in the training phase. Finally, the last three columns summarize the type of base learner used ("spec." means that only a specific type of base learner is allowed, while "any" means that any type of learner can be used within the method), whether the method improves or not with respect to the flat approach, and the mode of processing of the nodes ("TD": top-down approach; "TD & BUP": adopting both top-down and bottom-up strategies). Of course, methods having more checkmarks are more flexible; in general, methods that can process a DAG can also process tree-structured ontologies, but the opposite is not guaranteed, while the type of node processing depends on the way the information is propagated across the ontology. It is worth noting that all the considered methods improve on baseline "flat" classification methods.

4. Hierarchical Top-Down (HTD) Ensembles

These ensemble methods exploit the hierarchical relationships between functional terms in a top-to-bottom fashion, that is, considering only the relationships denoted by the down arrows in Figure 2(b).


Table 1: Characteristics of some of the main hierarchical ensemble methods for AFP.

| Method | References | Multipath | Partial path | Class structure | Cost-sens. | Sel. neg. | Base learner | Improves on flat | Node processing |
|-------------------|------------|-----------|--------------|-----------------|------------|-----------|--------------|------------------|-----------------|
| HMC-LMLP | [124, 125] | ✓ | ✓ | TREE | | | any | ✓ | TD |
| HTD-CS | [27] | ✓ | ✓ | TREE | ✓ | | any | ✓ | TD |
| HTD-MULTI | [127] | ✓ | | TREE | | | any | ✓ | TD |
| HTD-PERLEV | [128] | | | TREE | | | spec. | ✓ | TD |
| HTD-NET | [26] | ✓ | ✓ | DAG | | | any | ✓ | TD |
| BAYES NET-ENS | [20] | ✓ | ✓ | DAG | | ✓ | spec. | ✓ | TD & BUP |
| HIER-MB and BFS | [30] | ✓ | ✓ | DAG | | | any | ✓ | TD & BUP |
| HBAYES | [92, 135] | ✓ | ✓ | TREE | ✓ | | any | ✓ | TD & BUP |
| HBAYES-CS | [123] | ✓ | ✓ | TREE | ✓ | ✓ | any | ✓ | TD & BUP |
| Reconc-heuristic | [24] | ✓ | ✓ | DAG | | | any | ✓ | TD |
| Cascaded log. | [24] | ✓ | ✓ | DAG | | | any | ✓ | TD |
| Projection-based | [24] | ✓ | ✓ | DAG | | | any | ✓ | TD & BUP |
| TPR | [31, 142] | ✓ | ✓ | TREE | ✓ | | any | ✓ | TD & BUP |
| TPR-W | [31] | ✓ | ✓ | TREE | ✓ | ✓ | any | ✓ | TD & BUP |
| TPR-W weighted | [145] | ✓ | ✓ | TREE | ✓ | | any | ✓ | TD & BUP |
| Decision-tree-ens | [29] | ✓ | ✓ | DAG | | | spec. | ✓ | TD & BUP |

The basic hierarchical top-down ensemble method (HTD) algorithm is straightforward: for each gene $g$, starting from the set of nodes at the first level of the graph $G$ (denoted by root($G$)), the classifier associated with the node $i \in G$ computes whether the gene belongs to the class $c_i$. If yes, the classification process continues recursively on the nodes $j \in \mathrm{child}(i)$; otherwise it stops at node $i$, and the predictions for the nodes belonging to the subtree rooted at $i$ are all set to 0. To introduce the method, we use probabilistic classifiers as base learners, trained to predict the class $c_i$ associated with the node $i$ of the hierarchical taxonomy. Their estimates $\hat{p}_i(g)$ of $P(V_i = 1 \mid V_{\mathrm{par}(i)} = 1, g)$ are used by the HTD ensemble to classify a gene $g$ as follows:

$$\hat{y}_i = \begin{cases} \{\hat{p}_i(g) > \frac{1}{2}\} & \text{if } i \in \mathrm{root}(G), \\ \{\hat{p}_i(g) > \frac{1}{2}\} & \text{if } i \notin \mathrm{root}(G) \text{ and } \hat{p}_{\mathrm{par}(i)}(g) > \frac{1}{2}, \\ 0 & \text{if } i \notin \mathrm{root}(G) \text{ and } \hat{p}_{\mathrm{par}(i)}(g) \leq \frac{1}{2}, \end{cases} \quad (3)$$

where $\{x\} = 1$ if the predicate $x$ is true and $\{x\} = 0$ otherwise, and $\hat{p}_{\mathrm{par}(i)}$ is the probability predicted for the parent of the term $i$. It is easy to see that this procedure ensures that the predicted multilabels $\hat{\mathbf{y}} = (\hat{y}_1, \ldots, \hat{y}_m)$ are consistent with the hierarchy. We can apply the same top-down procedure also with nonprobabilistic classifiers, that is, base learners generating continuous scores or discrete decisions, by slightly modifying (3).

In [123] a cost-sensitive version of the basic top-down hierarchical ensemble method HTD has been proposed: by assigning $\hat{y}_i$ before the label of any $j$ in the subtree rooted at $i$, the following rule is used:

$$\hat{y}_i = \{\hat{p}_i \geq \tfrac{1}{2}\} \times \{\hat{y}_{\mathrm{par}(i)} = 1\} \quad (4)$$

for $i = 1, \ldots, m$ (note that the guessed label $\hat{y}_0$ of the root of $G$ is always 1). The cost-sensitive variant HTD-CS then introduces a single cost-sensitive parameter $\tau > 0$, which replaces the threshold 1/2. The resulting rule for HTD-CS is

$$\hat{y}_i = \{\hat{p}_i \geq \tau\} \times \{\hat{y}_{\mathrm{par}(i)} = 1\}. \quad (5)$$

By tuning $\tau$ we may obtain ensembles with different precision/recall characteristics. Despite the simplicity of hierarchical top-down methods, several works showed their effectiveness for AFP problems [28, 31].
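A minimal sketch of the HTD-CS rule (5) (plain HTD corresponds to thresholding at 1/2); the parent map and the probability estimates below are illustrative, and in a real setting they would come from the calibrated base learners:

```python
def depth(i, par):
    d = 0
    while i != 0:
        i, d = par[i], d + 1
    return d

def htd_cs(p_hat, par, tau=0.5):
    """Top-down rule (5): y_i = {p_i >= tau} * {y_par(i) = 1}.
    p_hat: node -> estimated P(V_i = 1 | V_par(i) = 1, g);
    par:   node -> parent; the dummy root 0 is always positive."""
    y_hat = {0: 1}
    for i in sorted(p_hat, key=lambda i: depth(i, par)):  # parents before children
        y_hat[i] = int(p_hat[i] >= tau and y_hat[par[i]] == 1)
    return y_hat

par = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
p_hat = {1: 0.9, 2: 0.4, 3: 0.8, 4: 0.3, 5: 0.95}
print(htd_cs(p_hat, par))           # node 5 is forced to 0: its parent 2 is negative
print(htd_cs(p_hat, par, tau=0.3))  # lowering tau trades precision for recall
```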

For instance, Cerri and De Carvalho experimented with different variants of top-down hierarchical ensemble methods for AFP [28, 124, 125]. HMC-LMLP (hierarchical multilabel classification with local multilayer perceptrons) successively trains a local MLP network for each hierarchical level, using the classical backpropagation algorithm [126]. The output of the MLP for the first level is then used as input to train the MLP that learns the classes of the second level, and so on (Figure 3). A gene is annotated to a class if its corresponding output in the MLP is larger than a predefined threshold; then, in a postprocessing phase (the second step of the hierarchical classification), inconsistent predictions are removed (i.e., classes predicted without the prediction of their superclasses) [125]. In practice, instead of using a dichotomic classifier for each node, the HMC-LMLP algorithm applies a single multiclass multilayer perceptron to each level of the hierarchy.

Figure 3: HMC-LMLP: the outputs of the MLP responsible for the predictions at the first level are used as input to another MLP for the predictions at the second level (adapted from [125]).

A related approach adopts multiclass classifiers (HTD-MULTI) at each node, instead of a simple binary classifier, and tries to find the most likely path from the root to the leaves of the hierarchy, considering simple techniques such as the multiplication or the sum of the probabilities estimated at each node along the path [127]. The method has been applied to the cell cycle branch of the FunCat hierarchy with the yeast model organism, showing improvements with respect to classical hierarchical top-down methods, even if the proposed approach can only predict classes along a single "most likely path," thus not considering that in AFP we may have annotations involving multiple and partial paths.

Another method that introduces multiclass classifiers instead of simple dichotomic classifiers has been proposed by Paes et al. [128]: local per-level multiclass classifiers (HTD-PERLEV) are trained to distinguish between the classes of a specific level of the hierarchy, and two different strategies to remove inconsistencies are introduced. The method has been applied to the hierarchical classification of enzymes using the EC taxonomy, but unfortunately this algorithm is not well suited to AFP, since leaf-node predictions are mandatory (that is, partial-path annotations are not allowed) and multilabel annotations along multiple paths are not allowed.

Another interesting top-down hierarchical approach proposed by the same authors is HMC-LP (hierarchical multilabel classification label-powerset), a hierarchical variation of the label-powerset nonhierarchical multilabel method [129], which has been applied to the prediction of gene functions of the yeast model organism using 10 different data sets and the FunCat taxonomy [124]. According to the label-powerset approach, the method is based on a first label-combination step by which, for each example (gene), all classes assigned to the example are combined into a new and unique class, and this process is repeated for each level of the hierarchy (see the sketch below). In this way the original problem is transformed into a hierarchical single-label problem. In both the training and test phases the top-down approach is applied, and at the end of the classification phase the original classes can be easily reconstructed [124]. In an experimental comparison using the FunCat taxonomy for S. cerevisiae, results showed that hierarchical top-down ensemble methods significantly outperform decision-tree-based hierarchical methods, but no significant difference between different flavors of top-down hierarchical ensembles has been detected [28].
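A minimal sketch of the per-level label-combination step of HMC-LP; the gene annotations and level assignments are illustrative:

```python
from collections import defaultdict

# illustrative annotations: gene -> set of classes, and class -> hierarchy level
annotations = {"g1": {"A", "B", "C1"}, "g2": {"A", "C1", "C2"}}
level = {"A": 1, "B": 1, "C1": 2, "C2": 2}

def label_powerset_by_level(annotations, level):
    """For each gene and each level, fuse all the classes assigned at that
    level into one composite class (the label-powerset trick), so that each
    level becomes a single-label problem."""
    combined = defaultdict(dict)
    for gene, classes in annotations.items():
        per_level = defaultdict(list)
        for c in classes:
            per_level[level[c]].append(c)
        for lev, cs in per_level.items():
            combined[gene][lev] = "+".join(sorted(cs))  # e.g. "A+B"
    return dict(combined)

print(label_powerset_by_level(annotations, level))
# {'g1': {1: 'A+B', 2: 'C1'}, 'g2': {1: 'A', 2: 'C1+C2'}}
```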

Top-down algorithms can also be conceived in the context of network-based methods (HTD-NET). For instance, in [26] a probabilistic model, combining relational protein-protein interaction data and the hierarchical structure of the GO to predict true-path-consistent function labels, obeys the true path rule by setting the descendants of a node as negative whenever that node is set to negative. More precisely, the authors first compute a local hierarchical conditional probability, in the sense that, for any nonroot GO term, only its parents affect its labeling. This probability is computed within a network-based framework, assuming that the labeling of a gene is independent of that of any other gene given that of its neighbors (a sort of Markov property with respect to the gene functional interaction network), and assuming also a binomial distribution for the number of neighbors labeled with child terms with respect to those labeled with the parent term. These assumptions are quite stringent but are necessary to make the model tractable. Then a global hierarchical conditional probability is computed by recursively applying the previously computed local hierarchical conditional probabilities, considering all the ancestors. More precisely, denoting by $P(y_i = 1 \mid g, N(g))$ the probability that a gene $g$ is annotated to a node $i$, given the status of the annotations of its neighborhood $N(g)$ in the functional network, the global hierarchical conditional probability factorizes according to the GO graph as follows:

$$P(y_i = 1 \mid g, N(g)) = \prod_{j \in \mathrm{anc}(i)} P(y_j = 1 \mid y_{\mathrm{par}(j)} = 1, N_{\mathrm{loc}}(g)), \quad (6)$$

where $N_{\mathrm{loc}}(g)$ represents the local hierarchical neighborhood information on the parent-child GO term pair $(\mathrm{par}(j), j)$ [26]. This approach guarantees GO term label assignments consistent with the hierarchy, without the need of a postprocessing step.
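A minimal sketch of the factorization (6), where the global probability of a term is the product of the local parent-child conditional probabilities along its ancestors (here anc(i) is taken to include i itself); the parent map and local probabilities are illustrative, while in [26] they are estimated from the interaction network:

```python
def ancestors(i, par):
    """All ancestors of i up to (and excluding) the dummy root 0,
    including i itself, as required by the product in (6)."""
    out = []
    while i != 0:
        out.append(i)
        i = par[i]
    return out

def global_prob(i, par, p_local):
    """Eq. (6): P(y_i = 1 | g, N(g)) as the product of the local
    hierarchical conditional probabilities over anc(i)."""
    prob = 1.0
    for j in ancestors(i, par):
        prob *= p_local[j]  # P(y_j = 1 | y_par(j) = 1, N_loc(g))
    return prob

par = {1: 0, 2: 1, 3: 2}
p_local = {1: 0.9, 2: 0.8, 3: 0.5}
print(global_prob(3, par, p_local))  # 0.9 * 0.8 * 0.5 = 0.36
```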

Finally, in [130] the author applied a hierarchical method to the classification of yeast FunCat categories. Despite its well-founded theoretical properties based on large margin methods, this approach is conceived for one-path hierarchical classification and hence is unsuited for hierarchical AFP, where usually multiple paths in the hierarchy should be considered, since in most cases genes can play different functional roles in the cell.

5. Ensemble-Based Bayesian Approaches for Hierarchical Classification

These methods introduce a Bayesian approach to the hierarchical classification of proteins, using the classical Bayes theorem or Bayesian networks to obtain tractable factorizations of the joint conditional probabilities from the original "full Bayesian" setting of the hierarchical AFP problem [20, 30], or to achieve "Bayes-optimal" solutions with respect to loss functions well suited to hierarchical problems [27, 92].

5.1. The Solution Based on Bayesian Networks. One of the first approaches addressing the issue of inconsistent predictions in the Gene Ontology is represented by the Bayesian approach proposed in [20] (BAYES NET-ENS). According to the general scheme of hierarchical ensemble methods, two main steps characterize the algorithm:

(1) flat prediction of each term/class (possibly inconsistent);

(2) a Bayesian hierarchical combination scheme allowing collaborative error-correction over all nodes.

After training a set of base classifiers on each of the considered GO terms (in their work the authors applied the method to 105 selected GO terms), we may have a set of (possibly inconsistent) predictions $\hat{y}$. The goal consists in finding a set of consistent predictions $y$ by maximizing the following equation, derived from the Bayes theorem:

$$P(y_1, \ldots, y_n \mid \hat{y}_1, \ldots, \hat{y}_n) = \frac{P(\hat{y}_1, \ldots, \hat{y}_n \mid y_1, \ldots, y_n)\, P(y_1, \ldots, y_n)}{Z}, \quad (7)$$

where $n$ is the number of GO nodes/terms and $Z$ is a constant normalization factor.

Since the direct solution of (7) is too hard (it is exponential in time with respect to the number of nodes), the authors proposed a Bayesian network structure to solve this difficult problem, exploiting the relationships between the GO terms. More precisely, to reduce the complexity of the problem, the authors imposed the following constraints:

(1) the $y_i$ nodes are conditioned on their children (GO structure constraints);

(2) the $\hat{y}_i$ nodes are conditioned on their label $y_i$ (the Bayes rule);

(3) $\hat{y}_i$ is independent of both $\hat{y}_j$, $i \neq j$, and $y_j$, $i \neq j$, given $y_i$.

In other words, we can ensure that a label is 1 (positive) when any one of its children is 1, and the edges from $y_i$ to $\hat{y}_i$ assure that a classifier output $\hat{y}_i$ is a random variable independent of all other classifier outputs $\hat{y}_j$ and labels $y_j$, given its true label $y_i$ (Figure 4).

Figure 4: Bayesian network involved in the hierarchical classification (adapted from [20]).

More precisely, from the previous constraints we can derive the following equations:

from the first constraint,

$$P(y_1, \ldots, y_n) = \prod_{i=1}^{n} P(y_i \mid \mathrm{child}(y_i)); \quad (8)$$

from the last two constraints,

$$P(\hat{y}_1, \ldots, \hat{y}_n \mid y_1, \ldots, y_n) = \prod_{i=1}^{n} P(\hat{y}_i \mid y_i). \quad (9)$$

Note that (8) can be inferred from the training labels simply by counting, while (9) can be inferred by validation during training, by modeling the distribution of the $\hat{y}_i$ outputs over positive and negative examples with a parametric model (e.g., a Gaussian distribution; see Figure 5).
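A minimal sketch of this estimation step, fitting one Gaussian per true label to (synthetic) validation margins in order to model the conditional densities of (9); SciPy's norm is used for the density, and all the numbers are illustrative:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# synthetic validation margins of the base classifier at one GO term
out_pos = rng.normal(loc=1.0, scale=0.5, size=200)    # outputs on positive genes
out_neg = rng.normal(loc=-1.5, scale=0.8, size=2000)  # outputs on negative genes

# one Gaussian per true label models P(yhat_i | y_i) of eq. (9)
mu_pos, sd_pos = out_pos.mean(), out_pos.std()
mu_neg, sd_neg = out_neg.mean(), out_neg.std()

def likelihood(output, y):
    """Density of the classifier output given the true label y in {0, 1}."""
    return norm.pdf(output, mu_pos, sd_pos) if y == 1 else norm.pdf(output, mu_neg, sd_neg)

print(likelihood(0.7, 1), likelihood(0.7, 0))
```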

For the implementation of their method the authors adopted bagged ensembles of SVMs [131], to make the predictions at each node of the GO hierarchy more robust and reliable, and the median values of their outputs on out-of-bag examples have been used to estimate means and variances for each class. Finally, these means and variances have been used as the parameters of the Gaussian models used to estimate the conditional probabilities of (9).

Results on the 105 terms/nodes of the GO BP ontology (model organism S. cerevisiae) showed substantial improvements with respect to nonhierarchical "flat" predictions: the hierarchical approach improves the AUC results on 93 of the 105 GO terms (Figure 6).

5.2. The Markov Blanket and Approximated Breadth-First Solution. In [30] the authors proposed an alternative approximated solution to the complex equation (7), by introducing the following two variants of the Bayesian integration:

(1) HIER-MB: hierarchical Bayesian combination involving the nodes in the Markov blanket;

(2) HIER-BFS: hierarchical Bayesian combination involving the first 30 nodes visited by a breadth-first search (BFS) in the GO graph.


Figure 5: Distribution of the classifier outputs on (a) negative and (b) positive validation examples (a Gaussian distribution is assumed). Adapted from [20].

Figure 6: Improvements induced by the hierarchical prediction of the GO terms. Darker shades of blue indicate the largest improvements and darker shades of red the largest deterioration; white means no change (adapted from [20]).

Figure 7: Markov blanket surrounding the GO term Y₁. Each GO term is represented as a blank node, while the SVM classifier output is represented as a gray node (adapted from [30]).

The method has been applied to the prediction of more than 2000 GO terms for the mouse model organism and performed among the top methods in the MouseFunc challenge [2].

The first approach (HIER-MB) modifies the output of the base learners (SVMs in the Guan et al. paper), taking into account the Bayesian network constructed using the Markov blanket surrounding the GO term of interest (Figure 7). In a Bayesian network the Markov blanket of a node $i$ is represented by its parents (par($i$)), its children (child($i$)), and its children's other parents. The Bayesian network involving the Markov blanket of node $i$ is used to provide the prediction $\hat{y}_i$ of the ensemble, thus leveraging the local relationships of node $i$ and the predictions for the nodes included in its Markov blanket.
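A minimal sketch of how the Markov blanket of a term can be collected from a parent map of the label layer; the toy DAG is illustrative, and the classifier-output nodes attached to each label in Figure 7 are omitted:

```python
def markov_blanket(i, parents):
    """parents: dict node -> set of parent nodes (a DAG).
    Returns the Markov blanket of i in the label layer:
    par(i), child(i), and the children's other parents."""
    children = {j for j, ps in parents.items() if i in ps}
    coparents = {p for j in children for p in parents[j]} - {i}
    return parents.get(i, set()) | children | coparents

# toy GO-like DAG: node -> its parents
parents = {"Y2": {"Y1"}, "Y3": {"Y1"}, "Y4": {"Y2", "Y3"}, "Y5": {"Y2"}}
print(markov_blanket("Y2", parents))  # {'Y1', 'Y3', 'Y4', 'Y5'}
```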

To enlarge the size of the Bayesian subnetwork involved in the prediction of the node of interest, a variant based on the Bayesian network constructed by applying a classical breadth-first search is the basis of the HIER-BFS algorithm. To reduce the complexity, at most 30 terms are included (i.e., the first 30 nodes reached by the breadth-first algorithm; see Figure 8). In the implementation, ensembles of 25 SVMs have been trained for each node, using vector space integration techniques [132] to integrate multiple sources of data.

Figure 8: The breadth-first subnetwork stemming from Y₁. Each GO term is represented through a blank node, and the SVM outputs are represented as gray nodes (adapted from [30]).
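A minimal sketch of the HIER-BFS term selection, collecting at most 30 terms by breadth-first search from the term of interest; the adjacency map is illustrative and treats parent and child links uniformly:

```python
from collections import deque

def bfs_terms(start, neighbors, max_nodes=30):
    """First max_nodes GO terms reached by breadth-first search."""
    visited, queue = [start], deque([start])
    while queue and len(visited) < max_nodes:
        for t in neighbors.get(queue.popleft(), []):
            if t not in visited and len(visited) < max_nodes:
                visited.append(t)
                queue.append(t)
    return visited

neighbors = {"Y1": ["Y2", "Y3"], "Y2": ["Y4", "Y5"], "Y3": ["Y6"]}
print(bfs_terms("Y1", neighbors))  # ['Y1', 'Y2', 'Y3', 'Y4', 'Y5', 'Y6']
```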

Note that with both the HIER-MB and HIER-BFS methods we do not take into account the overall topology of the GO network, but only the terms related to the node for which we perform the prediction. Even if this general approach is reasonable and achieves good results, its main drawback is represented by the locality of the hierarchical integration (limited to the Markov blanket or to the first 30 BFS nodes). Moreover, previous works have shown that the adopted integration strategy (vector space integration) is in most cases worse than kernel fusion [11] and ensemble methods for data integration [23].

Figure 9: Integration of diverse methods and diverse sources of data (gene expression microarrays, protein sequence and domain information, protein-protein interactions, phenotype profiles, phylogenetic profiles, and disease profiles obtained using homology) in an ensemble framework for AFP: three classifiers (a single SVM per GO term, hierarchical correction, and Bayesian integration of individual datasets) are applied, and the best classifier for each GO term is chosen through held-out set validation (adapted from [30]).

In the same work [30], the authors also propose a sort of "test and select" method [133] by which three different classification approaches, (a) single flat SVMs, (b) Bayesian hierarchical correction, and (c) naive Bayes combination, are applied, and for each GO term the best one is selected by internal cross-validation (Figure 9).
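A minimal sketch of this "test and select" step, under the assumption that each candidate classifier exposes a scikit-learn-style decision_function and that held-out AUC is the selection criterion (the metric choice and the names are illustrative):

```python
from sklearn.metrics import roc_auc_score

def select_best(candidates, X_val, y_val):
    """candidates: dict name -> fitted model with decision_function.
    Returns the name of the classifier with the highest held-out AUC."""
    aucs = {name: roc_auc_score(y_val, model.decision_function(X_val))
            for name, model in candidates.items()}
    return max(aucs, key=aucs.get)

# for each GO term:
# best = select_best({"flat-SVM": svm, "hier-Bayes": hb, "naive-Bayes": nb},
#                    X_val, y_val)
```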

It is worth noting that other approaches have adopted Bayesian networks to resolve the hierarchical constraints underlying the GO taxonomy. For instance, in the FALCON algorithm the GO is modeled as a Bayesian network, and, for any given input, the algorithm returns the most probable GO term assignment in accordance with the GO structure, by using an evolutionary optimization algorithm [134].

5.3. HBAYES: An "Optimal" Hierarchical Bayesian Ensemble Approach. The HBAYES ensemble method [92, 135] is a general technique for solving hierarchical classification problems on generic taxonomies $G$ structured according to a forest of trees. The method consists in training a calibrated classifier at each node of the taxonomy. In principle, any algorithm (e.g., support vector machines or artificial neural networks) whose classifications are obtained by thresholding a real prediction $p$, for example, $\hat{y} = \mathrm{sgn}(p)$, can be used as base learner. The real-valued outputs $\hat{p}_i(g)$ of the calibrated classifier for node $i$ on the gene $g$ are viewed as estimates of the probabilities $p_i(g) = \mathbb{P}(Y_i = 1 \mid Y_{\mathrm{par}(i)} = 1, g)$. The distribution of the random Boolean vector $\mathbf{Y}$ is assumed to be

$$\mathbb{P}(\mathbf{Y} = \mathbf{y}) = \prod_{i=1}^{m} \mathbb{P}(Y_i = y_i \mid Y_{\mathrm{par}(i)} = 1, g), \quad \forall \mathbf{y} \in \{0,1\}^m \quad (10)$$

where, in order to enforce that only multilabels $\mathbf{y}$ that respect the hierarchy have nonzero probability, it is imposed that $\mathbb{P}(Y_i = 1 \mid Y_{\mathrm{par}(i)} = 0, g) = 0$ for all nodes $i = 1, \ldots, m$ and all $g$. This implies that the base learner at node $i$ is only trained on the subset of the training set including all examples $(g, \mathbf{y})$ such that $y_{\mathrm{par}(i)} = 1$.

5.3.1. HBAYES Ensembles for Protein Function Prediction. The H-loss is a measure of discrepancy between multilabels based on a simple intuition: if a parent class has been predicted wrongly, then errors in its descendants should not be taken into account. Given fixed cost coefficients $\theta_1, \ldots, \theta_m > 0$, the H-loss $\ell_H(\hat{\mathbf{y}}, \mathbf{v})$ between multilabels $\hat{\mathbf{y}}$ and $\mathbf{v}$ is computed as follows: all paths in the taxonomy $T$ from the root down to each leaf are examined and, whenever a node $i \in \{1, \ldots, m\}$ is encountered such that $\hat{y}_i \neq v_i$, $\theta_i$ is added to the loss, while all the other loss contributions from the subtree rooted at $i$ are discarded. This method assumes that, given a gene $g$, the distribution of the labels $\mathbf{V} = (V_1, \ldots, V_m)$ is $\mathbb{P}(\mathbf{V} = \mathbf{v}) = \prod_{i=1}^{m} p_i(g)$ for all $\mathbf{v} \in \{0,1\}^m$, where $p_i(g) = \mathbb{P}(V_i = v_i \mid V_{\mathrm{par}(i)} = 1, g)$. According to the true path rule, it is imposed that $\mathbb{P}(V_i = 1 \mid V_{\mathrm{par}(i)} = 0, g) = 0$ for all nodes $i$ and all genes $g$.
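To make the recursive definition of the H-loss concrete, the following minimal Python sketch (all names are hypothetical, and the taxonomy is assumed to be a tree forest given as a children map) computes $\ell_H(\hat{\mathbf{y}}, \mathbf{v})$ by a top-down traversal that stops descending at the first mismatch:

    # Minimal sketch of the H-loss (hypothetical helper, not the authors' code).
    # children: dict mapping each node to the list of its children (tree forest);
    # roots: list of root nodes; theta: dict of cost coefficients theta_i;
    # y_hat, v: dicts mapping each node to its 0/1 label.
    def h_loss(y_hat, v, roots, children, theta):
        loss = 0.0
        stack = list(roots)
        while stack:
            i = stack.pop()
            if y_hat[i] != v[i]:
                loss += theta[i]      # count the error at node i ...
                continue              # ... and discard its whole subtree
            stack.extend(children.get(i, []))
        return loss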

In the evaluation phase, HBAYES predicts the Bayes-optimal multilabel $\hat{\mathbf{y}} \in \{0,1\}^m$ for a gene $g$, based on the estimates $\hat{p}_i(g)$ for $i = 1, \ldots, m$. By definition of Bayes-optimality, the optimal multilabel for $g$ is the one that minimizes the expected loss when the true multilabel $\mathbf{V}$ is drawn from the joint distribution computed from the estimated conditionals $\hat{p}_i(g)$; that is,

$$\hat{\mathbf{y}} = \operatorname*{argmin}_{\mathbf{y} \in \{0,1\}^m} \mathbb{E}\left[\ell_H(\mathbf{y}, \mathbf{V}) \mid g\right] \quad (11)$$

In other words, the ensemble method HBAYES provides an approximation of the optimal Bayesian classifier with respect to the H-loss [135]. More precisely, as shown in [27], the following theorem holds.


Theorem 1. For any tree $T$ and gene $g$, the multilabel generated according to the HBAYES prediction rule is the Bayes-optimal classification of $g$ for the H-loss.

In the evaluation phase, the uniform cost coefficients $\theta_i = 1$ for $i = 1, \ldots, m$ are used. However, since with uniform coefficients the H-loss can be made small simply by predicting sparse multilabels (i.e., multilabels $\mathbf{y}$ such that $\sum_i y_i$ is small), in the training phase the cost coefficients are set to $\theta_i = 1/|\mathrm{root}(G)|$ if $i \in \mathrm{root}(G)$, and to $\theta_i = \theta_j / |\mathrm{child}(j)|$, with $j = \mathrm{par}(i)$, otherwise. This normalizes the H-loss, in the sense that the maximal H-loss contribution of all nodes in a subtree, excluding its root, equals that of its root.

Let $\{E\}$ denote the indicator function of the event $E$. Given $g$ and the estimates $\hat{p}_i = \hat{p}_i(g)$ for $i = 1, \ldots, m$, the HBAYES prediction rule can be formulated as follows.

HBAYES Prediction Rule. Initially, set the label of each node $i$ to

$$\hat{y}_i = \operatorname*{argmin}_{y \in \{0,1\}} \Big( \theta_i \hat{p}_i (1 - y) + \theta_i (1 - \hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{\mathbf{y}}) \Big) \quad (12)$$

where

$$H_j(\hat{\mathbf{y}}) = \theta_j \hat{p}_j (1 - \hat{y}_j) + \theta_j (1 - \hat{p}_j)\, \hat{y}_j + \hat{p}_j \{\hat{y}_j = 1\} \sum_{k \in \mathrm{child}(j)} H_k(\hat{\mathbf{y}}) \quad (13)$$

is recursively defined over the nodes $j$ in the subtree rooted at $i$, with each $\hat{y}_j$ set according to (12). Then, if $\hat{y}_i$ is set to zero, set all nodes in the subtree rooted at $i$ to zero as well.

It is worth noting that $\hat{\mathbf{y}}$ can be computed for a given $g$ via a simple bottom-up message-passing procedure. It can be shown that, if all child nodes $k$ of $i$ have $\hat{p}_k$ close to one half, then the Bayes-optimal label of $i$ tends to be 0, irrespective of the value of $\hat{p}_i$. On the contrary, if $i$'s children all have $\hat{p}_k$ close to either 0 or 1, then the Bayes-optimal label of $i$ is based on $\hat{p}_i$ only, ignoring the children. This behaviour can be intuitively explained in the following way: the estimate $\hat{p}_k$ is built based only on the examples on which the parent $i$ of $k$ is positive; hence a "neutral" estimate $\hat{p}_k = 1/2$ signals that the current instance is a negative example for the parent $i$. Experimental results show that this approach achieves results comparable with those of the TPR method (Section 7), an ensemble approach based on the "true path rule" [136].
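As an illustration of the bottom-up message-passing procedure mentioned above, the following minimal Python sketch (hypothetical names, not the authors' implementation; call it once per root of the forest) realizes the prediction rule (12)-(13), followed by the downward zeroing of the subtrees of negative nodes:

    # Sketch of the HBAYES bottom-up prediction rule (12)-(13).
    # children: dict node -> list of children; p: dict of calibrated estimates
    # p_i(g); theta: dict of cost coefficients. Returns the multilabel y_hat.
    def hbayes_predict(root, children, p, theta):
        y_hat, H = {}, {}

        def visit(i):
            # Bottom-up: children must be evaluated before their parent.
            h_children = sum(visit(j) for j in children.get(i, []))
            cost0 = theta[i] * p[i]                            # predict y_i = 0
            cost1 = theta[i] * (1 - p[i]) + p[i] * h_children  # predict y_i = 1
            y_hat[i] = 1 if cost1 < cost0 else 0
            H[i] = min(cost0, cost1)
            return H[i]

        def zero_subtree(i):
            # If a node is negative, the true path rule forces its subtree to 0.
            for j in children.get(i, []):
                y_hat[j] = 0
                zero_subtree(j)

        visit(root)
        for i in list(y_hat):
            if y_hat[i] == 0:
                zero_subtree(i)
        return y_hat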

5.3.2. HBAYES-CS: The Cost-Sensitive Version. HBAYES-CS is the cost-sensitive version of HBAYES, proposed in [27]. In this approach, the misclassification cost coefficient $\theta_i$ for node $i$ is split into two terms, $\theta_i^+$ and $\theta_i^-$, taking into account misclassifications of, respectively, positive and negative examples. By considering separately these two terms, (12) can be rewritten as

$$\hat{y}_i = \operatorname*{argmin}_{y \in \{0,1\}} \Big( \theta_i^- \hat{p}_i (1 - y) + \theta_i^+ (1 - \hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{\mathbf{y}}) \Big) \quad (14)$$

where the expression of $H_j(\hat{\mathbf{y}})$ is changed correspondingly. By introducing a factor $\alpha \geq 0$ such that $\theta_i^- = \alpha \theta_i^+$, while keeping $\theta_i^+ + \theta_i^- = 2\theta_i$, the relative costs of false positives and false negatives can be parameterized, thus allowing us to further rewrite the hierarchical Bayesian rule (Section 5.3.1) as follows:

$$\hat{y}_i = 1 \Longleftrightarrow \hat{p}_i \Big( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \Big) \geq \frac{2\theta_i}{1 + \alpha} \quad (15)$$

By setting $\alpha = 1$, we obtain the original version of the hierarchical Bayesian ensemble; by incrementing $\alpha$, we introduce progressively lower costs for positive predictions. In this way, the recall of the ensemble tends to increase, eventually at the expense of the precision, and by tuning the $\alpha$ parameter we can obtain different combinations of precision/recall values.

In principle, a cost factor $\alpha_i$ can be set for each node $i$, to explicitly take into account the unbalance between the number of positive ($n_i^+$) and negative ($n_i^-$) examples estimated from the training data:

$$\alpha_i = \frac{n_i^-}{n_i^+} \Longrightarrow \theta_i^+ = \frac{2}{n_i^-/n_i^+ + 1}\, \theta_i = \frac{2 n_i^+}{n_i^- + n_i^+}\, \theta_i \quad (16)$$

The decision rule (15) at each node then becomes

$$\hat{y}_i = 1 \Longleftrightarrow \hat{p}_i \Big( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \Big) \geq \frac{2\theta_i}{1 + \alpha_i} = \frac{2\theta_i n_i^+}{n_i^- + n_i^+} \quad (17)$$

Results obtained with the yeast model organism showed that HBAYES-CS significantly outperforms HTD methods [27, 136].
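For illustration, the per-node cost-sensitive decision (16)-(17) can be sketched as follows (hypothetical names; n_pos and n_neg are the positive and negative training counts of the node, and h_children is the sum of the $H_j$ terms over its children):

    # Sketch of the HBAYES-CS decision rule (17) at a single node i.
    def hbayes_cs_decision(p_i, theta_i, h_children, n_pos, n_neg):
        alpha_i = n_neg / n_pos                    # per-node cost factor (16)
        threshold = 2 * theta_i / (1 + alpha_i)    # = 2*theta_i*n_pos/(n_neg+n_pos)
        return 1 if p_i * (2 * theta_i - h_children) >= threshold else 0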

6. Reconciliation Methods

Hierarchical ensemble methods are basically two-step methods, since they at first provide predictions for the single classes and then arrange these predictions to take into account the functional relationships between GO terms. Noble and colleagues name this general approach reconciliation methods [24]: they proposed methods for calibrating and combining independent predictions, to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. They applied their ensemble methods to the genome-wide and ontology-wide function prediction of M. musculus, involving about 3000 GO terms.

Figure 10: The overall scheme of reconciliation methods (adapted from [24]).

Their goal consists in providing consistent predictions, that is, predictions whose confidence (e.g., posterior probability) increases as we ascend from more specific to more general terms in the GO. Moreover, another important feature of these methods is the availability of confidence values associated with the predictions, which can be interpreted as probabilities that a protein has a certain function, given the information provided by the data.

The overall reconciliation approach can be summarized in the following four basic steps (Figure 10).

(1) Kernel computation: at first, a set of kernels is computed from the available data. We may choose kernels specific for each source of data (e.g., diffusion kernels for protein-protein interaction data [137], linear or Gaussian kernels for expression data, and string kernels for sequence data [138]). Multiple kernels for the same type of data can also be constructed [24].

(2) SVM learning: SVMs are used as base learners, with the kernels selected at the previous step; the training is performed by internal cross-validation to avoid overfitting, and a local cost-sensitive strategy is applied by tuning separately the $C$ regularization factor for positive and negative examples. Note that the authors used SVMs as base learners in their experiments, but any meaningful classifier could be used at this step.

(3) Calibration: to produce individual probabilistic outputs from the set of SVM outputs corresponding to one GO term, a logistic regression approach is applied. In this way, a calibration of the individual SVM outputs is obtained, resulting in a probabilistic prediction of the random variable $Y_i$ for each node/term $i$ of the hierarchy, given the outputs $\hat{y}_i$ of the SVM classifiers.

(4) Reconciliation: the first three steps generate unreconciled outputs; that is, in practice, a "flat" ensemble is applied, which may generate predictions inconsistent with the given taxonomy. In this step, the outputs of step three are processed by a "reconciliation method". The goal of this stage is to combine the predictions for each term to produce predictions that are consistent with the ontology, meaning that all the probabilities assigned to the ancestors of a GO term are larger than the probability assigned to that term.

The first three steps are basically the same (or very similar) for each reconciliation ensemble method. The crucial step is the fourth, that is, the reconciliation step, and different ensemble algorithms can be designed to implement it. The authors proposed 11 different ensemble methods for the reconciliation of the base classifier outputs. Schematically, they can be subdivided into the following four main classes of ensembles:

(1) heuristic methods;
(2) Bayesian network-based methods;
(3) cascaded logistic regression;
(4) projection-based methods.

6.1. Heuristic Methods. These approaches preserve the "reconciliation property"

$$\forall i, j \in G, \quad (i, j) \in E \Longrightarrow \bar{p}_i \geq \bar{p}_j \quad (18)$$

through simple heuristic modifications of the probabilities $\hat{p}_i$ computed at step 3 of the overall reconciliation scheme.

(i) The MAX method simply chooses the largest logistic regression value among the node $i$ and all its descendants $\mathrm{desc}(i)$:

$$\bar{p}_i = \max_{j \in \mathrm{desc}(i)} \hat{p}_j \quad (19)$$

(ii) The AND method implements the notion that the probability that a given term/node $i$ and all its ancestral GO terms $\mathrm{anc}(i)$ are positive is large, assuming that, conditional on the data, all predictions are independent:

$$\bar{p}_i = \prod_{j \in \mathrm{anc}(i)} \hat{p}_j \quad (20)$$

(iii) The OR method estimates the probability that the node $i$ is annotated for at least one of the descendant GO terms, assuming again that, conditional on the data, all predictions are independent:

$$1 - \bar{p}_i = \prod_{j \in \mathrm{desc}(i)} (1 - \hat{p}_j) \quad (21)$$
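A minimal Python sketch of the three heuristics, assuming p_hat maps each term to its calibrated probability and that desc(i) and anc(i) return the descendant and ancestor sets of term i (both assumed here to include i itself), could look as follows:

    # Sketch of the MAX, AND, and OR reconciliation heuristics (19)-(21).
    import math

    def reconcile_max(p_hat, desc):
        # (19): largest calibrated value among i and its descendants.
        return {i: max(p_hat[j] for j in desc(i)) for i in p_hat}

    def reconcile_and(p_hat, anc):
        # (20): product of the calibrated values over anc(i).
        return {i: math.prod(p_hat[j] for j in anc(i)) for i in p_hat}

    def reconcile_or(p_hat, desc):
        # (21): probability that at least one term in desc(i) is positive.
        return {i: 1.0 - math.prod(1.0 - p_hat[j] for j in desc(i))
                for i in p_hat}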

6.2. Cascaded Logistic Regression. Instead of modeling class-conditional probabilities, as required by the Bayesian approach, logistic regression can be used to directly model posterior probabilities. Considering that modeling conditional densities is in most cases difficult (even using strong independence assumptions, as shown in Section 5.1), the choice of logistic regression could be a reasonable one. In [24], the authors embedded the hierarchical dependencies between terms in the logistic regression setting. By assuming that a random variable $\mathbf{X}$, whose values represent the features of the gene $g$ of interest, is associated to a given gene $g$, and assuming that $\mathbb{P}(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x})$ factorizes according to the GO graph, it follows that

$$\mathbb{P}(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x}) = \prod_i \mathbb{P}(Y_i = y_i \mid \forall j \in \mathrm{par}(i),\ Y_j = y_j,\ X_i = x_i) \quad (22)$$

with $\mathbb{P}(Y_i = 1 \mid \forall j \in \mathrm{par}(i),\ Y_j = 0,\ X_i = x_i) = 0$. The authors estimated $\mathbb{P}(Y_i = 1 \mid \forall j \in \mathrm{par}(i),\ Y_j = 1,\ X_i = x_i)$ with logistic regression. This approach is quite similar to fitting independent logistic regressions but, note, in this case only examples of proteins annotated to all parent GO terms are used to fit the model, thus implicitly taking into account the hierarchical relationships between GO terms.
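The training-set restriction implied by (22) can be sketched as follows with scikit-learn (the data layout and helper names are hypothetical, and a guard for subsets containing a single class is omitted):

    # Sketch of cascaded logistic regression (22): one model per term,
    # fitted only on genes annotated to all parents of that term.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_cascaded_lr(X, Y, parents):
        """X: (n_genes, n_features) array; Y: dict term -> 0/1 label vector;
        parents: dict term -> list of parent terms."""
        models = {}
        for term, y in Y.items():
            mask = np.ones(len(y), dtype=bool)
            for par in parents.get(term, []):
                mask &= (Y[par] == 1)   # keep genes positive for every parent
            models[term] = LogisticRegression().fit(X[mask], y[mask])
        return models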

6.3. Bayesian Network-Based Methods. These methods are variants of the Bayesian network approach proposed in [20] (Section 5.1): the GO is viewed as a graphical model, where a joint Bayesian prior is put on the binary GO term variables $y_i$. The authors proposed four variants, which can be summarized as follows.

(i) BPAL is a belief propagation approach with asymmetric Laplace likelihoods. The graphical model has edges directed from more general terms to more specific terms. Differently from [20], the distribution of each SVM output is modeled as an asymmetric Laplace distribution, and a variational inference algorithm, which solves an optimization problem whose minimizer is the set of marginal probabilities of the distribution, is used to estimate the posterior probabilities of the ensemble [139].

(ii) BPALF is similar to BPAL, but with edges inverted and directed from more specific terms to more general terms.

(iii) BPLR is a heuristic variant of BPAL where, in the inference algorithm, the Bayesian log posterior ratio for $Y_i$ is replaced by the marginal log posterior ratio obtained from the logistic regression (LR).

(iv) BPLRF is equal to BPLR, but with reversed edges.

6.4. Projection-Based Methods. A different approach is represented by methods that directly use the calibrated values obtained from logistic regression (step 3 of the overall scheme of the reconciliation methods) to find the closest set of values that are consistent with the ontology. This approach leads to a constrained optimization problem. The main contribution of the Obozinski et al. work [24] is represented by the introduction of projection reconciliation techniques, based on isotonic regression [140] and on the Kullback-Leibler divergence.

The isotonic regression method tries to find a set of marginal probabilities $\bar{p}_i$ that are close to the set of calibrated values $\hat{p}_i$ obtained from the logistic regression; the Euclidean distance is used as a measure of closeness. Hence, considering that the "reconciliation property" requires that $\bar{p}_i \geq \bar{p}_j$ when $(i, j) \in E$, this approach yields the following quadratic program:

$$\min_{\{\bar{p}_i\}_{i \in I}} \sum_{i \in I} (\bar{p}_i - \hat{p}_i)^2 \quad \text{s.t.} \quad \bar{p}_j \leq \bar{p}_i, \; (i, j) \in E \quad (23)$$

This is the classical isotonic regression problem, which can be solved using an interior point solver, or with approximate algorithms when the number of edges of the graph is too large [141].
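As an illustration, the quadratic program (23) can be written directly with a generic convex solver such as cvxpy (a hypothetical wrapper, not the implementation of [24, 141], which use dedicated solvers; the [0, 1] bounds are added here only to keep the outputs interpretable as probabilities):

    # Sketch of isotonic-regression reconciliation (23) with cvxpy.
    import cvxpy as cp

    def reconcile_isotonic(p_hat, edges):
        """p_hat: array of calibrated probabilities, one per term;
        edges: list of (parent, child) index pairs of the ontology graph."""
        p_bar = cp.Variable(len(p_hat))
        objective = cp.Minimize(cp.sum_squares(p_bar - p_hat))
        # A child's probability can never exceed its parent's probability.
        constraints = [p_bar[j] <= p_bar[i] for (i, j) in edges]
        constraints += [p_bar >= 0, p_bar <= 1]   # assumption of this sketch
        cp.Problem(objective, constraints).solve()
        return p_bar.value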

Considering that we deal with probabilities, a natural measure of distance between probability density functions $f(\mathbf{x})$ and $g(\mathbf{x})$, defined with respect to a random variable $\mathbf{x}$, is represented by the Kullback-Leibler divergence $D_{f,g}$:

$$D_{f,g} = \int_{-\infty}^{\infty} f(\mathbf{x}) \log \frac{f(\mathbf{x})}{g(\mathbf{x})}\, d\mathbf{x} \quad (24)$$

In the context of reconciliation methods, we need to consider a discrete version of the Kullback-Leibler divergence, yielding the following optimization problem:

$$\min_{\bar{\mathbf{p}}} D_{\bar{\mathbf{p}}, \hat{\mathbf{p}}} = \min_{\{\bar{p}_i\}_{i \in I}} \sum_{i \in I} \bar{p}_i \log \frac{\bar{p}_i}{\hat{p}_i} \quad \text{s.t.} \quad \bar{p}_j \leq \bar{p}_i, \; (i, j) \in E \quad (25)$$

The algorithm finds the probabilities closest, according to the Kullback-Leibler divergence, to the probabilities $\hat{\mathbf{p}}$ obtained from logistic regression, obeying the constraint that probabilities cannot increase while descending the hierarchy underlying the ontology.

The extensive experiments reported in [24] show that, among the reconciliation methods, isotonic regression is the most generally useful. Across a range of evaluation modes, term sizes, ontologies, and recall levels, isotonic regression yields consistently high precision. On the other hand, isotonic regression is not always the "best method", and a biologist with a particular goal in mind may apply other reconciliation methods. For instance, with small terms, Kullback-Leibler projections usually achieve the best results; but considering average "per term" results, heuristic methods yield precision at a given recall comparable with projection methods and better than that achieved with Bayes-net methods.

This ensemble approach achieved excellent results in the prediction of protein function in the mouse model organism, demonstrating that hierarchical multilabel methods can play a crucial role in the improvement of protein function prediction performances [24]. Nevertheless, the approach suffers from some drawbacks. Indeed, the paper focuses on the comparison of hierarchical multilabel methods, but it does not analyze the impact of the concurrent use of data integration and hierarchical multilabel methods on the overall classification performances. Moreover, potential improvements could be introduced by applying cost-sensitive variants of hierarchical multilabel predictors, able to effectively calibrate the precision/recall trade-off at different levels of the functional ontology.

Figure 11: The asymmetric flow of information suggested by the true path rule.

7. True Path Rule Hierarchical Ensembles

These ensemble methods exploit at the same time the downward and upward relationships between classes, thus considering both the parent-to-child and the child-to-parent functional links (Figure 2(b)).

The true path rule (TPR) ensemble method [31, 142] is directly inspired by the true path rule that governs both the GO and FunCat taxonomies. Citing the curators of the Gene Ontology [143]: "An annotation for a class in the hierarchy is automatically transferred to its ancestors, while genes unannotated for a class cannot be annotated for its descendants." Considering the parents of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$$y_i = 1 \Longrightarrow y_{\mathrm{par}(i)} = 1, \qquad y_i = 0 \nRightarrow y_{\mathrm{par}(i)} = 0 \quad (26)$$

On the other hand, considering the children of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$$y_i = 1 \nRightarrow y_{\mathrm{child}(i)} = 1, \qquad y_i = 0 \Longrightarrow y_{\mathrm{child}(i)} = 0 \quad (27)$$

From (26) and (27), we observe an asymmetry in the rules that govern the assignments of positive and negative labels. Indeed, we have a propagation of positive predictions from bottom to top of the hierarchy in (26), and a propagation of negative labels from top to bottom in (27). Conversely, negative labels cannot propagate from bottom to top, and positive predictions cannot propagate from top to bottom. The "true path rule" thus suggests algorithms able to propagate "positive" decisions from bottom to top of the hierarchy, and negative decisions from top to bottom (Figure 11).

7.1. The True Path Rule Ensemble Algorithm. The TPR algorithm puts together the predictions made at each node by local "base" classifiers, to realize an ensemble that obeys the "true path rule". The basic ideas behind the method can be summarized as follows:

(1) training of the base learners: for each node of the hierarchy, a suitable learning algorithm (e.g., a multilayer perceptron or a support vector machine) provides a classifier for the associated functional class;

(2) in the evaluation phase, the trained classifiers associated with each class/node of the graph provide a local decision about the assignment of a given example to a given node;

(3) positive decisions, that is, annotations to a specific functional class, may propagate from bottom to top across the graph: they influence the decisions of the parent nodes and of their ancestors in a recursive way, by traversing the graph towards higher level nodes/classes. Conversely, negative decisions do not affect the decisions of the parent node, that is, they do not propagate from bottom to top (26);


(4) negative predictions for a given node (taking into account the local decisions of its descendants) are propagated to the descendants, to preserve the consistency of the hierarchy according to the true path rule, while positive decisions do not influence the decisions of the child nodes (27).

The ensemble combines the local predictions of the base learners associated with each node with the positive decisions that come from the bottom of the hierarchy, and with the negative decisions that spring from the higher level nodes. More precisely, base classifiers estimate local probabilities $\hat{p}_i(g)$ that a given example $g$ belongs to class $\theta_i$, but the core of the algorithm is represented by the evaluation phase, where the ensemble provides an estimate of the "consensus" global probability $\bar{p}_i(g)$. It is worth noting that, instead of a probability, $\hat{p}_i(g)$ may represent a score associated with the likelihood that a given gene/gene product belongs to the functional class $i$.

Let us consider the set $\phi_i(g)$ of the children of node $i$ for which we have a positive prediction for a given gene $g$:

$$\phi_i(g) = \{ j \in \mathrm{child}(i) \mid \hat{y}_j = 1 \} \quad (28)$$

The global consensus probability $\bar{p}_i(g)$ of the ensemble depends both on the local prediction $\hat{p}_i(g)$ and on the predictions of the nodes belonging to $\phi_i(g)$:

$$\bar{p}_i(g) = \frac{1}{1 + |\phi_i(g)|} \Big( \hat{p}_i(g) + \sum_{j \in \phi_i(g)} \bar{p}_j(g) \Big) \quad (29)$$

The decision $\hat{y}_i(g)$ at node/class $i$ is set to 1 if $\bar{p}_i(g) > t$, and to 0 otherwise (a natural choice for $t$ is 0.5); only children nodes for which we have a positive prediction can influence their parent. In the leaf nodes, the sum in (29) disappears, and (29) reduces to $\bar{p}_i(g) = \hat{p}_i(g)$. In this way, positive predictions propagate from bottom to top, and negative decisions are propagated to the descendants of a given node when $\hat{y}_i(g) = 0$.

The bottom-up, per level traversal of the tree assures that all the offspring of a given node $i$ are taken into account for the ensemble prediction. For the same reason, we can safely set the classes belonging to the subtree rooted at $i$ to negative when $\hat{y}_i$ is set to 0. It is worth noting that we have a two-way asymmetric flow of information across the tree: positive predictions for a node influence its ancestors, while negative predictions influence its offspring.

The algorithm provides both the multilabels $\hat{y}_i$ and an estimate of the probabilities $\bar{p}_i$ that a given example $g$ belongs to the class $i = 1, \ldots, m$.
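The following minimal Python sketch (hypothetical names; tree-structured taxonomy, one call per root of the forest) illustrates the TPR evaluation phase: a bottom-up pass computing (28)-(29), followed by the top-down propagation of negative decisions:

    # Sketch of the TPR ensemble evaluation (28)-(29) for a single gene.
    # children: dict node -> list of children; p_hat: local estimates p_i(g);
    # t: decision threshold (0.5 is the natural choice).
    def tpr_predict(root, children, p_hat, t=0.5):
        p_bar, y_hat = {}, {}

        def bottom_up(i):
            for j in children.get(i, []):
                bottom_up(j)
            # phi: children with a positive ensemble decision (28).
            phi = [j for j in children.get(i, []) if y_hat[j] == 1]
            p_bar[i] = (p_hat[i] + sum(p_bar[j] for j in phi)) / (1 + len(phi))
            y_hat[i] = 1 if p_bar[i] > t else 0

        def top_down(i):
            # Negative decisions propagate downward (true path rule).
            for j in children.get(i, []):
                if y_hat[i] == 0:
                    y_hat[j] = 0
                top_down(j)

        bottom_up(root)
        top_down(root)
        return y_hat, p_bar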

7.2. The Cost-Sensitive Variant. Note that in the TPR algorithm there is no way to explicitly balance the local prediction $\hat{p}_i(g)$ at node $i$ with the positive predictions coming from its offspring (29). By balancing the local predictions with the positive predictions coming from the ensemble, we can explicitly modulate the interplay between local and descendant predictors. To this end, a weight $w$, $0 \leq w \leq 1$, is introduced, such that if $w = 1$ the decision at node $i$ depends only on the local predictor; otherwise, the prediction is shared proportionally to $w$ and $1 - w$ between, respectively, the local parent predictor and the set of its children:

$$\bar{p}_i = w\, \hat{p}_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \bar{p}_j \quad (30)$$

This variant of the TPR algorithm is the weighted true path rule (TPR-W) hierarchical ensemble algorithm. By tuning the $w$ parameter, we can modulate the precision/recall characteristics of the resulting ensemble. More precisely, for $w \to 0$ the weight of the parent local predictor is small, and the ensemble decision mainly depends on the positive predictions of the offspring nodes (classifiers). Conversely, $w \to 1$ corresponds to a higher weight of the parent predictor: less weight is given to possible positive predictions of the children, and the decision depends mainly on the local/parent base classifier. In case of a negative decision, the entire subtree is set to 0, causing the precision to increase. Note that for $w \to 1$ the behaviour of TPR-W becomes similar to that of HTD (Section 4).

A specific advantage of TPR-W ensembles is the capability of tuning precision and recall rates through the parameter $w$ (30). In [31], the author shows that the $w$ parameter highly influences the precision/recall characteristics of the ensemble: low $w$ values yield a higher recall, while high values improve the precision of the TPR-W ensemble.
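In a TPR sketch like the one shown earlier, the TPR-W variant only requires changing the combination step to (30), with the hypothetical parameter w in [0, 1], for example:

    # TPR-W update (30): replace the combination line of the TPR sketch above.
    if phi:
        p_bar[i] = w * p_hat[i] + (1 - w) * sum(p_bar[j] for j in phi) / len(phi)
    else:
        p_bar[i] = p_hat[i]   # no positive children: only the local prediction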

Recently, Chen and Hu proposed a method that applies the TPR-W hierarchical strategy, but using composite kernel SVMs as base classifiers, together with a supervised clustering with oversampling strategy to solve the imbalanced data set learning problem; they showed that the proper selection of base learners and of unbalance-aware learning strategies can further improve the results in terms of hierarchical precision and recall [144].

The same authors also proposed an enhanced version of the TPR-W strategy, to overcome a limitation of this bottom-up hierarchical method for AFP. Indeed, for some classes at the lower levels of the hierarchy, the classifier performances are sometimes quite poor, due both to noisy data and to the relatively low number of available annotations. More precisely, in the basic TPR ensemble, the probabilities $\bar{p}_j$ computed by the children of node $i$ (30) contribute in equal measure to the probability $\bar{p}_i$ computed by the ensemble at node $i$, independently of the accuracy of the predictions made by its children classifiers. This "unweighted" mechanism may propagate errors across the hierarchy: a poorly performing child classifier may, for instance, with high probability predict a negative example as positive, and this error may propagate to its parent node and, recursively, to its ancestor nodes. To try to alleviate this possible bottom-up error propagation, in [145] Chen and Hu proposed an improved TPR ensemble (TPR-W weighted), based on classifier performance. To this end, they weighted the contribution of each child classifier on the basis of its performance, evaluated on a validation data set, by adding to (30) another weight $\nu_j$:

$$\bar{p}_i = w\, \hat{p}_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \nu_j \cdot \bar{p}_j \quad (31)$$

where $\nu_j$ is computed on the basis of some accuracy metric $A_j$ (e.g., the F-score), estimated for the child classifier associated with node $j$, as follows:

$$\nu_j = \frac{A_j}{\sum_{k \in \phi_i} A_k} \quad (32)$$

In this way, the contribution of "poor" classifiers is reduced, while "good" classifiers weight more in the final computation of $\bar{p}_i$ (31). Experiments with the "Protein Fate" subtree of the FunCat taxonomy, with the yeast model organism, show that this approach improves predictions with respect to the "vanilla" TPR-W hierarchical strategy [145].

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward and common to the hierarchical top-down algorithm, the propagation of positive decisions from bottom to top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this topic, see Appendix C.

Experimental results show that TPR-W achieves equal or better results than the TPR and the top-down hierarchical strategies, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per level classification performances shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall by using a global strategy: large values of the $w$ parameter improve the precision, and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchical ensemble are influenced by the positive decisions of its descendants. More precisely, the analyses performed in [31] showed the following:

(i) the weights of descendants decrease exponentially with respect to their depth; as a consequence, the influence of descendant nodes decays quickly with their depth;

(ii) the parameter $w$ plays a central role in balancing the weight of the parent classifier associated with a given node with the weights of its positive offspring: small values of $w$ increase the weight of descendant nodes, and large values increase the weight of the local parent predictor associated with that node;

(iii) the effect on the overall probability predicted by the ensemble is the result of the choice of the $w$ parameter, of the strength of the prediction of the local learners, and of that of their descendants.

These characteristics of TPR-W ensembles are well suited for the hierarchical classification of protein functions, considering that annotations of deeper nodes are likely to have less experimental evidence than higher nodes. Moreover, by enforcing the strength of the descendant nodes through low $w$ values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured as a forest of trees (Section 1.1), while no applications have been performed using the GO, structured as a directed acyclic graph (Section 1.1).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147], or investigated a modified decision tree model in which the assignment to a node is propagated toward the parent nodes [9], by extending the classical C4.5 decision tree algorithm for multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in the yeast [149].

More recent approaches, again based on modified decision trees, used distance measures derived from the hierarchy and significantly improved previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all classes at once, even when they are built taking into account the hierarchy.

Nevertheless, the previously proposed decision tree-based methods often achieve results not comparable with those of state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for DAG-structured prediction of protein function in the S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles, which learn ensembles of predictive models each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles, which train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151], methods related to decision trees, in the sense that they produce interpretable classification rules to predict all functions at all levels of the GO hierarchy, have been proposed, using an ant colony optimization algorithm to discover the classification rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues [1, 16-18]. In particular, in [80] the authors showed that, even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results, if at the same time we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy between hierarchical classification, data integration methods, and cost-sensitive techniques has been shown, highlighting that hierarchical ensemble methods should be designed taking into account the different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the predictions of protein functions [16, 25, 154-156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into four groups:

(1) vector subspace integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed by [30], but each data source is separately normalized, in order to take into account the data distribution in each individual vector space.

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network; or probabilistic evidence integration schemes [45].

Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive-Bayes integration [157] and constrained linear regression [14], or by merging data taking into account the GO hierarchy [158], or by applying XML-based techniques [159].

Kernel Fusion. These techniques at first construct a separate Gram matrix for each available data source, using appropriate kernels representing similarities between genes/gene products. Then, by exploiting the closure property with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix.

Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have been recently proposed [160, 161]. Kernel fusion methods, both with and without weighting of the data sources, have been successfully applied to the classification of protein functions [162-165]. Recently, a novel method proposed an enhanced kernel integration approach, by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
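A minimal sketch of kernel fusion by (weighted) summation of Gram matrices, the simplest instance of the closure property mentioned above (all names hypothetical):

    # Sketch of kernel fusion: consensus Gram matrix as a nonnegative
    # linear combination of the per-source Gram matrices.
    import numpy as np

    def fuse_kernels(gram_matrices, weights=None):
        """gram_matrices: list of (n, n) Gram matrices, one per data source;
        weights: optional nonnegative coefficients (uniform if omitted)."""
        if weights is None:
            weights = np.ones(len(gram_matrices))
        return sum(w * K for w, K in zip(weights, gram_matrices))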

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different "views" of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include the "late integration" of kernels trained on different sources [22], the naive-Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the outputs of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to those of state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171-173].

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques, in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ. SIM.). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].

Table 2: Comparison of the results (average per class F-scores) achieved with single sources and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the numbers in parentheses refer to the percentage relative increment in F-score achieved with data fusion with respect to the best single source of evidence (BIOGRID).

Methods                      FLAT     HTD      HTD-CS   HB       HB-CS    TPR      TPR-W
Single-source
  BIOGRID                    0.2643   0.3759   0.4160   0.3385   0.4183   0.3902   0.4367
  STRING                     0.2203   0.2677   0.3135   0.2138   0.3007   0.2801   0.3048
  PFAM BINARY                0.1756   0.2003   0.2482   0.1468   0.2407   0.2532   0.2738
  PFAM LOGE                  0.2044   0.1567   0.2541   0.0997   0.2847   0.3005   0.3160
  EXPR                       0.1884   0.2506   0.2889   0.2006   0.2781   0.2723   0.3053
  SEQ. SIM.                  0.1870   0.2532   0.2899   0.2017   0.2825   0.2742   0.3088
Multisource (data fusion)
  Kernel fusion              0.3220   0.5401   0.5492   0.5181   0.5505   0.5034   0.5592
                             (+22%)   (+44%)   (+32%)   (+53%)   (+32%)   (+29%)   (+28%)

Table 2 summarizes the results of the comparison, across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per class F-score in flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) with respect to FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical data fusion (in particular, kernel fusion) and cost-sensitive approaches (Figure 13) [123].

Per level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of the nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion, the average F-score at levels 2, 3, and 4 is comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergic effects of hierarchical multilabel ensembles, cost-sensitive, and data fusion techniques significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insights into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both GO and FunCat, only positive annotations are usually available, while negative annotations are much fewer. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy for choosing negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123], two strategies for choosing negative examples have been compared: the basic (B) and the parent only (PO) strategy.

According to the B strategy, the set of negative examples consists simply of those genes $g$ that are not annotated for class $c_i$; that is,

$$N_B = \{ g \mid g \notin c_i \} \quad (33)$$

The PO selection strategy chooses as negatives for the class $c_i$ only those examples that are not annotated for $c_i$ but are annotated for a parent class. More precisely, for a given class $c_i$, corresponding to node $i$ in the taxonomy, the set of negative examples is

$$N_{PO} = \{ g \mid g \notin c_i,\ g \in \mathrm{par}(i) \} \quad (34)$$

Figure 12: FunCat trees comparing F-scores achieved with data integration (KF) to the best single-source classifiers trained on BIOGRID data. Black nodes depict functional classes for which KF achieves better F-scores: (a) FLAT and (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical precision, recall, and F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID) and the kernel fusion (KF) and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, the PO strategy selects negative examples for training that are, in a certain sense, "close" to positives. It is easy to see that $N_{PO} \subseteq N_B$; hence the B strategy selects for training a larger set of generic negative examples, possibly annotated with classes that are associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.
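The two selection strategies (33)-(34) can be sketched as follows (hypothetical data layout: ann maps each class to the set of genes annotated to it, and parents maps each class to its parent classes; for FunCat each node has a single parent, while for the GO the union over multiple parents is an assumption of this sketch):

    # Sketch of the B and PO negative-selection strategies (33)-(34).
    def negatives_basic(genes, ann, c_i):
        # (33): every gene not annotated to c_i.
        return {g for g in genes if g not in ann[c_i]}

    def negatives_parent_only(genes, ann, parents, c_i):
        # (34): genes not annotated to c_i but annotated to a parent class.
        in_parent = set().union(*(ann[p] for p in parents[c_i]))
        return {g for g in genes if g not in ann[c_i] and g in in_parent}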

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class, achieved by a specific method with the B (abscissa) and the PO (ordinate) strategy for the selection of negative examples. In Figure 15(a), most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods

Figure 14: Comparison of per level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF). The performance of "single" is computed by averaging across all the single data sources.

gain in performance when using the PO strategy as opposed to the B strategy ($P$-value $= 2.2 \times 10^{-16}$, according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

These results can be explained by considering that the PO strategy takes into account the hierarchy to select the negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of the classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from the other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section, we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in the context of this research area.


Figure 15: Comparison of average per class F-score between the basic and PO strategies: (a) hierarchical cost-sensitive strategies HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles); (b) FLAT ensembles. Abscissa: per class F-score with base learners trained according to the basic strategy; ordinate: per class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way, to assess the impact of the different strategies on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works also showed that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies should be explored to improve the accuracy and the reliability of the base learners, and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied: the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly takes away some unannotated examples [182]. Other approaches could be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185-187].

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems, and often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta_i^- / \theta_i^+$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$ we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed, in order to design ensemble methods with "tunable" precision/recall characteristics.

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations, and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; but unfortunately the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are partially annotated. More precisely, two new algorithms have been proposed: ProWL, protein function prediction with weak-label learning, which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein; and ProWL-IF, protein function prediction with weak-label learning and knowledge of irrelevant functions, by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works on hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To exploit, at least partially, this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy to the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (see Appendix C), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of the deep nodes of the hierarchy; a sketch of such alternative weighting rules is given below. Other enhancements could be introduced with the TPR-W algorithm (Section 7.2): indeed, we can note that the positive children of a node at level $i$ of the hierarchy have the same weight independently of the size of their hanging subtree. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree; indeed, this witnesses for a positive annotation of the node at level $i$.
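The sketch below contrasts the exponential decay used by TPR with two hypothetical alternatives (linear and polynomial) that decay more slowly and therefore give the deep nodes of the hierarchy a larger say; the function name and the horizon parameter are illustrative only:

    def child_weight(j, rule="exponential", w=0.8, degree=2, horizon=10):
        # Weight of a positive prediction coming from a node j levels below.
        # "exponential" mimics the TPR decay w * (1 - w)**j (cf. Theorem 2 in
        # Appendix C); "linear" and "polynomial" are hypothetical alternatives
        # that put comparatively more weight on deep nodes.
        if rule == "exponential":
            return w * (1.0 - w) ** j
        if rule == "linear":
            return max(0.0, 1.0 - j / horizon)
        if rule == "polynomial":
            return 1.0 / (1.0 + j) ** degree
        raise ValueError(rule)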

More in general, in the spirit of a work recently proposed [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194] able to exploit the relationships between functional classes during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, just during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performances.

A last issue is represented by multispecies predictions in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the prediction for a particular species by using predictions or data available for other species. This is a common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work.


Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and the other related issues is the key to improve AFP methods and to drive experiments aimed at discovering previously unannotated or only partially annotated protein functions [123].

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process," "molecular function," and "cellular component." Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., mitochondrion or rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces genome database (SGD), and the mouse genome database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner (Figure 16).


Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that carry out a specific biological function, for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; examples of MF are glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles, or physical places or compartments in which a specific gene product is located; an example is the endoplasmic reticulum or the ribosome.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a term of the GO}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are also categorized in the following three main groups:

(i) is-a (subtype relations): if we say the term/node $A$ is-a $B$, we mean that node $A$ is a subtype of node $B$. For example, mitotic cell cycle is-a cell cycle, or lyase activity is-a catalytic activity.

(ii) part-of (part-whole relations): $A$ is part-of $B$ means that whenever $A$ exists, it is part of $B$, and the presence of $A$ implies the presence of $B$. For instance, mitochondrion is part-of cytoplasm.

(iii) regulates (control relations): if we say that $A$ regulates $B$, we mean that $A$ directly affects the manifestation of $B$, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting functional similarity. For these reasons, regulates relations (which are, however, a minority of the existing relations) are usually not used in AFP.
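The transitivity of is-a and part-of is what makes the true path rule operational: an annotation to a term is propagated to all of its ancestors through these edges. A minimal sketch over a toy dictionary-based DAG (the term names and the parents map are illustrative):

    def ancestors(parents, term):
        # Collect all ancestors of `term` by walking the is-a / part-of edges.
        seen, stack = set(), list(parents.get(term, ()))
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(parents.get(t, ()))
        return seen

    parents = {"mitotic cell cycle": {"cell cycle"},
               "cell cycle": {"biological process"}}
    annotations = {"mitotic cell cycle"}
    closure = set(annotations)
    for t in annotations:
        closure |= ancestors(parents, t)
    # closure now also contains 'cell cycle' and 'biological process'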

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI); to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference; to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de). At the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content has been extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and, finally, animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces genome database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge amount of terms is very difficult to handle for annotators. Moreover, we may have a very large number of possible assignments, which may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads both to simpler procedures for annotation and to less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, but without being too granular and specific.

It is worth noting that both FunCat and GO ontologies undergo modifications between different releases and, at the same time, the annotations are also subject to change, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more in general, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess the classification performance. For this reason, the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the positive examples correctly predicted as positive, FN the positive examples incorrectly predicted as negative, and FP the negatives incorrectly predicted as positives, then the precision $P$ and the recall $R$ are

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \quad (\mathrm{B.1})$$

The F-score $F$ is the harmonic mean between precision and recall:

$$F = \frac{2 \cdot P \cdot R}{P + R}. \quad (\mathrm{B.2})$$
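In code, these measures reduce to a few lines (a trivial sketch; TP, FP, and FN are the counts for a single functional class):

    def precision_recall_f(tp, fp, fn):
        # (B.1) and (B.2); returns (P, R, F), with 0 when undefined
        p = tp / (tp + fp) if tp + fp > 0 else 0.0
        r = tp / (tp + fn) if tp + fn > 0 else 0.0
        f = 2 * p * r / (p + r) if p + r > 0 else 0.0
        return p, r, f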

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to near-optimal rankings; that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be considered "partially correct," in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$, consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $l(P)$ be the set of the leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let $\uparrow p$ be the set of ancestors of the node $p$ that belong to $P(x)$, and, given a leaf $c \in C(x)$, let $\uparrow c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow} c \cap {\uparrow} p|}{|{\uparrow} p|},$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow} c \cap {\uparrow} p|}{|{\uparrow} c|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \quad (\mathrm{B.3})$$

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow} p|}{|{\uparrow} p|},$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|{\uparrow} c \cap P(x)|}{|{\uparrow} c|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \quad (\mathrm{B.4})$$
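For concreteness, the tree-structured measures in (B.4) can be computed directly from the root paths of the predicted and true leaves; the following Python sketch assumes each leaf is mapped to the set of its ancestors within the corresponding subgraph (the data structures and names are illustrative):

    def hier_measures_tree(pred_paths, true_paths):
        # pred_paths: leaf p in l(P(x)) -> set of its ancestors (up-arrow p)
        # true_paths: leaf c in l(C(x)) -> set of its ancestors (up-arrow c)
        P = set(pred_paths) | set().union(*pred_paths.values())  # P(x)
        C = set(true_paths) | set().union(*true_paths.values())  # C(x)
        hp = sum(len(C & up) / len(up) for up in pred_paths.values()) / len(pred_paths)
        hr = sum(len(up & P) / len(up) for up in true_paths.values()) / len(true_paths)
        hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
        return hp, hr, hf

    # a "near miss": the predicted leaf shares only part of its path with C(x)
    pred = {"01.02.03": {"00", "01", "01.02"}}   # "00" plays the dummy root
    true = {"01.01": {"00", "01"}}
    print(hier_measures_tree(pred, true))        # HP = 2/3, HR = 1.0, HF = 0.8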

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

$$\mathrm{HP}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|P(x)|}, \qquad \mathrm{HR}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|C(x)|}. \quad (\mathrm{B.5})$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true classes ($\mathrm{HR}_K$). The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \quad (\mathrm{B.6})$$
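A sketch of the corresponding computation follows, using the summations as printed in (B.5) (some authors normalize by the number of examples or micro-average the sums; the dictionaries are illustrative, and both class sets are assumed closed under the ancestor relation):

    def hier_measures_K(pred, true):
        # pred[x] = P(x), true[x] = C(x): full sets of classes in the hierarchy
        hp = sum(len(pred[x] & true[x]) / len(pred[x]) for x in pred)
        hr = sum(len(pred[x] & true[x]) / len(true[x]) for x in pred)
        hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
        return hp, hr, hf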

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $\bar{q}_k$: more precisely, $\bar{q}_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $\bar{q}^j_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \ge 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level $k$ of the hierarchy [31]:

$$\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^j_{k+1}(g). \quad (\mathrm{C.1})$$

To simplify the notation, we introduce the following expression for the average of the probabilities computed by the positive children nodes of a generic node at level $k$:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^j_{k+1}, \quad (\mathrm{C.2})$$

and we can introduce similar notations for $a_{k+2}$ (average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (descendants at level $j$ of a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \le w \le 1$, balancing the weight between parent and children predictors, and having a variable number (larger than or equal to 1) of positive descendants for each of the $m$ lower levels below, the following equality holds for each $m \ge 1$:

$$\bar{q}_k = w\hat{q}_k + \sum_{j=1}^{m-1} w(1-w)^j a_{k+j} + (1-w)^m a_{k+m}. \quad (\mathrm{C.3})$$

For the full proof, see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter. To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$


Figure 18: (a) Plot of $f(w) = (1-w)^m$ while varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1-w)^j$ while varying $j$ from 1 to 10. The integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.

(Theorem 2), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of the nodes above the leaves.
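A compact way to see (C.1) at work is to implement the bottom-up combination recursively; the sketch below is simplified (the actual TPR-W algorithm couples this bottom-up step with a top-down pass that enforces the hierarchical consistency of the final predictions), and the node names and probabilities are illustrative:

    def tpr_w(node, w, q_hat, children, threshold=0.5):
        # q_hat: local base-learner probabilities; children: the hierarchy.
        # Only "positive" children (ensemble probability > threshold)
        # propagate their predictions upward, as in (C.1).
        kids = [tpr_w(c, w, q_hat, children, threshold)
                for c in children.get(node, [])]
        positive = [q for q in kids if q > threshold]
        if not positive:
            return q_hat[node]
        return w * q_hat[node] + (1 - w) * sum(positive) / len(positive)

    children = {"root": ["a"], "a": ["a1", "a2"]}
    q_hat = {"root": 0.4, "a": 0.55, "a1": 0.9, "a2": 0.2}
    for w in (0.2, 0.5, 0.8):
        print(w, round(tpr_w("root", w, q_hat, children), 3))
    # smaller w -> deeper positive predictions lift the root probability more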

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction – the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.

[2] L. Pena-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.

[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.

[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.

[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.

[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.

[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.

[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.

[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.

[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.

[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.

[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.

[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.

[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.

[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.

[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.

[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.

[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.

[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.

[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.

[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.

[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.

[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.

[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.

[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.

[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.

[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.

[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.

[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.

[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.

[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.

[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.

[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.

[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.

[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.

[38] K. Trohidis, G. Tsoumahas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.

[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.

[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.

[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.

[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.

[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.

[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.

[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.

[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.

[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.


[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.

[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.

[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.

[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.

[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.

[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.

[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.

[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.

[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.

[58] A. Conesa, S. Gotz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.

[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.

[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.

[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.

[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.

[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.

[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.

[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.

[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.

[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.

[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.

[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.

[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.

[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.

[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.

[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.

[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.

[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.

[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.

[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.

[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.

[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.

[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.

[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.

[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.

[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.

[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.

[88] G. Bakir, T. Hoffman, B. Scholkopf, B. Smola, A. J. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.

[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.

[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.

[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.

[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classifications combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.

[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.

[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.

[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.

[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.

[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.

[100] M. Dettling and P. Buhlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.

[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.

[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.

[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.

[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.

[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.

[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.

[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.

[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.

[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.

[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, University of Stanford, Stanford, Calif, USA, 2000.

[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.

[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.

[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.

[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.

[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.

[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.

[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using top-down label combination and artificial neural networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastion, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems: Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n²) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.


[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics – From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz, et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.

[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray, et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7–9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry, et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao, et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. von Mering, R. Krause, B. Snel, et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.


[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the Databoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013 – ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bahr, et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber, et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan, et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.



(v) Small number of annotations for each class: typically, functional classes are severely unbalanced, with a small number of available "positive" annotations.

(vi) Multiple possible definitions of negative examples: since we only have positive annotations (the total number of GO negative annotations is about 2500, considering all species, as of August 2013), the notion of negative example is not uniquely determined, and different strategies for choosing negative examples can in principle be applied [8].

(vii) Different reliability of functional labels: functional annotations have different degrees of evidence; that is, each label is assigned to a gene with a specific level of reliability.

(viii) Complex and noisy data: data are usually complex (e.g., high-dimensional, large-scale, and graph-structured) and noisy.

Most of the computational methods for AFP have been applied to unicellular organisms (e.g., S. cerevisiae) [9–11], but recently several approaches have been applied to multicellular organisms, such as M. musculus or the A. thaliana plant model organisms [2, 12–16].

Several computational approaches, and in particular machine learning methods, have been proposed to deal with the above issues, ranging from sequence-based methods [17] to network-based methods [18], structured output algorithms based on kernels [19], and hierarchical ensemble methods [20].

Other approaches focused primarily on the integration of multiple sources of data, since each type of genomic data captures only some aspects of the genes to be classified, and a specific source can be useful to learn a specific functional class while being irrelevant to others. In the literature, many approaches have been proposed to deal with this topic, for example, functional linkage network integration [21], kernel fusion [11], vector space integration [22], and ensemble systems [23].

Extensive experimental studies showed that flat prediction, that is, predictions for each class made independently of the other classes, introduces significant inconsistencies in the classification, due to the violation of the true path rule that governs the functional annotations of genes both in the GO and in FunCat taxonomies [24]. According to this rule, positive predictions for a given term must be transferred to its "ancestor" terms and negative predictions to its descendants (see Appendix A and Section 7 for more details about the GO and the true path rule). Moreover, flat predictions are difficult to interpret because they may be inconsistent with one another: a method that claims, for example, that a protein has homodimerization activity but does not have dimerization activity is clearly incorrect, and a biologist attempting to interpret these results would not likely trust either prediction [24].

It is worth noting that the results of the Critical Assessment of Functional Annotation (CAFA) challenge, a recent comprehensive critical assessment and comparison of different computational methods for AFP [16], showed that AFP is characterized by multiple complex issues, and one of the best performing CAFA methods corrected flat predictions by taking into account the hierarchical relationships between functional terms, with an approach similar to that adopted by hierarchical ensemble methods [25]. Indeed, hierarchical ensemble methods embed in the learning process the relationships between functional classes. Usually this is performed in a second "reconciliation" step, where the predictions are modified to make them consistent with the ontology [26–29]. More in general, these methods exploit the relationships between ontology terms, structured according to a forest of trees [5] or a directed acyclic graph [6], to significantly improve prediction performance with respect to "flat" prediction methods [30–32].

Hierarchical classification, and in particular ensemble methods for hierarchical classification, has been applied in several domains different from protein function prediction, ranging from text categorization [33–35] to music genre classification [36–38], hierarchical image classification [39, 40], video annotation [41], and automatic classification of worldwide web documents [42, 43]. The present review focuses on hierarchical ensemble methods for AFP. For a more general review of hierarchical classification methods and their applications in different domains, see [44].

The paper is structured as follows. In Section 2 we provide a synthetic picture of the main categories of protein function prediction methods, to properly position hierarchical ensemble methods in the context of computational methods for AFP. In Section 3 the main common characteristics of hierarchical ensemble algorithms, as well as a general taxonomy of these methods, are proposed. The following five sections focus on the main families of hierarchical methods for AFP and discuss their main characteristics: Section 4 introduces hierarchical top-down methods; Section 5, Bayesian ensemble approaches; Section 6, reconciliation methods; Section 7, true path rule ensemble methods; and the last one (Section 8), ensembles based on decision trees. Section 9 critically discusses the main issues and limitations of hierarchical ensemble methods and shows that this approach, like the other current approaches for AFP, cannot be successfully applied without considering the large set of complex learning issues that characterize the AFP problem. The last two sections discuss the open problems and possible future research lines in the context of hierarchical ensemble methods and summarize the main findings in this exciting research area. In the Appendix, some basic information about the FunCat and the GO, that is, the two main hierarchical ontologies that are widely used to annotate proteins in all organisms, is provided, as well as the characteristics of the hierarchy-aware performance measures proposed in the literature to assess the accuracy and the reliability of the predictions made by hierarchical computational methods.

2. A Taxonomy of Protein Function Prediction Methods

Several computational methods for the AFP problem have been proposed in the literature. Some methods provided predictions for a relatively small set of functional classes [11, 45, 46], while others considered predictions extended to larger sets, using support vector machines and semidefinite programming [11], artificial neural networks [47], functional linkage networks [21, 48], Bayesian networks [45], or methods that combine functional linkage networks with learning machines using a logistic regression model [12] or simple algebraic operators [13].

Other research lines for AFP explicitly take into account the hierarchical nature of the multilabel classification problem. For instance, structured output methods are based on the joint kernelization of both input variables and output labels, using, for example, perceptron-like learning algorithms [49] or maximum-margin algorithms [50]. Other approaches improve the prediction of GO annotations by extracting implicit semantic relationships between genes and functions [51]. Finally, other methods adopted an ensemble approach [52] to take advantage of the intrinsic hierarchical nature of protein function prediction, explicitly considering the relationships between functional classes [24, 53–55].

Computational methods for AFP, mostly based on machine learning, can be schematically grouped in the following four families:

(1) sequence-based methods;
(2) network-based methods;
(3) kernel methods for structured output spaces;
(4) hierarchical ensemble methods.

This grouping is neither exhaustive nor strict, meaning that certain methods do not belong to any of these groups and others belong to more than one.

2.1. Sequence-Based Methods. Algorithms based on the alignment of sequences represent the first attempts to computationally predict the function of proteins [56, 57]: similar sequences are likely to share common functions, even if it is well known that secondary and tertiary structure conservation is usually more strictly related to protein function. However, algorithms able to infer similarities between sequences are today standard methods for assigning functions to proteins in newly sequenced organisms [17, 58]. Of course, global or local structure comparison algorithms between proteins can be applied to detect functional properties [59], and in this context the integration of different sequence- and structure-based prediction methods represents a major challenge [60].

Even if most of the research efforts for the design and development of AFP methods concentrated on machine learning, it is worth noting that in the AFP 2011 challenge [16] one of the best performing methods was a sequence-based algorithm [61]. Indeed, when the only information available is a raw sequence of amino acids or nucleotides, sequence-based methods can be competitive with state-of-the-art machine learning methods by exploiting homology-based inference [62].

2.2. Network-Based Methods. These methods usually represent each dataset through an undirected graph $G = (V, E)$, where nodes $v \in V$ correspond to genes/gene products and edges $e \in E$ are weighted according to the evidence of cofunctionality implied by the data source [63, 64]. These algorithms are able to transfer annotations from previously annotated (labeled) nodes to unannotated (unlabeled) ones by exploiting "proximity relationships" between connected nodes. Basically, these methods are based on transductive label propagation algorithms that predict the labels of unannotated examples without using a global predictive model [14, 21, 45]. Several methods exploited the semantic similarity between GO terms [65, 66] to derive functional similarity measures between genes and construct functional networks, then using supervised or semisupervised learning algorithms to infer the GO annotations of genes [67–70].

Different strategies to label the unannotated nodes have been explored by "label propagation" algorithms, that is, methods able to "propagate" the labels of annotated proteins across the network by exploiting the topology of the underlying graph: for instance, methods based on the evaluation of the functional flow in graphs [64, 71], methods based on Hopfield networks [48, 72, 73], methods based on Markov [74, 75] and Gaussian random fields [14, 46], and also simple "guilt-by-association" methods [76, 77], based on the assumption that connected nodes/proteins in the functional network are likely to share the same functions. Recently, methods based on kernelized score functions, able to exploit both local and global semisupervised learning strategies, have been successfully applied to AFP [78], as well as to disease gene prioritization [79] and drug repositioning problems [80, 81].

Reference [82] showed that different graph-based algorithms can be cast into a common framework where a quadratic cost objective function is minimized. In this framework, closed form solutions can be derived by solving a linear system of size equal to the number of nodes (proteins), or by using fast iterative procedures such as the Jacobi method [83]. A network-based approach alternative to label propagation, exhibiting strong theoretical predictive guarantees in the so-called mistake bound model, has recently been proposed by [84].
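To make the iterative solution concrete, the following minimal sketch (illustrative, not the implementation of [82] or [83]) applies a Jacobi-style iteration to minimize the quadratic cost $\sum_{i,j} W_{ij}(f_i - f_j)^2 + \mu \sum_i (f_i - y_i)^2$ over a toy functional network; the weight matrix W, the label vector y, and the regularization parameter mu are assumptions introduced here for illustration.

```python
import numpy as np

def label_propagation(W, y, mu=1.0, n_iter=100):
    """Jacobi-style iterative minimization of the quadratic cost
    sum_ij W_ij (f_i - f_j)^2 + mu * sum_i (f_i - y_i)^2, where W is a
    symmetric weighted adjacency matrix of the functional network and
    y holds the known labels (1 = annotated, 0 = unannotated/unknown)."""
    f = y.astype(float).copy()      # initialize scores with the known labels
    degree = W.sum(axis=1)          # weighted degree of each node
    for _ in range(n_iter):
        f = (W @ f + mu * y) / (degree + mu)
    return f                        # per-gene scores for one functional term

# toy network of 4 genes; only gene 0 is annotated for the term of interest
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
y = np.array([1, 0, 0, 0])
print(label_propagation(W, y))      # genes 1 and 2 score higher than gene 3
```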

2.3. Kernel Methods for Structured Output Spaces. By extending kernels to the output space, the multilabel hierarchical classification problem is solved globally: the multilabels are viewed as elements of a structured space modeled by suitable kernel functions [85–87], and structured predictions are viewed as a maximum a posteriori prediction problem [88].

Given a feature space $\mathcal{X}$ and a space of structured labels $\mathcal{Y}$, the task is to learn a mapping $f : \mathcal{X} \rightarrow \mathcal{Y}$ through an induced joint kernel function $k$ that computes the "compatibility" of a given input-output pair $(x, y)$: for each test example $x \in \mathcal{X}$, we need to determine the label $\hat{y} \in \mathcal{Y}$ such that $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} k(x, y)$. By modeling probabilities with a log-linear model and using a suitable feature map $\phi(x, y)$, we can define an induced joint kernel function that uses both inputs and outputs to compute the "compatibility" of a given input-output pair [88]:

$$k : (\mathcal{X} \times \mathcal{Y}) \times (\mathcal{X} \times \mathcal{Y}) \rightarrow \mathbb{R}. \qquad (1)$$


Structured output methods infer a label $\hat{y}$ by finding the maximum of a function $g$ that uses the previously defined joint kernel (1):

$$\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} g(x, y). \qquad (2)$$

The GOstruct system implemented a structured perceptron and a variant of the structured support vector machine [85]. This approach has been successfully applied to the prediction of GO terms in mouse and other model organisms [19]. Structured output maximum-margin algorithms have also been applied to the tree-structured prediction of enzyme functions [50, 86].
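To illustrate the maximization in (2), the following hedged sketch (not the GOstruct implementation) scores hierarchy-consistent candidate multilabels with a linear joint feature map $\phi(x, y) = y \otimes x$; the weight matrix W, the parent encoding, and the brute-force enumeration (feasible only for a handful of classes) are assumptions made here for illustration, while real structured output methods rely on far more efficient inference.

```python
import itertools
import numpy as np

def consistent_multilabels(parents, m):
    """Enumerate the multilabels in {0,1}^m that respect the hierarchy:
    a class may be positive only if its parent is (roots have parent -1)."""
    for y in itertools.product([0, 1], repeat=m):
        if all(p == -1 or y[p] >= y[i] for i, p in enumerate(parents)):
            yield np.array(y)

def predict(x, W, parents):
    """Structured prediction y_hat = argmax_y g(x, y), where the joint
    score g(x, y) = sum_i y_i * (W[i] . x) corresponds to a linear model
    over the joint feature map phi(x, y) = y (outer product) x."""
    best, best_score = None, -np.inf
    for y in consistent_multilabels(parents, W.shape[0]):
        score = float(y @ (W @ x))
        if score > best_score:
            best, best_score = y, score
    return best

# toy problem: class 1 is a child of class 0; classes 0 and 2 are roots
parents = [-1, 0, -1]
W = np.array([[0.5, -0.2], [0.4, 0.1], [-0.3, 0.8]])
print(predict(np.array([1.0, 1.0]), W, parents))   # -> [1 1 1]
```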

2.4. Hierarchical Ensemble Methods. Other approaches explicitly take into account the hierarchical relationships between functional terms [26, 29, 53, 54, 89, 90]. Usually they modify the "flat" predictions (i.e., predictions made independently of the hierarchical structure of the classes) and correct them, improving the accuracy and consistency of the multilabel annotations of proteins [24].

The flat approach makes predictions for each term independently, and consequently the predictor may assign to a single protein a set of terms that are inconsistent with one another. A possible solution for this problem is to train a classifier for each term of the reference ontology, to produce a set of predictions for each term, and finally to reconcile the predictions by taking into account the relationships between the classes of the ontology. Different ensemble-based algorithms have been proposed, ranging from methods restricted to multilabels with single and no partial paths [91] to methods extended to multiple and also partial paths [92]. Many recently published works clearly demonstrated that this approach ensures an increment in precision, but this comes at the expense of the overall recall [2, 30].

In the next section we discuss in detail hierarchical ensemble methods, since they constitute the main topic of this review.

3. Hierarchical Ensemble Methods: Exploiting the Hierarchy to Improve Protein Function Prediction

Ensemble methods are one of the main research areas of machine learning [52, 93–95]. From a general standpoint, ensembles of classifiers are sets of learning machines that work together to solve a classification problem (Figure 1). Empirical studies showed that in both classification and regression problems ensembles improve on single learning machines; moreover, large experimental studies compared the effectiveness of different ensemble methods on benchmark data sets [96–99], and ensembles have been successfully applied to several computational biology problems [100–104]. Ensemble methods have also been successfully applied in an unsupervised setting [105, 106]. Several theories have been proposed to explain the characteristics and the successful application of ensembles to different application domains: for instance, Allwein, Schapire, and Singer interpreted the improved generalization capabilities of ensembles of learning machines in the framework of large margin classifiers [107, 108], Kleinberg in the context of Stochastic Discrimination Theory [109], and Breiman and Friedman in the light of the bias-variance analysis borrowed from classical statistics [110, 111]. The interest in this research area is motivated also by the availability of very fast computers and networks of workstations at a relatively low cost, which allow the implementation and the experimentation of complex ensemble methods using off-the-shelf computer platforms.

Figure 1: Ensemble of classifiers: the base classifiers 1, ..., L are trained on the training data, and their predictions are fused by a combiner into the final prediction.

Constraints between labels, and more in general the issue of label dependence, have been recognized to play a central role in multilabel learning [112]. Protein function prediction can be regarded as a paradigmatic multilabel classification problem, where the exploitation of a priori knowledge about the hierarchical relationships between the labels can dramatically improve classification performance [24, 27, 113].

In the context of AFP problems, ensemble methods reflect the hierarchy of functional terms in the structure of the ensemble itself: each base learner is associated with a node of the graph representing the functional hierarchy and learns a specific GO term or FunCat category. The predictions provided by the trained classifiers are then combined by exploiting the hierarchical relationships of the taxonomy.

In their more general form, hierarchical ensemble methods adopt a two-step learning strategy:

(1) in the first step, each base learner, separately or interacting with connected base learners, learns a protein functional category on a per-term basis; in most cases this yields a set of independent classification problems, where each base learning machine is trained to learn a specific functional term, independently of the other base learners;

(2) in the second step, the predictions provided by the trained classifiers are combined by considering the hierarchical relationships between the base classifiers, modeled according to the hierarchy of the functional classes.

Figure 2 depicts the two learning steps of hierarchical ensemble methods. In the first step, a learning algorithm (a square object in Figure 2(a)) is applied to train the base classifiers associated with each class (represented with numbers from 1 to 9). Then, in the prediction phase, the resulting base classifiers (circles) exploit the hierarchical relationships between classes to combine their predictions with those provided by the other base classifiers (Figure 2(b)). Note that a dummy 0 node is added to obtain a rooted hierarchy. Up and down arrows represent the possibility of combining predictions by exploiting those provided, respectively, by children and parent classifiers, according to a bottom-up or top-down learning strategy. Note that "local" combinations are possible (e.g., the prediction of node 5 may depend only on the prediction of node 1), but "global" combinations can also be considered, by taking into account the predictions across the overall structure of the graph (e.g., predictions for node 9 can depend on the predictions made by all the other base classifiers, from 1 to 8). Moreover, both top-down propagation of the predictions (down arrows, Figure 2(b)) and bottom-up propagation (up arrows) can be considered, depending on the specific design of the hierarchical ensemble algorithm.

Figure 2: Schematic representation of the two main learning steps of hierarchical ensemble methods: (a) training of the base classifiers; (b) top-down and/or bottom-up propagation of the predictions.

This ensemble approach is highly modular: in principle any learning algorithm can be used to train the classifiers in the first step, and annotation decisions, probabilities, or whatever scores are provided by each base learner can be combined, depending on the characteristics of the specific hierarchical ensemble method.
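As an illustration of this modularity, here is a hedged sketch of the first learning step, training one probabilistic base learner per functional term; scikit-learn and the logistic regression base learner are assumptions made for the example, and any estimator exposing calibrated scores could be substituted, with the second step combining the resulting estimates as in the methods reviewed below.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_base_learners(X, Y):
    """First step: one probabilistic base classifier per functional term.
    X is the (genes x features) data matrix; Y is the (genes x terms)
    0/1 annotation matrix. Returns one fitted model per term."""
    return [LogisticRegression(max_iter=1000).fit(X, Y[:, i])
            for i in range(Y.shape[1])]

def flat_probabilities(models, X):
    """Per-term estimates p_hat_i(g), still hierarchy-unaware: the second
    step combines them according to the taxonomy of the classes."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))                   # 100 genes, 20 features
Y = (rng.random((100, 5)) < 0.3).astype(int)     # toy annotations, 5 terms
models = train_base_learners(X, Y)
print(flat_probabilities(models, X[:3]).round(2))
```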

In this section we provide some basic notation and an ensemble taxonomy that will be used to introduce the different hierarchical ensemble methods for AFP.

3.1. Basic Notation. A gene/gene product $g$ can be represented through a vector $x \in \mathbb{R}^d$ having $d$ different features (e.g., gene expression levels across $d$ different conditions, sequence similarities with other genes/proteins, presence or absence of a given domain in the corresponding protein, or genetic or physical interaction with other proteins). Note that, for the sake of simplicity and with a certain approximation, we refer in the same way to genes and proteins, even if it is well known that a given gene may correspond to multiple proteins. A gene $g$ is assigned to one or more functional classes in the set $C = \{c_1, c_2, \ldots, c_m\}$, structured according to a FunCat forest of trees $T$ or a directed acyclic graph $G$ of the Gene Ontology (usually a dummy root class $c_0$, which every gene belongs to, is added to $T$ or $G$ to facilitate the processing). The assignments are coded through a vector of multilabels $y = (y_1, y_2, \ldots, y_m) \in \{0,1\}^m$, where $g$ belongs to class $c_i$ if and only if $y_i = 1$.

In both the Gene Ontology (GO) and FunCat taxonomies, the functional classes are structured according to a hierarchy and can be represented by a directed graph, where nodes correspond to classes and edges correspond to relationships between classes. Hence the node corresponding to the class $c_i$ can simply be denoted by $i$. We represent the set of children nodes of $i$ by $\mathrm{child}(i)$ and the set of its parents by $\mathrm{par}(i)$. Moreover, $y_{\mathrm{child}(i)}$ denotes the labels of the children classes of node $i$ and, analogously, $y_{\mathrm{par}(i)}$ denotes the labels of the parent classes of $i$. Note that in FunCat only one parent is permitted, since the overall hierarchy is a tree forest, while in the GO more parents are allowed, because the relationships are structured according to a directed acyclic graph.

Hierarchical ensemble methods train a set of calibrated classifiers, one for each node of the taxonomy $T$. These classifiers are used to derive estimates $\hat{p}_i(g)$ of the probabilities $p_i(g) = \mathbb{P}(V_i = 1 \mid V_{\mathrm{par}(i)} = 1, g)$ for all $g$ and $i$, where $(V_1, \ldots, V_m) \in \{0,1\}^m$ is the vector random variable modeling the unknown multilabel of a gene $g$, and $V_{\mathrm{par}(i)}$ denotes the random variables associated with the parents of node $i$. Note that the $p_i(g)$ are probabilities conditioned on $V_{\mathrm{par}(i)} = 1$, that is, the probability that a gene is annotated to a given term $i$, given that the gene is already annotated to its parent terms, thus respecting the true path rule. Ensemble methods infer a multilabel assignment $\hat{y} = (\hat{y}_1, \ldots, \hat{y}_m) \in \{0,1\}^m$ based on the estimates $\hat{p}_1(g), \ldots, \hat{p}_m(g)$.

3.2. A Taxonomy of Hierarchical Ensemble Methods. Hierarchical ensemble methods for AFP share several characteristics, from the two-step learning approach to the exploitation of the hierarchical relationships between classes. For these reasons it is quite difficult to clearly and univocally individuate a taxonomy of hierarchical ensemble methods. Here we show a taxonomy useful mainly to describe and discuss existing methods for AFP. For a recent review and taxonomy of hierarchical ensemble methods not specific to AFP problems, we refer the reader to the comprehensive review of Silla and others [44].

In the following sections we discuss the following groups of hierarchical ensemble methods:

(i) Top-down ensemble methods. These methods are characterized by a simple top-down approach: in the second step, only the output of the parent node/base classifier influences the output of the children, thus resulting in a top-down propagation of the decisions.

(ii) Bayesian ensemble methods. These are a class of methods theoretically well founded, and in some cases they are optimal from a Bayesian standpoint.

(iii) Reconciliation methods. This is a heterogeneous class of heuristics by which we can combine the predictions of the base learners by adopting different "local" or "global" combination strategies.

(iv) True path rule ensembles. These methods adopt a heuristic approach based on the "true path rule" that governs both the GO and FunCat ontologies.

(v) Decision tree-based ensembles. These methods are characterized by the application of decision trees as base learners, or by the adoption of decision tree-like learning strategies to combine the predictions of the base learners.

Despite this general characterization, several methods could be assigned to different groups, and for several hierarchical ensemble methods it is difficult to assign them to any of the introduced classes.

For instance, in [114–116] the authors used the hierarchy only to construct training sets different for each term of the Gene Ontology, by determining positive and negative examples on the basis of the relationships between functional terms. In [89], for each classifier associated with a node, a gene is labeled as positive (i.e., belonging to the term associated with that node) if it actually belongs to that node, or as negative if it does not belong to that node or to the ancestors or descendants of the node.

Other approaches exploited the correlation between nearby classes [32, 53, 117]. Shahbaba and Neal [53] take into account the hierarchy to introduce correlation between functional classes, using a multinomial logit model with Bayesian priors, in the context of E. coli functional classification with Riley's hierarchies [118]. Bogdanov and Singh incorporated functional interrelationships between terms during the extraction of features based on the annotations of neighboring genes, and then applied a nearest-neighbor classifier to predict protein functions [117]. The HiBLADE method (hierarchical multilabel boosting with label dependency) [32] not only takes advantage of the preestablished hierarchical taxonomy of the classes but also effectively exploits the hidden correlation among the classes that is not shown through the class hierarchy, thereby improving the quality of the predictions. In particular, the dependencies of the children for each label in the hierarchy are captured and analyzed using the Bayes method and instance-based similarity. Experiments using the FunCat taxonomy and the yeast model organism show that the proposed method is competitive with the TPR-W (Section 7.2) and HBAYES-CS (Section 5.3) hierarchical ensemble methods.

A classical multiclass boosting algorithm [119] has also been adapted to fit the hierarchical structure of the FunCat taxonomy [120]: the method is relatively simple and straightforward to implement and achieves competitive results for AFP in the yeast model organism.

Finally, other hierarchical approaches have been proposed in the context of the competitive networks learning framework. Competitive networks are well-known unsupervised and supervised methods able to map the input space into a structured output space, where clusters or classes are usually arranged according to a grid topology, and where learning adopts at the same time a competition, cooperation, and adaptation strategy [121]. Interestingly enough, in [122] the authors adopted this approach to predict the hierarchy of gene annotations in the yeast model organism by using a tree topology: according to the FunCat taxonomy, each neuron is connected with its parent or with its children. Moreover, each neuron in the tree-structured output layer is connected to all neurons of the input layer representing the instances, that is, the set of genomic features associated with each gene to be classified. Results obtained with the hierarchy of Enzyme Commission codes showed that this approach is competitive with hierarchical decision tree ensembles [29] (Section 8).

To provide a general picture of the methods discussed in the following sections, Table 1 summarizes their main characteristics. The first two columns report the name and a reference to the method; the third, whether multiple or single paths across the taxonomy are allowed; and the next, whether partial paths are considered (i.e., paths that do not end with a leaf). The successive columns refer to the class structure (a tree or a DAG), to the adoption or not of cost-sensitive (i.e., unbalance-aware) classification approaches, and to the adoption of strategies to properly select negative examples in the training phase. Finally, the last three columns summarize the type of base learner used ("spec" means that only a specific type of base learner is allowed, and "any" means that any type of learner can be used within the method), whether the method improves or not with respect to the flat approach, and the mode of processing of the nodes ("TD": top-down approach; "TD & BUP": adopting both top-down and bottom-up strategies). Of course, methods having more checkmarks are more flexible; in general, methods that can process a DAG can also process tree-structured ontologies, but the opposite is not guaranteed, while the type of node processing relies on the way the information is propagated across the ontology. It is worth noting that all the considered methods improve on baseline "flat" classification methods.

4. Hierarchical Top-Down (HTD) Ensembles

These ensemble methods exploit the hierarchical relationships between functional terms in a top-to-bottom fashion, that is, considering only the relationships denoted by the down arrows in Figure 2(b).


Table 1: Characteristics of some of the main hierarchical ensemble methods for AFP.

Methods           | References | Multipath | Partial path | Class structure | Cost sens. | Sel. neg. | Base learn. | Improves on flat | Node process.
HMC-LMLP          | [124, 125] | ✓         | ✓            | TREE            |            |           | any         | ✓                | TD
HTD-CS            | [27]       | ✓         | ✓            | TREE            | ✓          |           | any         | ✓                | TD
HTD-MULTI         | [127]      |           | ✓            | TREE            |            |           | any         | ✓                | TD
HTD-PERLEV        | [128]      |           |              | TREE            |            |           | spec        | ✓                | TD
HTD-NET           | [26]       | ✓         | ✓            | DAG             |            |           | any         | ✓                | TD
BAYES NET-ENS     | [20]       | ✓         | ✓            | DAG             |            | ✓         | spec        | ✓                | TD & BUP
HIER-MB and BFS   | [30]       | ✓         | ✓            | DAG             |            |           | any         | ✓                | TD & BUP
HBAYES            | [92, 135]  | ✓         | ✓            | TREE            |            | ✓         | any         | ✓                | TD & BUP
HBAYES-CS         | [123]      | ✓         | ✓            | TREE            | ✓          | ✓         | any         | ✓                | TD & BUP
Reconc.-heuristic | [24]       | ✓         | ✓            | DAG             |            |           | any         | ✓                | TD
Cascaded log.     | [24]       | ✓         | ✓            | DAG             |            |           | any         | ✓                | TD
Projection-based  | [24]       | ✓         | ✓            | DAG             |            |           | any         | ✓                | TD & BUP
TPR               | [31, 142]  | ✓         | ✓            | TREE            | ✓          |           | any         | ✓                | TD & BUP
TPR-W             | [31]       | ✓         | ✓            | TREE            | ✓          | ✓         | any         | ✓                | TD & BUP
TPR-W weighted    | [145]      | ✓         | ✓            | TREE            | ✓          |           | any         | ✓                | TD & BUP
Decision-tree-ens.| [29]       | ✓         | ✓            | DAG             |            |           | spec        | ✓                | TD & BUP

The basic hierarchical top-down ensemble method (HTD) is straightforward: for each gene $g$, starting from the set of nodes at the first level of the graph $G$ (denoted by $\mathrm{root}(G)$), the classifier associated with the node $i \in G$ computes whether the gene belongs to the class $c_i$. If yes, the classification process continues recursively on the nodes $j \in \mathrm{child}(i)$; otherwise it stops at node $i$, and the nodes belonging to the subtree rooted at $i$ are all set to 0. To introduce the method, we use probabilistic classifiers as base learners, trained to predict the class $c_i$ associated with the node $i$ of the hierarchical taxonomy. Their estimates $\hat{p}_i(g)$ of $\mathbb{P}(V_i = 1 \mid V_{\mathrm{par}(i)} = 1, g)$ are used by the HTD ensemble to classify a gene $g$ as follows:

$$\hat{y}_i = \begin{cases} \{\hat{p}_i(g) > \frac{1}{2}\} & \text{if } i \in \mathrm{root}(G), \\ \{\hat{p}_i(g) > \frac{1}{2}\} & \text{if } i \notin \mathrm{root}(G) \text{ and } \hat{p}_{\mathrm{par}(i)}(g) > \frac{1}{2}, \\ 0 & \text{if } i \notin \mathrm{root}(G) \text{ and } \hat{p}_{\mathrm{par}(i)}(g) \le \frac{1}{2}, \end{cases} \qquad (3)$$

where $\{x\} = 1$ if the predicate $x$ is true, otherwise $\{x\} = 0$, and $\hat{p}_{\mathrm{par}(i)}$ is the probability predicted for the parent of the term $i$. It is easy to see that this procedure ensures that the predicted multilabels $\hat{y} = (\hat{y}_1, \ldots, \hat{y}_m)$ are consistent with the hierarchy. We can apply the same top-down procedure also using nonprobabilistic classifiers, that is, base learners generating continuous scores or discrete decisions, by slightly modifying (3).

In [123] a cost-sensitive version of the basic top-down hierarchical ensemble method HTD has been proposed. By assigning $\hat{y}_i$ before the label of any $j$ in the subtree rooted at $i$, the following rule is used:

$$\hat{y}_i = \{\hat{p}_i \ge \tfrac{1}{2}\} \times \{\hat{y}_{\mathrm{par}(i)} = 1\} \qquad (4)$$

for $i = 1, \ldots, m$ (note that the guessed label $\hat{y}_0$ of the root of $G$ is always 1). The cost-sensitive variant HTD-CS then introduces a single cost-sensitive parameter $\tau > 0$, which replaces the threshold $1/2$. The resulting rule for HTD-CS is

$$\hat{y}_i = \{\hat{p}_i \ge \tau\} \times \{\hat{y}_{\mathrm{par}(i)} = 1\}. \qquad (5)$$

By tuning $\tau$, we may obtain ensembles with different precision/recall characteristics. Despite the simplicity of hierarchical top-down methods, several works showed their effectiveness for AFP problems [28, 31].
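Rules (3)–(5) translate directly into a recursive top-down visit; the following is a minimal sketch under the assumption of a tree-structured taxonomy with precomputed calibrated estimates, where the names (p_hat, children, roots) and the toy values are illustrative and not taken from [123].

```python
def htd_cs(p_hat, children, roots, tau=0.5):
    """Hierarchical top-down decision rule: p_hat[i] estimates
    P(V_i = 1 | V_par(i) = 1, g) for one gene; tau = 0.5 reproduces the
    basic HTD rule (3), any other tau the cost-sensitive rule (5)."""
    y_hat = [0] * len(p_hat)

    def visit(i):
        if p_hat[i] >= tau:              # positive decision: recurse on children
            y_hat[i] = 1
            for j in children[i]:
                visit(j)
        # negative decision: the whole subtree rooted at i stays 0

    for r in roots:
        visit(r)
    return y_hat

# toy taxonomy: 0 and 4 are root classes, with 0 -> 1 -> {2, 3}
children = {0: [1], 1: [2, 3], 2: [], 3: [], 4: []}
print(htd_cs([0.9, 0.6, 0.7, 0.2, 0.4], children, roots=[0, 4]))
# -> [1, 1, 1, 0, 0]: the predictions are consistent with the hierarchy
```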

For instance, Cerri and De Carvalho experimented with different variants of top-down hierarchical ensemble methods for AFP [28, 124, 125]. The HMC-LMLP method (hierarchical multilabel classification with local multilayer perceptrons) successively trains a local MLP network for each hierarchical level, using the classical backpropagation algorithm [126]. Then the output of the MLP for the first level is used as input to train the MLP that learns the classes of the second level, and so on (Figure 3). A gene is annotated to a class if its corresponding output in the MLP is larger than a predefined threshold; then, in a postprocessing phase (the second step of the hierarchical classification), inconsistent predictions are removed (i.e., classes predicted without the prediction of their superclasses) [125]. In practice, instead of using a dichotomic classifier for each node, the HMC-LMLP algorithm applies a single multiclass multilayer perceptron for each level of the hierarchy.

Figure 3: HMC-LMLP: the outputs of the MLP responsible for the predictions in the first level are used as input to another MLP for the predictions in the second level (adapted from [125]).

A related approach adopts multiclass classifiers (HTD-MULTI) for each node, instead of a simple binary classifier, and tries to find the most likely path from the root to the leaves of the hierarchy, considering simple techniques such as the multiplication or the sum of the probabilities estimated at each node along the path [127]. The method has been applied to the cell cycle branch of the FunCat hierarchy with the yeast model organism, showing improvements with respect to classical hierarchical top-down methods, even if the proposed approach can only predict classes along a single "most likely path," thus not considering that in AFP we may have annotations involving multiple and partial paths.

Another method that introduces multiclass classifiers instead of simple dichotomic classifiers has been proposed by Paes et al. [128]: local per-level multiclass classifiers (HTD-PERLEV) are trained to distinguish between the classes of a specific level of the hierarchy, and two different strategies to remove inconsistencies are introduced. The method has been applied to the hierarchical classification of enzymes using the EC taxonomy, but unfortunately this algorithm is not well suited to AFP, since leaf nodes are mandatory (that is, partial path annotations are not allowed) and multilabel annotations along multiple paths are not allowed.

Another interesting top-down hierarchical approach proposed by the same authors is HMC-LP (hierarchical multilabel classification label-powerset), a hierarchical variation of the label-powerset nonhierarchical multilabel method [129] that has been applied to the prediction of gene functions of the yeast model organism, using 10 different data sets and the FunCat taxonomy [124]. According to the label-powerset approach, the method is based on a first label-combination step, by which, for each example (gene), all the classes assigned to the example are combined into a new and unique class, and this process is repeated for each level of the hierarchy. In this way the original problem is transformed into a hierarchical single-label problem. In both the training and test phases the top-down approach is applied, and at the end of the classification phase the original classes can be easily reconstructed [124]. In an experimental comparison using the FunCat taxonomy for S. cerevisiae, results showed that hierarchical top-down ensemble methods significantly outperform decision tree-based hierarchical methods, but no significant difference between different flavors of top-down hierarchical ensembles has been detected [28].

Top-down algorithms can be conceived also in the context of network-based methods (HTD-NET). For instance, in [26] a probabilistic model that combines relational protein-protein interaction data and the hierarchical structure of the GO to predict true-path-consistent function labels obeys the true path rule by setting the descendants of a node as negative whenever that node is set to negative. More precisely, the authors at first compute a local hierarchical conditional probability, in the sense that, for any nonroot GO term, only the parents affect its labeling. This probability is computed within a network-based framework, assuming that the labeling of a gene is independent of that of any other gene, given that of its neighbors (a sort of Markov property with respect to the gene functional interaction network), and assuming also a binomial distribution for the number of neighbors labeled with child terms with respect to those labeled with the parent term. These assumptions are quite stringent, but they are necessary to make the model tractable. Then a global hierarchical conditional probability is computed by recursively applying the previously computed local hierarchical conditional probability, considering all the ancestors. More precisely, denoting by $\mathbb{P}(y_i = 1 \mid g, N(g))$ the probability that a gene $g$ is annotated to a node $i$, given the status of the annotations of its neighborhood $N(g)$ in the functional network, the global hierarchical conditional probability factorizes according to the GO graph as follows:

$$\mathbb{P}(y_i = 1 \mid g, N(g)) = \prod_{j \in \mathrm{anc}(i)} \mathbb{P}(y_j = 1 \mid y_{\mathrm{par}(j)} = 1, N_{\mathrm{loc}}(g)), \qquad (6)$$

where $N_{\mathrm{loc}}(g)$ represents the local hierarchical neighborhood information on the parent-child GO term pair $(\mathrm{par}(j), j)$ [26]. This approach guarantees GO term label assignments consistent with the hierarchy, without the need of a postprocessing step.
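Evaluating the factorization in (6) amounts to multiplying the local conditional probabilities while climbing toward the root; in the sketch below a single parent per term is assumed for simplicity (the GO is actually a DAG), the ancestor set is taken to include the term itself, and all names and values are illustrative.

```python
def global_hierarchical_prob(i, local_prob, parents):
    """Product of the local hierarchical conditional probabilities
    local_prob[j], estimates of P(y_j = 1 | y_par(j) = 1, N_loc(g)),
    over term i and its ancestors, as in the factorization (6)."""
    p = 1.0
    j = i
    while j is not None:        # climb from i up to the root
        p *= local_prob[j]
        j = parents[j]
    return p

# toy path: root 0 -> 1 -> 2, with per-term local conditional probabilities
parents = {0: None, 1: 0, 2: 1}
local_prob = {0: 1.0, 1: 0.8, 2: 0.5}
print(global_hierarchical_prob(2, local_prob, parents))   # 0.4
```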

Finally, in [130] the author applied a hierarchical method to the classification of yeast FunCat categories. Despite its well-founded theoretical properties, based on large margin methods, this approach is conceived for one-path hierarchical classification, and hence it is unsuited for hierarchical AFP, where usually multiple paths in the hierarchy should be considered, since in most cases genes can play different functional roles in the cell.

5. Ensemble-Based Bayesian Approaches for Hierarchical Classification

These methods introduce a Bayesian approach to the hierarchical classification of proteins, by using the classical Bayes theorem or Bayesian networks to obtain tractable factorizations of the joint conditional probabilities from the original "full Bayesian" setting of the hierarchical AFP problem [20, 30], or to achieve "Bayes-optimal" solutions with respect to loss functions well suited to hierarchical problems [27, 92].

5.1. The Solution Based on Bayesian Networks. One of the first approaches addressing the issue of inconsistent predictions in the Gene Ontology is represented by the Bayesian approach proposed in [20] (BAYES NET-ENS). According to the general scheme of hierarchical ensemble methods, two main steps characterize the algorithm:

(1) flat prediction of each term/class (possibly inconsistent);

(2) a Bayesian hierarchical combination scheme to allow collaborative error-correction over all nodes.

After training a set of base classifiers on each of the considered GO terms (in their work the authors applied the method to 105 selected GO terms), we may have a set of (possibly inconsistent) predictions $\hat{y}_1, \ldots, \hat{y}_n$. The goal consists in finding a set of consistent predictions $y_1, \ldots, y_n$ by maximizing the following equation, derived from the Bayes theorem:

$$\mathbb{P}(y_1, \ldots, y_n \mid \hat{y}_1, \ldots, \hat{y}_n) = \frac{\mathbb{P}(\hat{y}_1, \ldots, \hat{y}_n \mid y_1, \ldots, y_n)\,\mathbb{P}(y_1, \ldots, y_n)}{Z}, \qquad (7)$$

where $n$ is the number of GO nodes/terms and $Z$ is a constant normalization factor.

Since the direct solution of (7) is too hard, that is, exponential in time with respect to the number of nodes, the authors proposed a Bayesian network structure to solve this difficult problem, in order to exploit the relationships between the GO terms. More precisely, to reduce the complexity of the problem, the authors imposed the following constraints:

(1) the $y_i$ nodes are conditioned on their children (GO structure constraints);

(2) the $\hat{y}_i$ nodes are conditioned on their label $y_i$ (the Bayes rule);

(3) the $\hat{y}_i$ are independent from both $y_j$, $i \ne j$, and $\hat{y}_j$, $i \ne j$, given $y_i$.

In other words, we can ensure that a label is 1 (positive) when any one of its children is 1, and the edges from $y_i$ to $\hat{y}_i$ assure that a classifier output $\hat{y}_i$ is a random variable independent of all other classifier outputs $\hat{y}_j$ and labels $y_j$, given its true label $y_i$ (Figure 4).

Figure 4: Bayesian network involved in the hierarchical classification (adapted from [20]).

More precisely, from the previous constraints we can derive the following equations. From the first constraint:

$$\mathbb{P}(y_1, \ldots, y_n) = \prod_{i=1}^{n} \mathbb{P}(y_i \mid \mathrm{child}(y_i)); \qquad (8)$$

from the last two constraints:

$$\mathbb{P}(\hat{y}_1, \ldots, \hat{y}_n \mid y_1, \ldots, y_n) = \prod_{i=1}^{n} \mathbb{P}(\hat{y}_i \mid y_i). \qquad (9)$$

Note that (8) can be inferred from the training labels simply by counting, while (9) can be inferred by validation during training, by modeling the distribution of the $\hat{y}_i$ outputs over positive and negative examples through a parametric model (e.g., a Gaussian distribution; see Figure 5).

For the implementation of their method, the authors adopted bagged ensembles of SVMs [131] to make the predictions more robust and reliable at each node of the GO hierarchy, and the median values of their outputs on out-of-bag examples have been used to estimate means and variances for each class. Finally, means and variances have been used as parameters of the Gaussian models used to estimate the conditional probabilities of (9).

Results with the 105 terms/nodes of the GO BP ontology (model organism S. cerevisiae) showed substantial improvements with respect to nonhierarchical "flat" predictions: the hierarchical approach improves the AUC results on 93 of the 105 GO terms (Figure 6).

5.2. The Markov Blanket and Approximated Breadth-First Solution. In [30] the authors proposed an alternative approximated solution to the complex equation (7), by introducing the following two variants of the Bayesian integration:

(1) HIER-MB: hierarchical Bayesian combination involving the nodes in the Markov blanket;

(2) HIER-BFS: hierarchical Bayesian combination involving the first 30 nodes visited through a breadth-first search (BFS) in the GO graph.

Figure 5: Distribution of positive and negative validation examples (a Gaussian distribution is assumed); adapted from [20].

Figure 6: Improvements induced by the hierarchical prediction of the GO terms: darker shades of blue indicate the largest improvements, darker shades of red the largest deterioration, and white means no change (adapted from [20]).

Figure 7: Markov blanket surrounding the GO term Y1. Each GO term is represented as a blank node, while the SVM classifier output is represented as a gray node (adapted from [30]).

The method has been applied to the prediction of more than 2000 GO terms for the mouse model organism and performed among the top methods in the MouseFunc challenge [2].

The first approach (HIER-MB) modifies the output of the base learners (SVMs in the Guan et al. paper) by taking into account the Bayesian network constructed using the Markov blanket surrounding the GO term of interest (Figure 7). In a Bayesian network, the Markov blanket of a node $i$ is represented by its parents ($\mathrm{par}(i)$), its children ($\mathrm{child}(i)$), and its children's other parents. The Bayesian network involving the Markov blanket of node $i$ is used to provide the prediction $\hat{y}_i$ of the ensemble, thus leveraging the local relationships of node $i$ and the predictions for the nodes included in its Markov blanket.

To enlarge the size of the Bayesian subnetwork involved in the prediction of the node of interest, a variant based on the Bayesian networks constructed by applying a classical breadth-first search is the basis of the HIER-BFS algorithm. To reduce the complexity, at most 30 terms are included (i.e., the first 30 nodes reached by the breadth-first algorithm; see Figure 8). In the implementation, ensembles of 25 SVMs have been trained for each node, using vector space integration techniques [132] to integrate multiple sources of data.

Figure 8: The breadth-first subnetwork stemming from Y1. Each GO term is represented through a blank node, and the SVM outputs are represented as gray nodes (adapted from [30]).
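The node selection step of HIER-BFS can be sketched as follows; the neighbors dictionary and the toy GO fragment are assumptions introduced for illustration, and in the actual method the selected terms are then used to build the local Bayesian network.

```python
from collections import deque

def bfs_subnetwork(start, neighbors, max_nodes=30):
    """Collect at most max_nodes GO terms reachable from the term of
    interest via breadth-first search; neighbors[t] is expected to list
    the parents and children of term t in the GO graph."""
    visited, queue = [start], deque([start])
    while queue and len(visited) < max_nodes:
        u = queue.popleft()
        for v in neighbors.get(u, []):
            if v not in visited and len(visited) < max_nodes:
                visited.append(v)
                queue.append(v)
    return visited

# toy GO fragment around the term of interest 'Y1'
neighbors = {'Y1': ['Y2', 'Y3'], 'Y2': ['Y1', 'Y4', 'Y5'], 'Y3': ['Y1', 'Y6']}
print(bfs_subnetwork('Y1', neighbors, max_nodes=4))   # ['Y1', 'Y2', 'Y3', 'Y4']
```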

Note that with both the HIER-MB and HIER-BFS methods we do not take into account the overall topology of the GO network, but only the terms related to the node for which we perform the prediction. Even if this general approach is reasonable and achieves good results, its main drawback is represented by the locality of the hierarchical integration (limited to the Markov blanket and the first 30 BFS nodes). Moreover, in previous works it has been shown that the adopted integration strategy (vector space integration) is in most cases worse than kernel fusion [11] and ensemble methods for data integration [23].

Figure 9: Integration of diverse methods (single SVMs per GO term, hierarchical correction, and Bayesian integration of individual datasets) and diverse sources of data (gene expression microarrays, protein sequence and domain information, protein-protein interactions, phenotype profiles, phylogenetic profiles, and disease profiles obtained through homology) in an ensemble framework for AFP. The best classifier for each GO term is selected through held-out set validation (adapted from [30]).

In the same work [30], the authors also propose a sort of "test and select" method [133], by which three different classification approaches, (a) single flat SVMs, (b) Bayesian hierarchical correction, and (c) naive Bayes combination, are applied, and for each GO term the best one is selected by internal cross-validation (Figure 9).

It is worth noting that other approaches adopted Bayesian networks to resolve the hierarchical constraints underlying the GO taxonomy. For instance, in the FALCON algorithm, the GO is modeled as a Bayesian network, and, for any given input, the algorithm returns the most probable GO term assignment in accordance with the GO structure, by using an evolutionary optimization algorithm [134].

5.3. HBAYES: An "Optimal" Hierarchical Bayesian Ensemble Approach. The HBAYES ensemble method [92, 135] is a general technique for solving hierarchical classification problems on generic taxonomies $G$ structured as a forest of trees. The method consists in training a calibrated classifier at each node of the taxonomy. In principle, any algorithm (e.g., support vector machines or artificial neural networks) whose classifications are obtained by thresholding a real prediction $p$, for example, $\hat{y} = \operatorname{SGN}(p)$, can be used as base learner. The real-valued outputs $p_i(g)$ of the calibrated classifier for node $i$ on the gene $g$ are viewed as estimates of the probabilities $p_i(g) = \mathbb{P}(y_i = 1 \mid y_{\operatorname{par}(i)} = 1, g)$. The distribution of the random Boolean vector $\mathbf{Y}$ is assumed to be

$$\mathbb{P}(\mathbf{Y} = \mathbf{y}) = \prod_{i=1}^{m} \mathbb{P}(Y_i = y_i \mid Y_{\operatorname{par}(i)} = 1, g), \quad \forall \mathbf{y} \in \{0, 1\}^m, \quad (10)$$

where, in order to enforce that only multilabels $\mathbf{Y}$ that respect the hierarchy have nonzero probability, it is imposed that $\mathbb{P}(Y_i = 1 \mid Y_{\operatorname{par}(i)} = 0, g) = 0$ for all nodes $i = 1, \ldots, m$ and all $g$. This implies that the base learner at node $i$ is only trained on the subset of the training set including all examples $(g, \mathbf{y})$ such that $y_{\operatorname{par}(i)} = 1$.

5.3.1. HBAYES Ensembles for Protein Function Prediction. The H-loss is a measure of discrepancy between multilabels based on a simple intuition: if a parent class has been predicted wrongly, then errors in its descendants should not be taken into account. Given fixed cost coefficients $\theta_1, \ldots, \theta_m > 0$, the H-loss $\ell_H(\hat{\mathbf{y}}, \mathbf{v})$ between multilabels $\hat{\mathbf{y}}$ and $\mathbf{v}$ is computed as follows: all paths in the taxonomy $T$ from the root down to each leaf are examined, and, whenever a node $i \in \{1, \ldots, m\}$ is encountered such that $\hat{y}_i \neq v_i$, $\theta_i$ is added to the loss, while all the other loss contributions from the subtree rooted at $i$ are discarded. This method assumes that, given a gene $g$, the distribution of the labels $\mathbf{V} = (V_1, \ldots, V_m)$ is $\mathbb{P}(\mathbf{V} = \mathbf{v}) = \prod_{i=1}^{m} p_i(g)$ for all $\mathbf{v} \in \{0, 1\}^m$, where $p_i(g) = \mathbb{P}(V_i = v_i \mid V_{\operatorname{par}(i)} = 1, g)$. According to the true path rule, it is imposed that $\mathbb{P}(V_i = 1 \mid V_{\operatorname{par}(i)} = 0, g) = 0$ for all nodes $i$ and all genes $g$.
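The recursive structure of the H-loss can be captured by a short sketch (a minimal illustration under the stated definitions, with the taxonomy encoded as a `children` dictionary and labels as 0/1 dictionaries):

```python
def h_loss(node, children, theta, y_hat, v):
    """H-loss of the subtree rooted at `node`: pay theta[node] at the first
    disagreement on a path and discard all contributions below it."""
    if y_hat[node] != v[node]:
        return theta[node]
    return sum(h_loss(c, children, theta, y_hat, v)
               for c in children.get(node, []))

# For a forest of trees, sum h_loss over the roots of the taxonomy.
```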

In the evaluation phase, HBAYES predicts the Bayes-optimal multilabel $\hat{\mathbf{y}} \in \{0, 1\}^m$ for a gene $g$, based on the estimates $p_i(g)$ for $i = 1, \ldots, m$. By definition of Bayes-optimality, the optimal multilabel for $g$ is the one that minimizes the loss when the true multilabel $\mathbf{V}$ is drawn from the joint distribution computed from the estimated conditionals $p_i(g)$; that is,

$$\hat{\mathbf{y}} = \underset{\mathbf{y} \in \{0,1\}^m}{\operatorname{argmin}} \, \mathbb{E}\left[\ell_H(\mathbf{y}, \mathbf{V}) \mid g\right]. \quad (11)$$

In other words, the ensemble method HBAYES provides an approximation of the optimal Bayesian classifier with respect to the H-loss [135]. More precisely, as shown in [27], the following theorem holds.


Theorem 1. For any tree $T$ and gene $g$, the multilabel generated according to the HBAYES prediction rule is the Bayes-optimal classification of $g$ for the H-loss.

In the evaluation phase, the uniform cost coefficients $\theta_i = 1$ for $i = 1, \ldots, m$ are used. However, since with uniform coefficients the H-loss can be made small simply by predicting sparse multilabels (i.e., multilabels $\hat{\mathbf{y}}$ such that $\sum_i \hat{y}_i$ is small), in the training phase the cost coefficients are set to $\theta_i = 1/|\operatorname{root}(G)|$ if $i \in \operatorname{root}(G)$, and to $\theta_i = \theta_j/|\operatorname{child}(j)|$ with $j = \operatorname{par}(i)$ otherwise. This normalizes the H-loss, in the sense that the maximal H-loss contribution of all nodes in a subtree, excluding its root, equals that of its root.

Let $\{E\}$ denote the indicator function of the event $E$. Given $g$ and the estimates $\hat{p}_i = \hat{p}_i(g)$ for $i = 1, \ldots, m$, the HBAYES prediction rule can be formulated as follows.

HBAYES Prediction Rule. Initially, set the labels of each node $i$ to

$$\hat{y}_i = \underset{\hat{y} \in \{0,1\}}{\operatorname{argmin}} \left( \theta_i \hat{p}_i (1 - \hat{y}) + \theta_i (1 - \hat{p}_i)\, \hat{y} + \hat{p}_i \{\hat{y} = 1\} \sum_{j \in \operatorname{child}(i)} H_j(\hat{\mathbf{y}}) \right), \quad (12)$$

where

$$H_j(\hat{\mathbf{y}}) = \theta_j \hat{p}_j (1 - \hat{y}_j) + \theta_j (1 - \hat{p}_j)\, \hat{y}_j + \hat{p}_j \{\hat{y}_j = 1\} \sum_{k \in \operatorname{child}(j)} H_k(\hat{\mathbf{y}}) \quad (13)$$

is recursively defined over the nodes $j$ in the subtree rooted at $i$, with each $\hat{y}_j$ set according to (12).

Then, if $\hat{y}_i$ is set to zero, set all nodes in the subtree rooted at $i$ to zero as well.

It is worth noting that $\hat{\mathbf{y}}$ can be computed for a given $g$ via a simple bottom-up message-passing procedure. It can be shown that, if all child nodes $k$ of $i$ have $\hat{p}_k$ close to one half, then the Bayes-optimal label of $i$ tends to be 0, irrespective of the value of $\hat{p}_i$. On the contrary, if $i$'s children all have $\hat{p}_k$ close to either 0 or 1, then the Bayes-optimal label of $i$ is based on $\hat{p}_i$ only, ignoring the children. This behaviour can be intuitively explained in the following way: the estimate $\hat{p}_k$ is built based only on the examples on which the parent $i$ of $k$ is positive; hence, a "neutral" estimate $\hat{p}_k = 1/2$ signals that the current instance is a negative example for the parent $i$. Experimental results show that this approach achieves results comparable with the TPR method (Section 7), an ensemble approach based on the "true path rule" [136].
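A minimal sketch of this bottom-up procedure follows (an illustration of rules (12)-(13) under the assumptions above, for a tree given as a `children` dictionary with estimates `p` and costs `theta`; it is not the reference implementation):

```python
def hbayes_predict(node, children, p, theta, y=None, H=None):
    """Bottom-up evaluation of (12)-(13); returns the multilabel dict y."""
    if y is None:
        y, H = {}, {}
    child_sum = 0.0
    for j in children.get(node, []):
        hbayes_predict(j, children, p, theta, y, H)
        child_sum += H[j]
    cost0 = theta[node] * p[node]                  # expected cost of predicting 0
    cost1 = theta[node] * (1 - p[node]) + p[node] * child_sum  # cost of predicting 1
    y[node] = 1 if cost1 < cost0 else 0
    H[node] = min(cost0, cost1)
    if y[node] == 0:                               # zero out the whole subtree
        stack = list(children.get(node, []))
        while stack:
            k = stack.pop()
            y[k] = 0
            stack.extend(children.get(k, []))
    return y
```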

5.3.2. HBAYES-CS: The Cost-Sensitive Version. HBAYES-CS is the cost-sensitive version of HBAYES proposed in [27]. By this approach, the misclassification cost coefficient $\theta_i$ for node $i$ is split into two terms, $\theta_i^+$ and $\theta_i^-$, taking into account misclassifications of, respectively, positive and negative examples. By considering separately these two terms, (12) can be rewritten as

$$\hat{y}_i = \underset{\hat{y} \in \{0,1\}}{\operatorname{argmin}} \left( \theta_i^- \hat{p}_i (1 - \hat{y}) + \theta_i^+ (1 - \hat{p}_i)\, \hat{y} + \hat{p}_i \{\hat{y} = 1\} \sum_{j \in \operatorname{child}(i)} H_j(\hat{\mathbf{y}}) \right), \quad (14)$$

where the expression of $H_j(\hat{\mathbf{y}})$ gets changed correspondingly. By introducing a factor $\alpha \geq 0$ such that $\theta_i^- = \alpha \theta_i^+$, while keeping $\theta_i^+ + \theta_i^- = 2\theta_i$, the relative costs of false positives and false negatives can be parameterized, thus allowing us to further rewrite the hierarchical Bayesian rule (Section 5.3.1) as follows:

$$\hat{y}_i = 1 \Longleftrightarrow \hat{p}_i \left( 2\theta_i - \sum_{j \in \operatorname{child}(i)} H_j \right) \geq \frac{2\theta_i}{1 + \alpha}. \quad (15)$$

By setting $\alpha = 1$, we obtain the original version of the hierarchical Bayesian ensemble; by incrementing $\alpha$, we introduce progressively lower costs for positive predictions. In this way, the recall of the ensemble tends to increase, eventually at the expense of the precision, and by tuning the $\alpha$ parameter we can obtain different combinations of precision/recall values.

In principle, a cost factor $\alpha_i$ can be set for each node $i$ to explicitly take into account the unbalance between the number of positive $n_i^+$ and negative $n_i^-$ examples, estimated from the training data:

$$\alpha_i = \frac{n_i^-}{n_i^+} \Longrightarrow \theta_i^+ = \frac{2}{n_i^-/n_i^+ + 1}\, \theta_i = \frac{2 n_i^+}{n_i^- + n_i^+}\, \theta_i. \quad (16)$$

The decision rule (15) at each node then becomes

$$\hat{y}_i = 1 \Longleftrightarrow \hat{p}_i \left( 2\theta_i - \sum_{j \in \operatorname{child}(i)} H_j \right) \geq \frac{2\theta_i}{1 + \alpha_i} = \frac{2\theta_i n_i^+}{n_i^- + n_i^+}. \quad (17)$$

Results obtained with the yeast model organism showed that HBAYES-CS significantly outperforms HTD methods [27, 136].
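For a single node, the cost-sensitive decision rule (15)-(17) reduces to a one-line test; the following minimal sketch (hypothetical names, with per-node counts `n_pos`/`n_neg` as in (16)) illustrates it:

```python
def hbayes_cs_decision(p_i, theta_i, H_children, n_pos, n_neg):
    """Decide y_i with the per-node cost factor alpha_i = n_neg / n_pos."""
    alpha_i = n_neg / n_pos                    # unbalance-aware cost factor (16)
    lhs = p_i * (2 * theta_i - sum(H_children))
    return 1 if lhs >= 2 * theta_i / (1 + alpha_i) else 0
```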

6. Reconciliation Methods

Hierarchical ensemble methods are basically two-step methods, since they at first provide predictions for the single classes and then arrange these predictions to take into account the functional relationships between GO terms. Noble and colleagues name this general approach reconciliation methods [24]: they proposed methods for calibrating and combining independent predictions to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. They applied their ensemble methods to genome-wide and ontology-wide function prediction in M. musculus, involving about 3000 GO terms.

Figure 10: The overall scheme of reconciliation methods: (1) kernel computation for each data source, (2) SVM learning for each term, (3) calibration of the SVM outputs, and (4) reconciliation against the ontology (adapted from [24]).

Their goal consists in providing consistent predictions, that is, predictions whose confidence (e.g., posterior probability) increases as we ascend from more specific to more general terms in the GO. Moreover, another important feature of these methods is the availability of confidence values associated with the predictions, which can be interpreted as probabilities that a protein has a certain function, given the information provided by the data.

The overall reconciliation approach can be summarized in the following four basic steps (Figure 10).

(1) Kernel computation: at first, a set of kernels is computed from the available data. We may choose kernels specific for each source of data (e.g., diffusion kernels for protein-protein interaction data [137], linear or Gaussian kernels for expression data, and string kernels for sequence data [138]). Multiple kernels for the same type of data can also be constructed [24].

(2) SVM learning: SVMs are used as base learners with the kernels selected at the previous step; the training is performed by internal cross-validation to avoid overfitting, and a local cost-sensitive strategy is applied by tuning separately the $C$ regularization factor for positive and negative examples. Note that the authors used SVMs as base learners in their experiments, but any meaningful classifier could be used at this step.

(3) Calibration: to produce individual probabilistic outputs from the set of SVM outputs corresponding to one GO term, a logistic regression approach is applied. In this way, a calibration of the individual SVM outputs is obtained, resulting in a probabilistic prediction of the random variable $Y_i$ for each node/term $i$ of the hierarchy, given the outputs $\hat{y}_i$ of the SVM classifiers.

(4) Reconciliation: the first three steps generate unreconciled outputs; that is, in practice, a "flat" ensemble is applied that may generate inconsistent predictions with respect to the given taxonomy. In this step, the outputs of step three are processed by a "reconciliation method". The goal of this stage is to combine the predictions for each term to produce predictions that are consistent with the ontology, meaning that all the probabilities assigned to the ancestors of a GO term are larger than the probability assigned to that term.

The first three steps are basically the same for (or very similar to) each reconciliation ensemble method. The crucial step is the fourth, that is, the reconciliation step, and different ensemble algorithms can be designed to implement it. The authors proposed 11 different ensemble methods for the reconciliation of the base classifier outputs. Schematically, they can be subdivided into the following four main classes of ensembles:

(1) heuristic methods;
(2) Bayesian network-based methods;
(3) cascaded logistic regression;
(4) projection-based methods.

6.1. Heuristic Methods. These approaches preserve the "reconciliation property"

$$\forall i, j \in G, \quad (i, j) \in G \Longrightarrow \bar{p}_i \geq \bar{p}_j \quad (18)$$

through simple heuristic modifications of the probabilities computed at step 3 of the overall reconciliation scheme (a sketch of the three rules follows after the list).

(i) The MAX method simply chooses the largest logistic regression value for the node $i$ and all its descendants $\operatorname{desc}(i)$:

$$\bar{p}_i = \max_{j \in \operatorname{desc}(i)} \hat{p}_j. \quad (19)$$

(ii) The AND method implements the notion that the probability of all ancestral GO terms $\operatorname{anc}(i)$ of a given term/node $i$ is large, assuming that, conditional on the data, all predictions are independent:

$$\bar{p}_i = \prod_{j \in \operatorname{anc}(i)} \hat{p}_j. \quad (20)$$

(iii) The OR method estimates the probability that the node $i$ is annotated for at least one of the descendant GO terms, assuming again that, conditional on the data, all predictions are independent:

$$1 - \bar{p}_i = \prod_{j \in \operatorname{desc}(i)} (1 - \hat{p}_j). \quad (21)$$
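A minimal sketch of the three heuristics, assuming `p_hat` maps each term to its calibrated probability and `desc(i)`/`anc(i)` return the sets of descendants/ancestors of `i` (whether `i` itself belongs to these sets is a convention; here it is added explicitly):

```python
import math

def reconcile_max(i, p_hat, desc):
    return max(p_hat[j] for j in desc(i) | {i})                 # rule (19)

def reconcile_and(i, p_hat, anc):
    return math.prod(p_hat[j] for j in anc(i) | {i})            # rule (20)

def reconcile_or(i, p_hat, desc):
    return 1 - math.prod(1 - p_hat[j] for j in desc(i) | {i})   # rule (21)
```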

6.2. Cascaded Logistic Regression. Instead of modeling class-conditional probabilities, as required by the Bayesian approach, logistic regression can be used to directly model posterior probabilities. Considering that modeling conditional densities is in most cases difficult (even using strong independence assumptions, as shown in Section 5.1), the choice of logistic regression could be a reasonable one. In [24] the authors embedded the hierarchical dependencies between terms in the logistic regression setting. By assuming that a random variable $X$, whose values represent the features of the gene $g$ of interest, is associated with a given gene $g$, and assuming that $\mathbb{P}(\mathbf{Y} = \mathbf{y} \mid X = x)$ factorizes according to the GO graph, it follows that

$$\mathbb{P}(\mathbf{Y} = \mathbf{y} \mid X = x) = \prod_i \mathbb{P}(Y_i = y_i \mid \forall j \in \operatorname{par}(i),\, Y_j = y_j,\, X_i = x_i), \quad (22)$$

with $\mathbb{P}(Y_i = 1 \mid \forall j \in \operatorname{par}(i),\, Y_j = 0,\, X_i = x_i) = 0$. The authors estimated $\mathbb{P}(Y_i = 1 \mid \forall j \in \operatorname{par}(i),\, Y_j = 1,\, X_i = x_i)$ with logistic regression. This approach is quite similar to fitting independent logistic regressions; note, however, that in this case only examples of proteins annotated to all parent GO terms are used to fit the model, thus implicitly taking into account the hierarchical relationships between GO terms.
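A minimal sketch of the fitting step under the stated assumptions (feature matrix `X`, 0/1 label matrix `Y` with genes on the rows and terms on the columns, and a hypothetical `parents` map; scikit-learn is used here only for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_cascaded_lr(X, Y, parents):
    """Fit, for each term i, P(Y_i = 1 | all parents = 1, x) on the
    examples annotated to all parents of i, as in (22)."""
    models = {}
    for i in range(Y.shape[1]):
        if parents[i]:
            mask = np.all(Y[:, parents[i]] == 1, axis=1)
        else:
            mask = np.ones(Y.shape[0], dtype=bool)   # root terms use all examples
        models[i] = LogisticRegression().fit(X[mask], Y[mask, i])
    return models
```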

6.3. Bayesian Network-Based Methods. These methods are variants of the Bayesian network approach proposed in [20] (Section 5.1): the GO is viewed as a graphical model where a joint Bayesian prior is put on the binary GO term variables $y_i$. The authors proposed four variants that can be summarized as follows:

(i) BPAL is a belief propagation approach with asymmetric Laplace likelihoods. The graphical model has edges directed from more general terms to more specific terms. Differently from [20], the distribution of each SVM output is modeled as an asymmetric Laplace distribution, and a variational inference algorithm, which solves an optimization problem whose minimizer is the set of marginal probabilities of the distribution, is used to estimate the posterior probabilities of the ensemble [139];

(ii) the BPALF approach is similar to BPAL, but with edges inverted and directed from more specific terms to more general terms;

(iii) BPLR is a heuristic variant of BPAL where, in the inference algorithm, the Bayesian log posterior ratio for $Y_i$ is replaced by the marginal log posterior ratio obtained from the logistic regression (LR);

(iv) BPLRF is equal to BPLR, but with reversed edges.

6.4. Projection-Based Methods. A different approach is represented by methods that directly use the calibrated values obtained from logistic regression (step 3 of the overall scheme of the reconciliation methods) to find the closest set of values that are consistent with the ontology. This approach leads to a constrained optimization problem. The main contribution of the Obozinski et al. work [24] is represented by the introduction of projection reconciliation techniques based on isotonic regression [140] and the Kullback-Leibler divergence.

The isotonic regression method tries to find a set of marginal probabilities $\bar{p}_i$ that are close to the set of calibrated values $\hat{p}_i$ obtained from the logistic regression. The Euclidean distance is used as a measure of closeness. Hence, considering that the "reconciliation property" requires that $\bar{p}_i \geq \bar{p}_j$ when $(i, j) \in E$, this approach yields the following quadratic program:

$$\min_{\{\bar{p}_i\}_{i \in I}} \sum_{i \in I} (\bar{p}_i - \hat{p}_i)^2 \quad \text{s.t.} \quad \bar{p}_j \leq \bar{p}_i, \ (i, j) \in E. \quad (23)$$

This is the classical isotonic regression problem, which can be solved using an interior point solver, or an approximate algorithm when the number of edges of the graph is too large [141].
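The projection (23) can be sketched with a generic solver (a dedicated interior-point or pool-adjacent-violators routine would be preferred in practice; this minimal illustration assumes `edges` lists (parent, child) index pairs):

```python
import numpy as np
from scipy.optimize import minimize

def isotonic_reconcile(p_hat, edges):
    """Project p_hat onto the set {p : p_parent >= p_child} in Euclidean norm."""
    cons = [{"type": "ineq", "fun": lambda p, i=i, j=j: p[i] - p[j]}
            for (i, j) in edges]
    res = minimize(lambda p: np.sum((p - p_hat) ** 2), x0=p_hat,
                   bounds=[(0.0, 1.0)] * len(p_hat),
                   constraints=cons, method="SLSQP")
    return res.x

p_hat = np.array([0.6, 0.8, 0.3])                 # the child at index 1 exceeds its parent
print(isotonic_reconcile(p_hat, [(0, 1), (0, 2)]))  # approximately [0.7, 0.7, 0.3]
```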

Considering that we deal with probabilities, a natural measure of distance between probability density functions $f(\mathbf{x})$ and $g(\mathbf{x})$, defined with respect to a random variable $\mathbf{x}$, is represented by the Kullback-Leibler divergence $D_{f,g}$:

$$D_{f,g} = \int_{-\infty}^{\infty} f(\mathbf{x}) \log\left(\frac{f(\mathbf{x})}{g(\mathbf{x})}\right) d\mathbf{x}. \quad (24)$$

In the context of reconciliation methods, we need to consider a discrete version of the Kullback-Leibler divergence, yielding the following optimization problem:

$$\min_{\bar{\mathbf{p}}} D_{\bar{\mathbf{p}}, \hat{\mathbf{p}}} = \min_{\{\bar{p}_i\}_{i \in I}} \sum_{i \in I} \bar{p}_i \log\left(\frac{\bar{p}_i}{\hat{p}_i}\right) \quad \text{s.t.} \quad \bar{p}_j \leq \bar{p}_i, \ (i, j) \in E. \quad (25)$$

The algorithm finds the probabilities closest to the probabilities $\hat{\mathbf{p}}$ obtained from logistic regression, according to the Kullback-Leibler divergence, and obeying the constraints that probabilities cannot increase while descending the hierarchy underlying the ontology.

The extensive experiments reported in [24] show that, among the reconciliation methods, isotonic regression is the most generally useful. Across a range of evaluation modes, term sizes, ontologies, and recall levels, isotonic regression yields consistently high precision. On the other hand, isotonic regression is not always the "best method", and a biologist with a particular goal in mind may apply other reconciliation methods. For instance, with small terms, Kullback-Leibler projections usually achieve the best results; but, considering average "per term" results, heuristic methods yield precision at a given recall comparable with projection methods and better than that achieved with Bayes-net methods.

This ensemble approach achieved excellent results in the prediction of protein function in the mouse model organism, demonstrating that hierarchical multilabel methods can play a crucial role in the improvement of protein function prediction performances [24]. Nevertheless, the approach suffers from some drawbacks. Indeed, the paper focuses on the comparison of hierarchical multilabel methods, but it does not analyze the impact of the concurrent use of data integration and hierarchical multilabel methods on the overall classification performances. Moreover, potential improvements could be introduced by applying cost-sensitive variants of hierarchical multilabel predictors, able to effectively calibrate the precision/recall trade-off at different levels of the functional ontology.

Figure 11: The asymmetric flow of information suggested by the true path rule.

7. True Path Rule Hierarchical Ensembles

These ensemble methods exploit at the same time the downward and upward relationships between classes, thus considering both the parent-to-child and child-to-parent functional links (Figure 2(b)).

The true path rule (TPR) ensemble method [31, 142] is directly inspired by the true path rule that governs both the GO and FunCat taxonomies. Citing the curators of the Gene Ontology [143]: "An annotation for a class in the hierarchy is automatically transferred to its ancestors, while genes unannotated for a class cannot be annotated for its descendants." Considering the parents of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$$\hat{y}_i = 1 \Longrightarrow \hat{y}_{\operatorname{par}(i)} = 1, \qquad \hat{y}_i = 0 \;\not\Longrightarrow\; \hat{y}_{\operatorname{par}(i)} = 0. \quad (26)$$

On the other hand, considering the children of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$$\hat{y}_i = 1 \;\not\Longrightarrow\; \hat{y}_{\operatorname{child}(i)} = 1, \qquad \hat{y}_i = 0 \Longrightarrow \hat{y}_{\operatorname{child}(i)} = 0. \quad (27)$$

From (26) and (27) we observe an asymmetry in the rules that govern the assignments of positive and negative labels. Indeed, we have a propagation of positive predictions from bottom to top of the hierarchy in (26), and a propagation of negative labels from top to bottom in (27). Conversely, negative labels cannot propagate from bottom to top, and positive predictions cannot propagate from top to bottom. The "true path rule" thus suggests algorithms able to propagate "positive" decisions from bottom to top of the hierarchy, and negative decisions from top to bottom (Figure 11).

7.1. The True Path Rule Ensemble Algorithm. The TPR algorithm puts together the predictions made at each node by local "base" classifiers to realize an ensemble that obeys the "true path rule". The basic ideas behind the method can be summarized as follows:

(1) training of the base learners: for each node of the hierarchy, a suitable learning algorithm (e.g., a multilayer perceptron or a support vector machine) provides a classifier for the associated functional class;

(2) in the evaluation phase, the trained classifiers associated with each class/node of the graph provide a local decision about the assignment of a given example to a given node;

(3) positive decisions, that is, annotations to a specific functional class, may propagate from bottom to top across the graph: they influence the decisions of the parent nodes and of their ancestors in a recursive way, by traversing the graph towards higher level nodes/classes. Conversely, negative decisions do not affect decisions of the parent node, that is, they do not propagate from bottom to top (26);

(4) negative predictions for a given node (taking into account the local decisions of its descendants) are propagated to the descendants, to preserve the consistency of the hierarchy according to the true path rule, while positive decisions do not influence decisions of child nodes (27).

The ensemble combines the local predictions of the base learners associated with each node with the positive decisions that come from the bottom of the hierarchy and with the negative decisions that spring from the higher level nodes. More precisely, base classifiers estimate local probabilities $\hat{p}_i(g)$ that a given example $g$ belongs to class $\theta_i$, but the core of the algorithm is represented by the evaluation phase, where the ensemble provides an estimate of the "consensus" global probability $\bar{p}_i(g)$.

It is worth noting that, instead of a probability, $\hat{p}_i(g)$ may represent a score associated with the likelihood that a given gene/gene product belongs to the functional class $i$.

Let us consider the set $\phi_i(g)$ of the children of node $i$ for which we have a positive prediction for a given gene $g$:

$$\phi_i(g) = \{ j \in \operatorname{child}(i) \mid \hat{y}_j = 1 \}. \quad (28)$$

The global consensus probability $\bar{p}_i(g)$ of the ensemble depends both on the local prediction $\hat{p}_i(g)$ and on the predictions of the nodes belonging to $\phi_i(g)$:

$$\bar{p}_i(g) = \frac{1}{1 + |\phi_i(g)|} \left( \hat{p}_i(g) + \sum_{j \in \phi_i(g)} \bar{p}_j(g) \right). \quad (29)$$

The decision $\hat{y}_i(g)$ at node/class $i$ is set to 1 if $\bar{p}_i(g) > t$, and to 0 otherwise (a natural choice for $t$ is 0.5); only children nodes for which we have a positive prediction can influence their parent. In the leaf nodes, the sum of (29) disappears, and (29) becomes $\bar{p}_i(g) = \hat{p}_i(g)$. In this way, positive predictions propagate from bottom to top, and negative decisions are propagated to the descendants when, for a given node, $\hat{y}_i(g) = 0$.

The bottom-up per-level traversal of the tree assures that all the offspring of a given node $i$ are taken into account for the ensemble prediction. For the same reason, we can safely set the classes belonging to the subtree rooted at $i$ to negative when $\hat{y}_i$ is set to 0. It is worth noting that we have a two-way asymmetric flow of information across the tree: positive predictions for a node influence its ancestors, while negative predictions influence its offspring.

The algorithm provides both the multilabels $\hat{y}_i$ and an estimate of the probabilities $\bar{p}_i$ that a given example $g$ belongs to the class $i = 1, \ldots, m$.

7.2. The Cost-Sensitive Variant. Note that in the TPR algorithm there is no way to explicitly balance the local prediction $\hat{p}_i(g)$ at node $i$ with the positive predictions coming from its offspring (29). By balancing the local predictions with the positive predictions coming from the ensemble, we can explicitly modulate the interplay between local and descendant predictors. To this end, a weight $w$, $0 \leq w \leq 1$, is introduced, such that, if $w = 1$, the decision at node $i$ depends only on the local predictor; otherwise, the prediction is shared proportionally to $w$ and $1 - w$ between, respectively, the local parent predictor and the set of its children:

$$\bar{p}_i = w \hat{p}_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \bar{p}_j. \quad (30)$$

This variant of the TPR algorithm is the weighted true path rule (TPR-W) hierarchical ensemble algorithm. A specific advantage of TPR-W ensembles is the capability of tuning the precision/recall characteristics of the resulting ensemble through the parameter $w$ (30). For $w \to 0$, the weight of the parent local predictor is small, and the ensemble decision depends mainly on the positive predictions of the offspring nodes (classifiers), yielding a higher recall. Conversely, $w \to 1$ corresponds to a higher weight of the parent predictor: less weight is given to possible positive predictions of the children, the decision depends mainly on the local/parent base classifier, and, in case of a negative decision, the whole subtree is set to 0, causing the precision to increase. Note that for $w \to 1$ the behaviour of TPR-W becomes similar to that of HTD (Section 4). In [31] the author shows that the $w$ parameter highly influences the precision/recall characteristics of the ensemble: low $w$ values yield a higher recall, while high values improve the precision of the TPR-W ensemble.
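The whole TPR-W evaluation phase can be sketched as a single post-order traversal (a minimal illustration of (29)-(30) on a tree encoded as a `children` dictionary; when a node has no positive children, the consensus is assumed to reduce to the local estimate, as in the leaf case of (29)):

```python
def tpr_w(node, children, p_hat, w=0.5, t=0.5, p_bar=None, y=None):
    """Bottom-up consensus (30) followed by top-down negative propagation."""
    if p_bar is None:
        p_bar, y = {}, {}
    positive = []                                 # phi_i: positive children
    for j in children.get(node, []):
        tpr_w(j, children, p_hat, w, t, p_bar, y)
        if y[j] == 1:
            positive.append(p_bar[j])
    if positive:
        p_bar[node] = w * p_hat[node] + (1 - w) * sum(positive) / len(positive)
    else:
        p_bar[node] = p_hat[node]                 # leaf-like case, as in (29)
    y[node] = 1 if p_bar[node] > t else 0
    if y[node] == 0:                              # negatives propagate downward
        stack = list(children.get(node, []))
        while stack:
            k = stack.pop()
            y[k] = 0
            stack.extend(children.get(k, []))
    return y, p_bar
```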

Recently, Chen and Hu proposed a method that applies the TPR-W hierarchical strategy but uses composite-kernel SVMs as base classifiers, together with a supervised clustering with oversampling strategy to address the imbalanced data set learning problem; they showed that the proper selection of base learners and unbalance-aware learning strategies can further improve the results in terms of hierarchical precision and recall [144].

The same authors also proposed an enhanced version of the TPR-W strategy, to overcome a limitation of this bottom-up hierarchical method for AFP. Indeed, for some classes at the lower levels of the hierarchy, the classifier performances are sometimes quite poor, due to both noisy data and the relatively low number of available annotations. More precisely, in the basic TPR ensemble, the probabilities $\bar{p}_j$ computed by the children of the node $i$ (30) contribute in equal measure to the probability $\bar{p}_i$ computed by the ensemble at node $i$, independently of the accuracy of the predictions made by its children classifiers. This "unweighted" mechanism may propagate errors across the hierarchy: a poorly performing child classifier may, for instance, with high probability predict a negative example as positive, and this error may propagate to its parent node and, recursively, to its ancestor nodes. To try to alleviate this possible bottom-up error propagation, in [145] Chen and Hu proposed an improved TPR ensemble (TPR-W weighted) based on classifier performance. To this end, they weighted the contribution of each child classifier on the basis of its performance, evaluated on a validation data set, by adding to (30) another weight $\nu_j$:

$$\bar{p}_i = w \hat{p}_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \nu_j \cdot \bar{p}_j, \quad (31)$$

where $\nu_j$ is computed on the basis of some accuracy metric $A_j$ (e.g., the F-score), estimated for the child classifier associated with node $j$, as follows:

$$\nu_j = \frac{A_j}{\sum_{k \in \phi_i} A_k}. \quad (32)$$

In this way, the contribution of "poor" classifiers is reduced, while "good" classifiers weight more in the final computation of $\bar{p}_i$ (31). Experiments with the "Protein Fate" subtree of the FunCat taxonomy with the yeast model organism show that this approach improves prediction with respect to the "vanilla" TPR-W hierarchical strategy [145].

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward and common to the hierarchical top-down algorithm, the propagation of positive decisions from bottom to top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this item, see Appendix C.

Experimental results show that TPR-W achieves equal or better results than the TPR and top-down hierarchical strategies, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per-level classification performances shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall by using a global strategy: large values of the $w$ parameter improve the precision, and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchical ensemble are influenced by the positive decisions of its descendants. More precisely, the analyses performed in [31] showed the following:

(i) weights of descendants decrease exponentially with respect to their depth; as a consequence, the influence of descendant nodes decays quickly with their depth;

(ii) the parameter $w$ plays a central role in balancing the weight of the parent classifier associated with a given node with the weights of its positive offspring: small values of $w$ increase the weight of descendant nodes, and large values increase the weight of the local parent predictor associated with that node;

(iii) the effect on the overall probability predicted by the ensemble is the result of the choice of the $w$ parameter, the strength of the prediction of the local learners, and that of its descendants.

These characteristics of TPR-W ensembles are well suited for the hierarchical classification of protein functions, considering that annotations of deeper nodes are likely to have less experimental evidence than higher nodes. Moreover, by enforcing the strength of the descendant nodes through low $w$ values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured according to a forest of trees (Section 1.1), while no applications have been performed using the GO, structured according to a directed acyclic graph (Section 1.1).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147], or investigated a modified decision tree model in which the assignment to a node is propagated toward the parent nodes [9], by extending the classical C4.5 decision tree algorithm for multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in the yeast [149].

More recent approaches, always based on modified decision trees, used distance measures derived from the hierarchy and significantly improved previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all classes at once, even when they are built taking into account the hierarchy.

Nevertheless, the previously proposed decision tree-based methods often achieve results not comparable with state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for DAG-structured prediction of protein function in the S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles, which learn ensembles of predictive models each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles, which train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151], methods related to decision trees, in the sense that they produce interpretable classification rules to predict all functions at all levels of the GO hierarchy, have been proposed, using an ant colony optimization algorithm to discover the classification rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues [1, 16, 18]. In particular, in [80] the authors showed that, even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results if, at the same time, we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy between hierarchical classification, data integration methods, and cost-sensitive techniques has been shown, highlighting that hierarchical ensemble methods should be designed taking into account the different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the predictions of protein functions [16, 25, 154-156]. Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into four groups:

(1) vector subspace integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed by [30], but each data source is separately normalized in order to take into account the data distribution in each individual vector space.

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network, or probabilistic evidence integration schemes [45]. Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive-Bayes integration [157] and constrained linear regression [14], or by merging data taking into account the GO hierarchy [158], or by applying XML-based techniques [159].

Kernel Fusion. These techniques at first construct a separate Gram matrix for each available data source, using appropriate kernels representing similarities between genes/gene products. Then, by exploiting the closure property with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix. Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have been recently proposed [160, 161]. Kernel fusion methods, both with and without weighting of the data sources, have been successfully applied to the classification of protein functions [162-165]. Recently, a novel method proposed an enhanced kernel integration approach, by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
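In its unweighted form, kernel fusion amounts to an entrywise sum of Gram matrices, as in this minimal sketch (toy matrices; learned weights, as in multiple kernel learning, would replace the uniform ones):

```python
import numpy as np

def fuse_kernels(gram_list, weights=None):
    """Combine Gram matrices by a (weighted) sum, valid by kernel closure."""
    if weights is None:
        weights = [1.0] * len(gram_list)
    return sum(w * K for w, K in zip(weights, gram_list))

K_ppi = np.array([[1.0, 0.2], [0.2, 1.0]])     # toy protein-protein interaction kernel
K_expr = np.array([[1.0, 0.7], [0.7, 1.0]])    # toy expression kernel
K_fused = fuse_kernels([K_ppi, K_expr])
```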

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different "views" of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include "late integration" of kernels trained on different sources [22], naive-Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the output of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171-173].

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques, in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ. SIM.). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].

Table 2: Comparison of the results (average per class F-scores) achieved with single sources and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the numbers in parentheses refer to the percentage relative increment in F-score achieved with data fusion techniques with respect to the best single source of evidence (BIOGRID).

Methods                     FLAT         HTD          HTD-CS       HB           HB-CS        TPR          TPR-W
Single-source
  BIOGRID                   0.2643       0.3759       0.4160       0.3385       0.4183       0.3902       0.4367
  STRING                    0.2203       0.2677       0.3135       0.2138       0.3007       0.2801       0.3048
  PFAM BINARY               0.1756       0.2003       0.2482       0.1468       0.2407       0.2532       0.2738
  PFAM LOGE                 0.2044       0.1567       0.2541       0.0997       0.2847       0.3005       0.3160
  EXPR                      0.1884       0.2506       0.2889       0.2006       0.2781       0.2723       0.3053
  SEQ. SIM.                 0.1870       0.2532       0.2899       0.2017       0.2825       0.2742       0.3088
Multisource (data fusion)
  Kernel fusion             0.3220 (22%) 0.5401 (44%) 0.5492 (32%) 0.5181 (53%) 0.5505 (32%) 0.5034 (29%) 0.5592 (28%)

Table 2 summarizes the results of the comparison across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per class F-score in flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) with respect to FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical methods, data fusion (in particular kernel fusion), and cost-sensitive approaches (Figure 13) [123].

Per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion, the average F-scores at levels 2, 3, and 4 are comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergic effects of hierarchical multilabel ensembles, cost-sensitive, and data fusion techniques significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insights into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both the GO and FunCat, only positive annotations are usually available, while negative annotations are much fewer. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples. Moreover, some seminal works in functional genomics pointed out that the strategy of choosing negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123], two strategies for choosing negative examples have been compared: the basic (B) and the parent only (PO) strategy. According to the B strategy, the set of negative examples consists simply of those genes $g$ that are not annotated for class $c_i$; that is,

$$N_B = \{ g \mid g \notin c_i \}. \quad (33)$$

The PO selection strategy chooses as negatives for the class $c_i$ only those examples that are not annotated to $c_i$ but are annotated for a parent class. More precisely, for a given class $c_i$, corresponding to node $i$ in the taxonomy, the set of negative examples is

$$N_{\mathrm{PO}} = \{ g \mid g \notin c_i, \ g \in \operatorname{par}(i) \}. \quad (34)$$
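The two selection rules (33)-(34) translate directly into set operations, as in this minimal sketch (assuming `ann` maps each class to its set of annotated genes and `genes` is the full gene set):

```python
def negatives_basic(genes, ann, c_i):
    """N_B: every gene not annotated for class c_i (33)."""
    return genes - ann[c_i]

def negatives_parent_only(ann, c_i, parent_of_c_i):
    """N_PO: genes annotated for the parent class but not for c_i (34)."""
    return ann[parent_of_c_i] - ann[c_i]
```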


Figure 12: FunCat trees comparing the F-scores achieved with data integration (KF) to those of the best single-source classifiers trained on BIOGRID data. Black nodes depict functional classes for which KF achieves better F-scores: (a) FLAT and (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical (a) precision, (b) recall, and (c) F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, the PO strategy selects negative examples for training that are, in a certain sense, "close" to positives. It is easy to see that $N_{\mathrm{PO}} \subseteq N_B$; hence, the B strategy selects for training a large set of generic negative examples, possibly annotated with classes that are associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the B (abscissa) and PO (ordinate) strategy for the selection of negative examples. In Figure 15(a), most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods gain in performance when using the PO strategy as opposed to the B strategy ($P$-value $= 2.2 \times 10^{-16}$ according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

Figure 14: Comparison of per-level average (a) precision, (b) recall, and (c) F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF). Performance of "single" is computed by averaging across all the single data sources.

These results can be explained by considering that the PO strategy takes into account the hierarchy to select the negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of the classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from the other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section, we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in the context of this research area.


Figure 15: Comparison of average per class F-score between the basic and PO strategies: (a) FLAT ensembles; (b) hierarchical cost-sensitive strategies: HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles). Abscissa: per class F-score with base learners trained according to the basic strategy; ordinate: per class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way, to assess the impact of the different strategies on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works also showed that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be considered, both for applying and designing cost-sensitive base learners and for developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies should be explored to improve the accuracy and the reliability of the base learners, and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied: the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly takes away some unannotated examples [182]. Other approaches could be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185-187].

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems, and often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta_i^- / \theta_i^+$, we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$, we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed, in order to design ensemble methods with "tunable" precision/recall characteristics.

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; unfortunately, the available annotations are incomplete, undergo frequent updates, and the GO itself is frequently revised. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and that, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are only partially annotated. More precisely, two new algorithms have been proposed: ProWL, protein function prediction with weak-label learning, which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein, and ProWL-IF, protein function prediction with weak-label learning and knowledge of irrelevant function, by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. These results show that such items should be considered in future work on hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.
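One possible direction, sketched below under loudly stated assumptions: weight each annotation by the reliability of its GO evidence code (see Appendix A.1) and pass the weights to the base learner, so that unreviewed electronic annotations contribute without being trusted as much as experimental ones. The numeric weights are illustrative assumptions, not established constants.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical reliability weights per GO evidence code (assumed values):
EVIDENCE_WEIGHT = {
    "IDA": 1.0, "IPI": 1.0, "IMP": 1.0, "IGI": 1.0,  # experimental codes
    "TAS": 0.8,   # traceable author statement
    "ISS": 0.6,   # inferred from sequence or structural similarity
    "IEA": 0.3,   # inferred from electronic annotation (not reviewed)
}

def fit_reliability_weighted_learner(X, y, evidence_codes):
    """Train one per-term base learner, down-weighting less reliable
    annotations instead of discarding them; evidence_codes holds one
    code per training example."""
    w = np.array([EVIDENCE_WEIGHT.get(c, 0.3) for c in evidence_codes])
    return LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)
```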

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which combines the advantages of grammar-based evolutionary algorithms for incorporating prior knowledge with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy with the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Appendix C), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of deep nodes of the hierarchy. Other enhancements could be introduced in the TPR-W algorithm (Section 7.2): indeed, we can note that the positive children of a node at level $i$ of the hierarchy have the same weight, independently of the size of their hanging subtrees. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree, since this witnesses for a positive annotation of the node at level $i$.

More in general, in the spirit of a work recently proposed [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194], able to exploit the relationships between functional classes during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, already during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second learning issue, to my knowledge not yet explored, is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performances.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the predictions for a particular species by using predictions or data available for other species. This is common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach to hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. Possible solutions to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary-memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, from the balance between local and global learning strategies to the metaintegration of hierarchical predictions, have been only partially addressed in previous work. More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and other related issues is the key to improving AFP methods and to driving experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., mitochondrion or rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function; examples are lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; examples of MF are glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents parts or components of a cell, such as organelles, or physical places or compartments in which a specific gene product is located; examples are the endoplasmic reticulum or the ribosome.

Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a term of the GO}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are also categorized in the following three main groups:

(i) is-a (subtype relations): if we say that the term/node $A$ is a $B$, we mean that node $A$ is a subtype of node $B$. For example, mitotic cell cycle is a cell cycle, and lyase activity is a catalytic activity.

(ii) part-of (part-whole relations): $A$ is part of $B$ means that, whenever $A$ exists, it is part of $B$, and the presence of $A$ implies the presence of $B$. For instance, mitochondrion is part of cytoplasm.

(iii) regulates (control relations): if we say that $A$ regulates $B$, we mean that $A$ directly affects the manifestation of $B$, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting functional similarity. For these reasons, regulates relations (which, however, represent a minority of the existing relations) are usually not used in AFP.
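The transitivity of is-a and part-of is what underlies the upward propagation of annotations (the "true path rule" [143]): a gene annotated to a term is implicitly annotated to all of its ancestors. A minimal sketch over a toy DAG with hypothetical term names:

```python
def ancestors(term, parents, cache=None):
    """All ancestors of a term through is-a / part-of links, computed by
    walking the DAG upward (memoized across recursive calls)."""
    cache = {} if cache is None else cache
    if term not in cache:
        acc = set()
        for p in parents.get(term, ()):
            acc |= {p} | ancestors(p, parents, cache)
        cache[term] = acc
    return cache[term]

def true_path_closure(annotations, parents):
    """Close a gene's annotation set upward over the taxonomy."""
    closed = set(annotations)
    for t in annotations:
        closed |= ancestors(t, parents)
    return closed

# Toy DAG (is-a links only, hypothetical identifiers):
parents = {"mitotic cell cycle": ["cell cycle"],
           "cell cycle": ["biological process"]}
print(true_path_closure({"mitotic cell cycle"}, parents))
# -> {'mitotic cell cycle', 'cell cycle', 'biological process'}
```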

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI); to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference; to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de). At the beginning FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content was extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and, finally, animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge number of terms is very difficult to handle for annotators. Moreover, we may have a very large number of possible assignments, which may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads to both simpler procedures for annotation and less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, but without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to change, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more in general, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess the classification performance. For this reason, the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the number of positive examples correctly predicted as positive, FN the number of positive examples incorrectly predicted as negative, and FP the number of negatives incorrectly predicted as positive, then the precision $P$ and the recall $R$ are

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \quad \text{(B1)}$$

The F-score $F$ is the harmonic mean between precision and recall:

$$F = \frac{2 \cdot P \cdot R}{P + R}. \quad \text{(B2)}$$

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to near-optimal rankings, that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.
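To make the definitions concrete, a minimal per-class evaluation sketch on toy data, using scikit-learn metrics:

```python
import numpy as np
from sklearn.metrics import (precision_score, recall_score,
                             f1_score, roc_auc_score)

# y_true: known annotations for one functional term; y_score: continuous
# ensemble outputs (e.g., posterior probabilities), thresholded at 0.5.
y_true  = np.array([1, 0, 0, 1, 0, 0, 0, 1])
y_score = np.array([0.9, 0.2, 0.4, 0.7, 0.1, 0.6, 0.3, 0.8])
y_pred  = (y_score >= 0.5).astype(int)

P = precision_score(y_true, y_pred)    # TP / (TP + FP)
R = recall_score(y_true, y_pred)       # TP / (TP + FN)
F = f1_score(y_true, y_pred)           # 2PR / (P + R)
auc = roc_auc_score(y_true, y_score)   # ranking quality (0.5 = random)
print(f"P={P:.2f} R={R:.2f} F={F:.2f} AUC={auc:.2f}")
```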

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be considered "partially correct", in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$ consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $l(P)$ be the set of leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let $\uparrow p$ be the set of ancestors of the node $p$ that belong to $P(x)$, and, given a leaf $c \in C(x)$, let $\uparrow c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}p|},$$
$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}c|},$$
$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \quad \text{(B3)}$$

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow}p|}{|{\uparrow}p|},$$
$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|{\uparrow}c \cap P(x)|}{|{\uparrow}c|},$$
$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \quad \text{(B4)}$$
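A minimal sketch of the tree-structured measures in (B4), under the assumption that $\uparrow p$ contains the node $p$ itself (i.e., the whole path from $p$ up to the root):

```python
def up(node, parent):
    """Nodes on the path from a node to the root (node included);
    parent maps each child to its parent in the taxonomy tree."""
    path = {node}
    while node in parent:
        node = parent[node]
        path.add(node)
    return path

def hierarchical_prf(P_leaves, C_leaves, P_all, C_all, parent):
    """HP, HR, and HF of eq. (B4). P_leaves / C_leaves: leaves of the
    predicted / true subgraphs; P_all / C_all: all predicted / true classes."""
    HP = sum(len(C_all & up(p, parent)) / len(up(p, parent))
             for p in P_leaves) / len(P_leaves)
    HR = sum(len(up(c, parent) & P_all) / len(up(c, parent))
             for c in C_leaves) / len(C_leaves)
    HF = 2 * HP * HR / (HP + HR) if HP + HR else 0.0
    return HP, HR, HF

# A "near miss": correct parent class "1", wrong specific leaf (1.1 vs 1.2).
parent = {"1.1": "1", "1.2": "1"}
print(hierarchical_prf({"1.1"}, {"1.2"}, {"1", "1.1"}, {"1", "1.2"}, parent))
# -> (0.5, 0.5, 0.5): partially correct rather than plainly wrong
```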

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

$$\mathrm{HP}_K = \sum_x \frac{|P(x) \cap C(x)|}{|P(x)|}, \qquad \mathrm{HR}_K = \sum_x \frac{|P(x) \cap C(x)|}{|C(x)|}. \quad \text{(B5)}$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true classes ($\mathrm{HR}_K$). The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \quad \text{(B6)}$$
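Following the definitions above, these set-based measures reduce to a few lines; the sketch assumes that the predicted and true class sets of each gene are already closed under the ancestor relation:

```python
def kiritchenko_hf(pred, true):
    """HP_K, HR_K, and HF_K of eqs. (B5)-(B6); pred and true map each
    gene x to its full set of predicted / true classes in the hierarchy."""
    HP = sum(len(pred[x] & true[x]) / len(pred[x]) for x in pred)
    HR = sum(len(pred[x] & true[x]) / len(true[x]) for x in pred)
    return 2 * HP * HR / (HP + HR) if HP + HR else 0.0
```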

C. Effect of the Propagation of Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $\bar{q}_k$; more precisely, $\bar{q}_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $\bar{q}^{\,j}_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \geq 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level $k$ of the hierarchy [31]:

$$\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g), \quad \text{(C1)}$$

where $\phi_k(g)$ denotes the set of children of the node at level $k$ for which the ensemble prediction is positive.
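A minimal recursive sketch of this bottom-up combination (the positivity threshold of 0.5 and the fallback when no child is positive are illustrative assumptions; the full TPR-W algorithm is specified in [31]):

```python
def tpr_w(node, w, q_hat, children, thresh=0.5):
    """Eq. (C1): mix the local base-learner probability q_hat[node] with
    the average ensemble probability of the positive children."""
    kids = [tpr_w(c, w, q_hat, children, thresh)
            for c in children.get(node, [])]
    positive = [q for q in kids if q > thresh]
    if not positive:                 # phi is empty: keep the local output
        return q_hat[node]
    return w * q_hat[node] + (1 - w) * sum(positive) / len(positive)

# Toy tree: the confident leaf a1 lifts its weakly scored parent a.
children = {"root": ["a", "b"], "a": ["a1"]}
q_hat = {"root": 0.6, "a": 0.4, "b": 0.7, "a1": 0.9}
print(tpr_w("root", 0.5, q_hat, children))  # 0.6375
```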

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children of a generic node at level $k$:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1}, \quad \text{(C2)}$$

and we can introduce similar notations for $a_{k+2}$ (the average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (descendants at level $j$ below a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \leq w \leq 1$, balancing the weight between parent and children predictors, and with a variable number (larger than or equal to 1) of positive descendants at each of the $m$ lower levels below it, the following equality holds for each $m \geq 1$:

$$\bar{q}_k = w\,\hat{q}_k + \sum_{j=1}^{m-1} w(1-w)^j a_{k+j} + (1-w)^m a_{k+m}. \quad \text{(C3)}$$

For the full proof, see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter.
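A quick numerical check of the theorem (a NumPy sketch): the coefficients multiplying $\hat{q}_k, a_{k+1}, \ldots, a_{k+m}$ in (C3) decay geometrically with depth and sum to one, so the ensemble output is a convex combination of the local prediction and the per-level averages of the positive descendants.

```python
import numpy as np

w, m = 0.5, 5
j = np.arange(1, m)
coeff = np.concatenate(([w], w * (1 - w) ** j, [(1 - w) ** m]))
print(coeff)        # [0.5 0.25 0.125 0.0625 0.03125 0.03125]
print(coeff.sum())  # 1.0
```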

Figure 18: (a) Plot of $f(w) = (1-w)^m$ while varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1-w)^j$ while varying $j$ from 1 to 10; the integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.

To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$, Figure 18(a) plots the function that governs the decay of the influence of leaf nodes at different depths $m$, while Figure 18(b) plots the function responsible for the influence of the nodes above the leaves.

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction – the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.
[2] L. Peña-Castillo, M. Tasan, C. L. Myers, et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.
[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.
[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.
[5] A. Ruepp, A. Zollner, D. Maier, et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.
[6] M. Ashburner, C. A. Ball, J. A. Blake, et al., "Gene Ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.
[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.
[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.
[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.
[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.
[12] W. Tian, L. V. Zhang, M. Tasan, et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.
[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.
[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.
[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.
[16] P. Radivojac, W. T. Clark, T. R. Oron, et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.
[17] A. S. Juncker, L. J. Jensen, A. Pierleoni, et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.
[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.
[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.
[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.
[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.
[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.
[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.
[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.
[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.
[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.
[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.
[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.
[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.
[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.
[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.
[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.
[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.
[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.
[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.
[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.
[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.
[38] K. Trohidis, G. Tsoumahas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.
[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.
[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.
[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.
[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.
[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.
[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.
[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.
[46] K. Tsuda, H. Shin, and B. Schölkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.
[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.
[48] U. Karaoz, T. M. Murali, S. Letovsky, et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.
[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.
[50] K. Astikainen, L. Holm, E. Pitkänen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.
[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.
[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets, WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.
[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.
[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.
[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.
[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
[57] S. F. Altschul, T. L. Madden, A. A. Schaffer, et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
[58] A. Conesa, S. Götz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.
[59] Y. Loewenstein, D. Raimondo, O. C. Redfern, et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.
[60] A. Prlić, T. A. Down, E. Kulesha, R. D. Finn, A. Kähäri, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.
[61] M. Falda, S. Toppo, A. Pescarolo, et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.
[62] T. Hamp, R. Kassner, S. Seemayer, et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.
[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.
[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.

[66] H Yang TNepusz andA Paccanaro ldquoImprovingGO semanticsimilarity measures by exploring the ontology beneath theterms and modelling uncertaintyrdquo Bioinformatics vol 28 no10 pp 1383ndash1389 2012

[67] H Yu L Gao K Tu and Z Guo ldquoBroadly predicting specificgene functions with expression similarity and taxonomy simi-larityrdquo Gene vol 352 no 1-2 pp 75ndash81 2005

[68] Y Tao L Sam J Li C Friedman andYA Lussier ldquoInformationtheory applied to the sparse gene ontology annotation networkto predict novel gene functionrdquo Bioinformatics vol 23 no 13pp i529ndashi538 2007

[69] G Pandey C L Myers and V Kumar ldquoIncorporating func-tional inter-relationships into protein function prediction algo-rithmsrdquo BMC Bioinformatics vol 10 article 142 2009

[70] X-F Zhang and D-Q Dai ldquoA framework for incorporatingfunctional interrelationships into protein function predictionalgorithmsrdquo IEEEACM Transactions on Computational Biologyand Bioinformatics vol 9 no 3 pp 740ndash753 2012

[71] E Nabieva K Jim A Agarwal B Chazelle and M SinghldquoWhole-proteome prediction of protein function via graph-theoretic analysis of interaction mapsrdquo Bioinformatics vol 21no 1 pp 302ndash310 2005

[72] A Bertoni M Frasca and G Valentini ldquoCOSNet a cost sensi-tive neural network for semi-supervised learning in graphsrdquo inEuropean Conference on Machine Learning ECML PKDD 2011vol 6911 of Lecture Notes on Artificial Intelligence pp 219ndash234Springer 2011

[73] M Frasca A Bertoni M Re and G Valentini ldquoA neuralnetwork algorithm for semi-supervised node label learningfrom unbalanced datardquo Neural Networks vol 43 pp 84ndash982013

[74] M Deng T Chen and F Sun ldquoAn integrated probabilisticmodel for functional prediction of proteinsrdquo Journal of Com-putational Biology vol 11 no 2-3 pp 463ndash475 2004

[75] Y A I Kourmpetis A D J VanDijk M C AM Bink R C HJ Van Ham and C J F Ter Braak ldquoBayesian markov randomfield analysis for protein function prediction based on networkdatardquo PLoS ONE vol 5 no 2 Article ID e9293 2010

[76] S Oliver ldquoGuilt-by-association goes globalrdquo Nature vol 403no 6770 pp 601ndash603 2000

[77] J McDermott R Bumgarner and R Samudrala ldquoFunctionalannotation frompredicted protein interaction networksrdquoBioin-formatics vol 21 no 15 pp 3217ndash3226 2005

[78] M Re M Mesiti and G Valentini ldquoA fast ranking algorithmfor predicting gene functions in biomolecular networksrdquo IEEEACM Transactions on Computational Biology and Bioinformat-ics vol 9 no 6 pp 1812ndash1818 2012

[79] M Re and G Valentini ldquoCancer module genes ranking usingkernelized score functionsrdquo BMC Bioinformatics vol 13 sup-plement 14 article S3 2012

[80] M Re and G Valentini ldquoLarge scale ranking and repositioningof drugs with respect to DrugBank therapeutic categoriesrdquo inInternational Symposium on Bioinformatics Research and Appli-cations (ISBRA 2012) vol 7292 of Lecture Notes in ComputerScience pp 225ndash236 Springer 2012

[81] M Re and G Valentini ldquoNetwork-based drug ranking andrepositioning with respect to drugbank therapeutic categoriesrdquo

ISRN Bioinformatics 31

IEEEACM Transactions on Computational Biology and Bioin-formatics vol 10 no 6 pp 1359ndash1371 2014

[82] Y Bengio ODelalleau andN Le Roux ldquoLabel propagation andquadratic criterionrdquo in Semi-Supervised Learning O ChapelleB Scholkopf and A Zien Eds pp 193ndash216 MIT Press 2006

[83] Y Saad Iterative Methods For Sparse Linear Systems PWSPublishing Company Boston Mass USA 1996

[84] N Cesa-Bianchi C Gentile F Vitale and G Zappella ldquoRan-dom spanning trees and the prediction of weighted graphsrdquoin Proceedings of the 27th International Conference on MachineLearning (ICML rsquo10) pp 175ndash182 Haifa Israel June 2010

[85] I Tsochantaridis T Joachims THofmann andYAltun ldquoLargemargin methods for structured and interdependent outputvariablesrdquo Journal of Machine Learning Research vol 6 pp1453ndash1484 2005

[86] J Rousu C Saunders S Szedmak and J Shawe-Taylor ldquoKernel-based learning of hierarchical multilabel classification modelsrdquoJournal of Machine Learning Research vol 7 pp 1601ndash16262006

[87] C H Lampert and M B Blaschko ldquoStructured prediction byjoint kernel support estimationrdquoMachine Learning vol 77 no2-3 pp 249ndash269 2009

[88] G Bakir T Hoffman B Scholkopf B Smola A J Taskar andS V N Vishwanathan Predicting Structured Data MIT PressCambridge Mass USA 2007

[89] R Eisner B Poulin D Szafron P Lu and R Greiner ldquoImprov-ing protein function prediction using the hierarchical structureof the gene ontologyrdquo in Proceedings of the IEEE Symposium onComputational Intelligence in Bioinformatics andComputationalBiology (CIBCB rsquo05) November 2005

[90] H Blockeel L Schietgat and A Clare ldquoHierarchical multilabelclassification trees for gene function predictionrdquo in ProbabilisticModeling and Machine Learning in Structural and SystemsBiology J Rousu S Kaski and E Ukkonen Eds HelsinkiUniversity Printing House Tuusula Finland 2006

[91] O Dekel J Keshet and Y Singer ldquoLarge margin hierarchicalclassificationrdquo in Proceedings of the 21th International Confer-ence onMachine Learning (ICML rsquo04) pp 209ndash216 OmnipressJuly 2004

[92] N Cesa-Bianchi C Gentile and L Zaniboni ldquoHierarchicalclassifications combining bayes with SVMrdquo in Proceedings of the23rd International Conference onMachine Learning (ICML rsquo06)pp 177ndash184 ACM Press June 2006

[93] T G Dietterich ldquoEnsemble methods in machine learningrdquo inMultiple Classifier Systems First International Workshop MCS2000 Cagliari Italy J Kittler and F Roli Eds vol 1857 ofLecture Notes in Computer Science pp 1ndash15 Springer 2000

[94] O Okun G Valentini and M Re Ensembles in MachineLearning Applications vol 373 of Studies in ComputationalIntelligence Springer Berlin Germany 2011

[95] M Re and G Valentini ldquoEnsemble methods a reviewrdquo inAdvances in Machine Learning and Data Mining for AstronomyDataMining andKnowledgeDiscovery pp 563ndash594 Chapmanamp Hall 2012

[96] E Bauer and R Kohavi ldquoEmpirical comparison of voting clas-sification algorithms bagging boosting and variantsrdquoMachineLearning vol 36 no 1 pp 105ndash139 1999

[97] T G Dietterich ldquoExperimental comparison of three methodsfor constructing ensembles of decision trees bagging boostingand randomizationrdquo Machine Learning vol 40 no 2 pp 139ndash157 2000

[98] R E Banfield L O Hall O Lawrence KW Bowyer W KevinandW P Kegelmeyer ldquoA comparison of decision tree ensemblecreation techniquesrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 173ndash180 2007

[99] S Y Sohn andHW Shin ldquoExperimental study for the compari-son of classifier combinationmethodsrdquoPattern Recognition vol40 no 1 pp 33ndash40 2007

[100] M Dettling and P Buhlmann ldquoBoosting for tumor classifica-tion with gene expression datardquo Bioinformatics vol 19 no 9pp 1061ndash1069 2003

[101] V Robles P Larranaga J M Pena et al ldquoBayesian networkmulti-classifiers for protein secondary structure predictionrdquoArtificial Intelligence inMedicine vol 31 no 2 pp 117ndash136 2004

[102] G Valentini M Muselli and F Ruffino ldquoCancer recognitionwith bagged ensembles of support vectormachinesrdquoNeurocom-puting vol 56 no 1ndash4 pp 461ndash466 2004

[103] T Abeel T Helleputte Y Van de Peer P Dupont and YSaeys ldquoRobust biomarker identification for cancer diagnosiswith ensemble feature selection methodsrdquo Bioinformatics vol26 no 3 Article ID btp630 pp 392ndash398 2009

[104] A Rozza G Lombardi M Re E Casiraghi G Valentini andP Campadelli ldquoA novel ensemble technique for protein sub-cellular location predictionrdquo in Ensembles in Machine LearningApplications vol 373 of Studies in Computational Intelligencepp 151ndash177 Springer Berlin Germany 2011

[105] A Topchy A K Jain and W Punch ldquoClustering ensemblesmodels of consensus and weak partitionsrdquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 27 no 12 pp1866ndash1881 2005

[106] A Bertoni and G Valentini ldquoEnsembles based on randomprojections to improve the accuracy of clustering algorithmsrdquo inNeural Nets WIRN 2005 vol 3931 of Lecture Notes in ComputerScience pp 31ndash37 Springer 2006

[107] R E Schapire Y Freund P Bartlett and W S Lee ldquoBoostingthe margin a new explanation for the effectiveness of votingmethodsrdquoAnnals of Statistics vol 26 no 5 pp 1651ndash1686 1998

[108] E L Allwein R E Schapire andY Singer ldquoReducingmulticlassto binary a unifying approach for margin classifiersrdquo Journal ofMachine Learning Research vol 1 no 2 pp 113ndash141 2001

[109] E M Kleinberg ldquoOn the algorithmic implementation ofstochastic discriminationrdquo IEEE Transactions on Pattern Anal-ysis and Machine Intelligence vol 22 no 5 pp 473ndash490 2000

[110] L Breiman ldquoBias variance and arcing classifiersrdquo Tech Rep TR460 Statistics Department University of California BerkeleyCalif USA 1996

[111] J Friedman and P Hall ldquoOn bagging and nonlinear estimationrdquoTech Rep Statistics Department University of Stanford Stan-ford Calif USA 2000

[112] K DembczynskiWWaegemanW Cheng and E HullermeierldquoOn label dependence in multi-label classificationrdquo in Proceed-ings of the 2nd International Workshop on learning from Multi-Label Data (ICML-MLD rsquo10) pp 5ndash12 Haifa Israel 2010

[113] S Mostafavi and Q Morris ldquoUsing the Gene Ontology hierar-chy when predicting gene functionrdquo in Proceedings of the 25thConference on Uncertainty in Artificial Intelligence (UAI rsquo09) pp419ndash427 AUAI Press Corvallis Ore USA June 2009

[114] L J Jensen R Gupta H-H Staeligrfeldt and S Brunak ldquoPredic-tion of human protein function according to Gene Ontologycategoriesrdquo Bioinformatics vol 19 no 5 pp 635ndash642 2003

[115] A Laeliggreid T R Hvidsten HMidelfart J Komorowski and AK Sandvik ldquoPredicting gene ontology biological process from

32 ISRN Bioinformatics

temporal gene expression patternsrdquo Genome Research vol 13no 5 pp 965ndash979 2003

[116] R Bi Y Zhou F Lu and W Wang ldquoPredicting Gene Ontologyfunctions based on support vector machines and statisticalsignificance estimationrdquo Neurocomputing vol 70 no 4ndash6 pp718ndash725 2007

[117] P Bogdanov and A K Singh ldquoMolecular function predictionusing neighborhood featuresrdquo IEEEACMTransactions onCom-putational Biology and Bioinformatics vol 7 no 2 pp 208ndash2172010

[118] M Riley ldquoFunctions of the gene products of Escherichia colirdquoMicrobiological Reviews vol 57 no 4 pp 862ndash952 1993

[119] R E Schapire and Y Singer ldquoBoosTexter a boosting-basedsystem for text categorizationrdquo Machine Learning vol 39 no2 pp 135ndash168 2000

[120] N Alaydie C K Reddy and F Fotouhi ldquoHierarchical multi-label boosting for gene function predictionrdquo in Proceedings ofthe International Conference on Computational Systems Bioin-formatics (CSB rsquo10) Stanford Calif USA 2010

[121] T Kohonen ldquoThe self-organizing maprdquo Neurocomputing vol21 no 1ndash3 pp 1ndash6 1998

[122] H Borges and J Nievola ldquoHierarchical classification using acompetitive neural networkrdquo in Proceedings of the InternationalConference on Computing Networking and Communications(ICNC rsquo12) pp 172ndash177 2012

[123] N Cesa-Bianchi M Re and G Valentini ldquoSynergy of multi-label hierarchical ensembles data fusion and cost-sensitivemethods for gene functional inferencerdquoMachine Learning vol88 no 1 pp 209ndash241 2012

[124] R Cerri and A C P L F De Carvalho ldquoHierarchical mul-tilabel classification using Top-Down label combination andArtificial Neural Networksrdquo in Proceedings of the 11th BrazilianSymposium on Neural Networks (SBRN rsquo10) pp 253ndash258 IEEEComputer Society October 2010

[125] R Cerri and A de Carvalho ldquoHierarchical multilabel proteinfunction prediction using local neural networksrdquo inAdvances inBioinformatics and Computational Biology vol 6832 of LectureNotes in Computer Science pp 10ndash17 Springer 2011

[126] D E Rumelhart R Durbin and Y Chauvin ldquoBackpropagationthe basic theoryrdquo in Backpropagation Theory Architectures andApplications D E Rumelhart and Y Chauvin Eds pp 1ndash34Lawrence Erlbaum Hillsdale NJ USA 1995

[127] J Hernandez L E Sucar and EMorales ldquoA hybrid global-localapproach forhierarchcial classificationrdquo in Proceedings of the26th International Florida Artificial Intelligence Research SocietyConference pp 432ndash437 2013

[128] B Paes A Plastion and A Freitas ldquoImproving loacal per levelhierarchical classificationrdquo Journal of Information and DataManagement vol 3 no 3 pp 394ndash409 2012

[129] G Tsoumakas I Katakis and I Vlahavas ldquoRandom k-labelsetsfor multilabel classificationrdquo IEEE Transactions on Knowledgeand Data Engineering vol 23 no 7 pp 1079ndash1089 2011

[130] HWang X Shen andW Pan ldquoLarge margin hierarchical clas-sification with mutually exclusive class membershiprdquo Journal ofMachine Learning Research vol 12 pp 2721ndash2748 2011

[131] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996

[132] M des Jardins P Karp M Krummenacker T J Lee and CA Ouzounis ldquoPrediction of enzyme classification from proteinsequence without the use of sequence similarityrdquo in Proceedingsof the 5th International Conference on Intelligent Systems forMolecular Biology (ISMB rsquo97) pp 92ndash99 AAAI Press 1997

[133] A Sharkey N Sharkey U Gerecke and G Chandroth ldquoThetest and select approach to ensemble combinationrdquo inMultipleClassifier Systems First International Workshop MCS 2000Cagliari Italy J Kittler and F Roli Eds vol 1857 of LectureNotes in Computer Science pp 30ndash44 Springer 2000

[134] Y Kourmpetis A van Dijk and C ter Braak ldquoGene Ontologyconsistent protein function prediction the FALCON algorithmapplied to six eukaryotic genomesrdquo Algorithms for MolecularBiology vol 8 article 10 2013

[135] N Cesa-Bianchi C Gentile A Tironi and L Zaniboni ldquoIncre-mental algorithms for hierarchical classificationrdquo in Advancesin Neural Information Processing Systems vol 17 pp 233ndash240MIT Press 2005

[136] M Re and G Valentini ldquoAn experimental comparison ofHierarchical Bayes and True Path Rule ensembles for proteinfunction predictionrdquo in Multiple Classifier Systems NinethInternational Workshop MCS 2010 Cairo Egypt N El-GayarEd vol 5997 of Lecture Notes in Computer Science pp 294ndash303Springer 2010

[137] A J Smola and R Kondor ldquoKernels and regularization ongraphsrdquo in Proceedings of the 16th Annual Conference on Learn-ing Theory B Scholkopf and M K Warmuth Eds LectureNotes in Computer Science pp 144ndash158 Springer August 2003

[138] C Leslie E Eskin A Cohen J Weston and W S NobleldquoMismatch string kernels for svm protein classificationrdquo inAdvances in Neural Information Processing Systems pp 1441ndash1448 MIT Press Cambridge Mass USA 2003

[139] M J Wainwright and M I Jordan ldquoGraphical models expo-nential families and variational inferencerdquo Foundations andTrends in Machine Learning vol 1 no 1-2 pp 1ndash305 2008

[140] R E Barlow andH D Brunk ldquoThe isotonic regression problemand its dualrdquo Journal of the American Statistical Association vol67 no 337 pp 140ndash147 1972

[141] O Burdakov O Sysoev A Grimvall and M Hussian ldquoAn119874(1198992) algorithm for isotonic regressionrdquo in Large-Scale Non-

linear Optimization vol 83 of Nonconvex Optimization and ItsApplications pp 25ndash33 Springer 2006

[142] G Valentini and M Re ldquoWeighted True Path Rule a multi-label hierarchical algorithm for gene function predictionrdquo inProceedings of the 1st International Workshop on learning fromMulti-Label Data (MLD-ECML rsquo09) pp 133ndash146 Bled Slovenia2009

[143] Gene Ontology Consortium True path rule 2010 httpwwwgeneontologyorgGOusageshtmltruePathRule

[144] B Chen L Duan and J Hu ldquoComposite kernel based svmfor hierarchical multilabel gene function classificationrdquo in Pro-ceedings of the 26th International Florida Artificial IntelligenceResearch Society Conference pp 1380ndash1385 2012

[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.

[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics – From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz, et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, no. 12, article S7, 2009.

[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray, et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7–9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry, et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao, et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. Von Mering, R. Krause, B. Snel, et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.

[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013 (ISMB Special Interest Group Meeting), pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bahr, et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber, et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan, et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


[11, 45, 46], while others considered predictions extended to larger sets using support vector machines and semidefinite programming [11], artificial neural networks [47], functional linkage networks [21, 48], Bayesian networks [45], or methods that combine functional linkage networks with learning machines using a logistic regression model [12] or simple algebraic operators [13].

Other research lines for AFP explicitly take into account the hierarchical nature of the multilabel classification problem. For instance, structured output methods are based on the joint kernelization of both input variables and output labels, using, for example, perceptron-like learning algorithms [49] or maximum-margin algorithms [50]. Other approaches improve the prediction of GO annotations by extracting implicit semantic relationships between genes and functions [51]. Finally, other methods adopted an ensemble approach [52] to take advantage of the intrinsic hierarchical nature of protein function prediction, explicitly considering the relationships between functional classes [24, 53–55].

Computational methods for AFP, mostly based on machine learning, can be schematically grouped into the following four families:

(1) sequence-based methods;
(2) network-based methods;
(3) kernel methods for structured output spaces;
(4) hierarchical ensemble methods.

This grouping is neither exhaustive nor strict, meaning that certain methods do not belong to any of these groups and others belong to more than one.

2.1. Sequence-Based Methods. Algorithms based on alignment of sequences represent the first attempts to computationally predict the function of proteins [56, 57]: similar sequences are likely to share common functions, even if it is well known that secondary and tertiary structure conservation is usually more strictly related to protein function. Nevertheless, algorithms able to infer similarities between sequences are today standard methods for assigning functions to proteins in newly sequenced organisms [17, 58]. Of course, global or local structure comparison algorithms between proteins can be applied to detect functional properties [59], and in this context the integration of different sequence- and structure-based prediction methods represents a major challenge [60].

Even if most of the research effort for the design and development of AFP methods has concentrated on machine learning methods, it is worth noting that in the AFP 2011 challenge [16] one of the best performing methods was a sequence-based algorithm [61]. Indeed, when the only information available is a raw sequence of amino acids or nucleotides, sequence-based methods can be competitive with state-of-the-art machine learning methods by exploiting homology-based inference [62].

2.2. Network-Based Methods. These methods usually represent each dataset through an undirected graph $G = (V, E)$, where nodes $v \in V$ correspond to genes/gene products and edges $e \in E$ are weighted according to the evidence of cofunctionality implied by the data source [63, 64]. These algorithms are able to transfer annotations from previously annotated (labeled) nodes to unannotated (unlabeled) ones by exploiting "proximity relationships" between connected nodes. Basically, these methods are based on transductive label propagation algorithms that predict the labels of unannotated examples without using a global predictive model [14, 21, 45]. Several methods exploited the semantic similarity between GO terms [65, 66] to derive functional similarity measures between genes and to construct functional networks, then using supervised or semisupervised learning algorithms to infer GO annotations of genes [67–70].

Different strategies to learn the unlabeled nodes have been explored by "label propagation" algorithms, that is, methods able to "propagate" the labels of annotated proteins across the network by exploiting the topology of the underlying graph: for instance, methods based on the evaluation of the functional flow in graphs [64, 71], methods based on Hopfield networks [48, 72, 73], methods based on Markov [74, 75] and Gaussian random fields [14, 46], and also simple "guilt-by-association" methods [76, 77], based on the assumption that connected nodes/proteins in the functional network are likely to share the same functions. Recently, methods based on kernelized score functions, able to exploit both local and global semisupervised learning strategies, have been successfully applied to AFP [78], as well as to disease gene prioritization [79] and drug repositioning problems [80, 81].

Reference [82] showed that different graph-based algorithms can be cast into a common framework where a quadratic cost objective function is minimized. In this framework, closed form solutions can be derived by solving a linear system of size equal to the number of nodes (proteins), or by using fast iterative procedures such as the Jacobi method [83]. A network-based approach alternative to label propagation, exhibiting strong theoretical predictive guarantees in the so-called mistake bound model, has been recently proposed by [84].
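To make this framework concrete, here is a minimal NumPy sketch (hypothetical function and variable names; not the code of [82] or [83]) that solves one standard instance of such a quadratic cost, (I + αL)f = y with L = D − W the graph Laplacian, by Jacobi iterations on a toy functional network:

```python
import numpy as np

def jacobi_label_propagation(W, y, alpha=0.5, n_iter=100):
    """Solve (I + alpha * L) f = y, with L = D - W the graph Laplacian,
    via Jacobi iterations; f holds the propagated label scores."""
    d = W.sum(axis=1)               # weighted node degrees (diagonal of D)
    f = y.astype(float).copy()      # initialize with the known labels
    for _ in range(n_iter):
        # Jacobi update: f_i <- (y_i + alpha * sum_j w_ij f_j) / (1 + alpha * d_i)
        f = (y + alpha * (W @ f)) / (1.0 + alpha * d)
    return f

# Toy functional network over 4 proteins: protein 0 is annotated (y = 1),
# proteins 1-3 are unlabeled; edge weights encode cofunctionality evidence.
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
y = np.array([1, 0, 0, 0], dtype=float)
print(jacobi_label_propagation(W, y))  # unlabeled nodes get scores reflecting proximity to node 0
```

Since the system matrix is strictly diagonally dominant, the Jacobi iteration is guaranteed to converge to the closed form solution.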

2.3. Kernel Methods for Structured Output Spaces. By extending kernels to the output space, the multilabel hierarchical classification problem is solved globally: the multilabels are viewed as elements of a structured space modeled by suitable kernel functions [85–87], and structured predictions are viewed as a maximum a posteriori prediction problem [88].

Given a feature space $\mathcal{X}$ and a space of structured labels $\mathcal{Y}$, the task is to learn a mapping $f : \mathcal{X} \rightarrow \mathcal{Y}$ through an induced joint kernel function $k$ that computes the "compatibility" of a given input-output pair $(x, y)$: for each test example $x \in \mathcal{X}$, we need to determine the label $\hat{y} \in \mathcal{Y}$ such that $\hat{y} = \arg\max_{y \in \mathcal{Y}} k(x, y)$. By modeling probabilities through a log-linear model and using a suitable feature map $\phi(x, y)$, we can define an induced joint kernel function that uses both inputs and outputs to compute the "compatibility" of a given input-output pair [88]:

$$k : (\mathcal{X} \times \mathcal{Y}) \times (\mathcal{X} \times \mathcal{Y}) \rightarrow \mathbb{R}. \quad (1)$$


Structured output methods infer a label $\hat{y}$ by finding the maximum of a function $g$ that uses the previously defined joint kernel (1):

$$\hat{y} = \underset{y \in \mathcal{Y}}{\arg\max}\; g(x, y). \quad (2)$$

The GOstruct system implemented a structured perceptron and a variant of the structured support vector machine [85]. This approach has been successfully applied to the prediction of GO terms in mouse and other model organisms [19]. Structured output maximum-margin algorithms have also been applied to the tree-structured prediction of enzyme functions [50, 86].
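As a toy illustration of the inference rule (2) (a hypothetical sketch, not GOstruct itself: it uses a plain linear compatibility g(x, y) = ⟨w, φ(x, y)⟩ with an outer-product joint feature map, and the argmax is computed by exhaustive enumeration, which is feasible only for tiny taxonomies):

```python
import itertools
import numpy as np

# Toy taxonomy: node 0 hangs from the (implicit) root; 1 and 2 are its children.
PARENT = {0: None, 1: 0, 2: 0}

def consistent(y):
    """A multilabel respects the true path rule if every positive node has a positive parent."""
    return all(y[PARENT[i]] == 1 for i in PARENT if y[i] == 1 and PARENT[i] is not None)

def phi(x, y):
    """Joint feature map: outer product of input features and label vector."""
    return np.outer(x, y).ravel()

def predict(w, x):
    """Rule (2): argmax of g(x, y) = <w, phi(x, y)> over hierarchy-consistent multilabels."""
    candidates = [np.array(y) for y in itertools.product([0, 1], repeat=len(PARENT))
                  if consistent(y)]
    return max(candidates, key=lambda y: float(w @ phi(x, y)))

w = np.random.default_rng(0).normal(size=2 * 3)  # weights for 2 input features x 3 labels
x = np.array([0.3, -1.2])
print(predict(w, x))
```

Real structured output methods avoid this exhaustive search by exploiting the structure of the label space during the argmax computation.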

2.4. Hierarchical Ensemble Methods. Other approaches explicitly take into account the hierarchical relationships between functional terms [26, 29, 53, 54, 89, 90]. Usually they modify the "flat" predictions (i.e., predictions made independently of the hierarchical structure of the classes) and correct them, improving the accuracy and consistency of the multilabel annotations of proteins [24].

The flat approach makes predictions for each term independently, and consequently the predictor may assign to a single protein a set of terms that are inconsistent with one another. A possible solution to this problem is to train a classifier for each term of the reference ontology, to produce a set of predictions at each term, and finally to reconcile the predictions by taking into account the relationships between the classes of the ontology. Different ensemble-based algorithms have been proposed, ranging from methods restricted to multilabels with single and no partial paths [91] to methods extended to multiple and also partial paths [92]. Many recently published works clearly demonstrated that this approach ensures an increment in precision, but this comes at the expense of the overall recall [2, 30].

In the next section we discuss hierarchical ensemble methods in detail, since they constitute the main topic of this review.

3. Hierarchical Ensemble Methods: Exploiting the Hierarchy to Improve Protein Function Prediction

Ensemble methods are one of the main research areas of machine learning [52, 93–95]. From a general standpoint, ensembles of classifiers are sets of learning machines that work together to solve a classification problem (Figure 1). Empirical studies showed that in both classification and regression problems ensembles improve on single learning machines; moreover, large experimental studies compared the effectiveness of different ensemble methods on benchmark data sets [96–99], and ensembles have been successfully applied to several computational biology problems [100–104]. Ensemble methods have also been successfully applied in an unsupervised setting [105, 106]. Several theories have been proposed to explain the characteristics and the successful application of ensembles to different application domains. For instance, Allwein, Schapire, and Singer interpreted the

[Figure 1: Ensemble of classifiers. The training data are used to train the base classifiers 1, ..., L, whose outputs are combined by a combiner into the final prediction.]

improved generalization capabilities of ensembles of learning machines in the framework of large margin classifiers [107, 108], Kleinberg in the context of stochastic discrimination theory [109], and Breiman and Friedman in the light of the bias-variance analysis borrowed from classical statistics [110, 111]. The interest in this research area is also motivated by the availability of very fast computers and networks of workstations at a relatively low cost, which allow the implementation and experimentation of complex ensemble methods using off-the-shelf computer platforms.

Constraints between labels and, more generally, the issue of label dependence have been recognized to play a central role in multilabel learning [112]. Protein function prediction can be regarded as a paradigmatic multilabel classification problem where the exploitation of a priori knowledge about the hierarchical relationships between the labels can dramatically improve classification performance [24, 27, 113].

In the context of AFP problems, ensemble methods reflect the hierarchy of functional terms in the structure of the ensemble itself: each base learner is associated with a node of the graph representing the functional hierarchy and learns a specific GO term or FunCat category. The predictions provided by the trained classifiers are then combined by exploiting the hierarchical relationships of the taxonomy.

In their more general form, hierarchical ensemble methods adopt a two-step learning strategy.

(1) In the first step each base learner, separately or interacting with connected base learners, learns the protein functional category on a per-term basis. In most cases this yields a set of independent classification problems, where each base learning machine is trained to learn a specific functional term independently of the other base learners.

(2) In the second step the predictions provided by the trained classifiers are combined by considering the hierarchical relationships between the base classifiers, modeled according to the hierarchy of the functional classes.

Figure 2 depicts the two learning steps of hierarchical ensemble methods. In the first step a learning algorithm (a square object in Figure 2(a)) is applied to train the base classifiers associated with each class (represented with numbers from 1 to 9). Then the resulting base classifiers (circles), in the prediction phase, exploit the hierarchical relationships between classes to combine their predictions with those provided by the other base classifiers (Figure 2(b)). Note that a dummy node 0 is added to obtain a rooted hierarchy. Up and down arrows represent the possibility of combining predictions by exploiting those provided, respectively, by children and parent classifiers, according to a bottom-up or top-down learning strategy. Note that "local" combinations are possible (e.g., the prediction of node 5 may depend only on the prediction of node 1), but "global" combinations can also be considered, by taking into account the predictions across the overall structure of the graph (e.g., predictions for node 9 can depend on the predictions made by all the other base classifiers, from 1 to 8). Moreover, both top-down propagation of the predictions (down arrows in Figure 2(b)) and bottom-up propagation (up arrows) can be considered, depending on the specific design of the hierarchical ensemble algorithm.

[Figure 2: Schematic representation of the two main learning steps of hierarchical ensemble methods: (a) training of base classifiers; (b) top-down and/or bottom-up propagation of the predictions.]

This ensemble approach is highly modular: in principle any learning algorithm can be used to train the classifiers in the first step, and annotation decisions, probabilities, or any other scores provided by each base learner can be combined, depending on the characteristics of the specific hierarchical ensemble method.

In this section we provide some basic notation and an ensemble taxonomy that will be used to introduce the different hierarchical ensemble methods for AFP.

3.1. Basic Notation. A gene/gene product $g$ can be represented through a vector $x \in \mathbb{R}^d$ having $d$ different features (e.g., gene expression levels across $d$ different conditions, sequence similarities with other genes/proteins, presence or absence of a given domain in the corresponding protein, or genetic or physical interactions with other proteins). Note that, for the sake of simplicity and with a certain approximation, we refer in the same way to genes and proteins, even if it is well known that a given gene may correspond to multiple proteins. A gene $g$ is assigned to one or more functional classes in the set $C = \{c_1, c_2, \ldots, c_m\}$, structured according to a FunCat forest of trees $T$ or a directed acyclic graph $G$ of the Gene Ontology (usually a dummy root class $c_0$, to which every gene belongs, is added to $T$ or $G$ to facilitate the processing). The assignments are coded through a vector of multilabels $y = (y_1, y_2, \ldots, y_m) \in \{0, 1\}^m$, where $g$ belongs to class $c_i$ if and only if $y_i = 1$.

In both the Gene Ontology (GO) and FunCat taxonomies the functional classes are structured according to a hierarchy and can be represented by a directed graph, where nodes correspond to classes and edges correspond to relationships between classes. Hence the node corresponding to the class $c_i$ can simply be denoted by $i$. We represent the set of children nodes of $i$ by $\mathrm{child}(i)$ and the set of its parents by $\mathrm{par}(i)$. Moreover, $y_{\mathrm{child}(i)}$ denotes the labels of the children classes of node $i$ and, analogously, $y_{\mathrm{par}(i)}$ denotes the labels of the parent classes of $i$. Note that in FunCat only one parent is permitted, since the overall hierarchy is a tree forest, while in the GO more parents are allowed, because the relationships are structured according to a directed acyclic graph.

Hierarchical ensemble methods train a set of calibrated classifiers, one for each node of the taxonomy $T$. These classifiers are used to derive estimates $\hat{p}_i(g)$ of the probabilities $p_i(g) = \mathbb{P}(V_i = 1 \mid V_{\mathrm{par}(i)} = 1, g)$ for all $g$ and $i$, where $(V_1, \ldots, V_m) \in \{0, 1\}^m$ is the vector random variable modeling the unknown multilabel of a gene $g$ and $V_{\mathrm{par}(i)}$ denotes the random variables associated with the parents of node $i$. Note that the $p_i(g)$ are probabilities conditioned on $V_{\mathrm{par}(i)} = 1$, that is, the probability that a gene is annotated to a given term $i$, given that the gene is already annotated to its parent terms, thus respecting the true path rule. Ensemble methods infer a multilabel assignment $\hat{y} = (\hat{y}_1, \ldots, \hat{y}_m) \in \{0, 1\}^m$ based on the estimates $\hat{p}_1(g), \ldots, \hat{p}_m(g)$.

3.2. A Taxonomy of Hierarchical Ensemble Methods. Hierarchical ensemble methods for AFP share several characteristics, from the two-step learning approach to the exploitation of the hierarchical relationships between classes. For these reasons it is quite difficult to define a clear and unambiguous taxonomy of hierarchical ensemble methods. Here we present a taxonomy useful mainly to describe and discuss existing methods for AFP. For a recent review and taxonomy of hierarchical ensemble methods not specific to AFP problems, we refer the reader to the comprehensive review by Silla and coworkers [44].

In the following sections we discuss the following groups of hierarchical ensemble methods.

(i) Top-down ensemble methods. These methods are characterized by a simple top-down approach: in the second step, only the output of the parent node/base classifier influences the output of the children, thus resulting in a top-down propagation of the decisions.

(ii) Bayesian ensemble methods. These methods are theoretically well founded, and in some cases they are optimal from a Bayesian standpoint.

(iii) Reconciliation methods. This is a heterogeneous class of heuristics by which the predictions of the base learners can be combined by adopting different "local" or "global" combination strategies.

(iv) True path rule ensembles. These methods adopt a heuristic approach based on the "true path rule" that governs both the GO and FunCat ontologies.

(v) Decision tree-based ensembles. These methods are characterized by the application of decision trees as base learners or by the adoption of decision tree-like learning strategies to combine the predictions of the base learners.

Despite this general characterization, several methods could be assigned to different groups, and for several other hierarchical ensemble methods it is difficult to assign them to any of the introduced classes.

For instance, in [114–116] the authors used the hierarchy only to construct training sets, different for each term of the Gene Ontology, by determining positive and negative examples on the basis of the relationships between functional terms. In [89], for each classifier associated with a node, a gene is labeled as positive (i.e., belonging to the term associated with that node) if it actually belongs to that node, or as negative if it does not belong to that node or to the ancestors or descendants of the node.

Other approaches exploited the correlation between nearby classes [32, 53, 117]. Shahbaba and Neal [53] take into account the hierarchy to introduce correlations between functional classes, using a multinomial logit model with Bayesian priors, in the context of E. coli functional classification with Riley's hierarchies [118]. Bogdanov and Singh incorporated functional interrelationships between terms during the extraction of features based on the annotations of neighboring genes and then applied a nearest-neighbor classifier to predict protein functions [117]. The HiBLADE method (hierarchical multilabel boosting with label dependency) [32] not only takes advantage of the preestablished hierarchical taxonomy of the classes but also effectively exploits the hidden correlation among the classes that is not shown through the class hierarchy, thereby improving the quality of the predictions. In particular, the dependencies of the children of each label in the hierarchy are captured and analyzed using the Bayes method and instance-based similarity. Experiments using the FunCat taxonomy and the yeast model organism show that the proposed method is competitive with the TPR-W (Section 7.2) and HBAYES-CS (Section 5.3) hierarchical ensemble methods.

A classical multiclass boosting algorithm [119] has also been adapted to fit the hierarchical structure of the FunCat taxonomy [120]: the method is relatively simple and straightforward to implement and achieves competitive results for AFP in the yeast model organism.

Finally, other hierarchical approaches have been proposed in the context of the competitive networks learning framework. Competitive networks are well-known unsupervised and supervised methods able to map the input space into a structured output space, where clusters or classes are usually arranged according to a grid topology and where learning adopts, at the same time, competition, cooperation, and adaptation strategies [121]. Interestingly enough, in [122] the authors adopted this approach to predict the hierarchy of gene annotations in the yeast model organism by using a tree topology mirroring the FunCat taxonomy: each neuron is connected with its parent or with its children. Moreover, each neuron in the tree-structured output layer is connected to all the neurons of the input layer representing the instances, that is, the set of genomic features associated with each gene to be classified. Results obtained with the hierarchy of Enzyme Commission codes showed that this approach is competitive with those obtained with hierarchical decision tree ensembles [29] (Section 8).

To provide a general picture of the methods discussed in the following sections, Table 1 summarizes their main characteristics. The first two columns report the name of and a reference to each method; the third reports whether multiple paths across the taxonomy are allowed, and the next whether partial paths are considered (i.e., paths that do not end at a leaf). The successive columns refer to the class structure (a tree or a DAG), to the adoption or not of cost-sensitive (i.e., unbalance-aware) classification approaches, and to the adoption of strategies to properly select negative examples in the training phase. Finally, the last three columns summarize the type of base learner used ("spec" means that only a specific type of base learner is allowed, while "any" means that any type of learner can be used within the method), whether the method improves or not with respect to the flat approach, and the mode of processing of the nodes ("TD": top-down approach; "TD & BUP": adopting both top-down and bottom-up strategies). Of course, methods having more checkmarks are more flexible; in general, methods that can process a DAG can also process tree-structured ontologies, but the opposite is not guaranteed, while the type of node processing depends on the way the information is propagated across the ontology. It is worth noting that all the considered methods improve on baseline "flat" classification methods.

4. Hierarchical Top-Down (HTD) Ensembles

These ensemble methods exploit the hierarchical relationships between functional terms in a top-to-bottom fashion, that is, considering only the relationships denoted by the down arrows in Figure 2(b). The basic hierarchical top-down


Table 1: Characteristics of some of the main hierarchical ensemble methods for AFP.

Method             | References | Multipath | Partial path | Class structure | Cost sens. | Sel. neg. | Base learn. | Improves on flat | Node process.
HMC-LMLP           | [124, 125] | ✓ | ✓ | TREE | - | - | any  | ✓ | TD
HTD-CS             | [27]       | ✓ | ✓ | TREE | ✓ | - | any  | ✓ | TD
HTD-MULTI          | [127]      | - | ✓ | TREE | - | - | any  | ✓ | TD
HTD-PERLEV         | [128]      | - | - | TREE | - | - | spec | ✓ | TD
HTD-NET            | [26]       | ✓ | ✓ | DAG  | - | - | any  | ✓ | TD
BAYES NET-ENS      | [20]       | ✓ | ✓ | DAG  | ✓ | - | spec | ✓ | TD & BUP
HIER-MB and BFS    | [30]       | ✓ | ✓ | DAG  | - | - | any  | ✓ | TD & BUP
HBAYES             | [92, 135]  | ✓ | ✓ | TREE | ✓ | - | any  | ✓ | TD & BUP
HBAYES-CS          | [123]      | ✓ | ✓ | TREE | ✓ | ✓ | any  | ✓ | TD & BUP
Reconc-heuristic   | [24]       | ✓ | ✓ | DAG  | - | - | any  | ✓ | TD
Cascaded log.      | [24]       | ✓ | ✓ | DAG  | - | - | any  | ✓ | TD
Projection-based   | [24]       | ✓ | ✓ | DAG  | - | - | any  | ✓ | TD & BUP
TPR                | [31, 142]  | ✓ | ✓ | TREE | ✓ | - | any  | ✓ | TD & BUP
TPR-W              | [31]       | ✓ | ✓ | TREE | ✓ | ✓ | any  | ✓ | TD & BUP
TPR-W weighted     | [145]      | ✓ | ✓ | TREE | ✓ | - | any  | ✓ | TD & BUP
Decision-tree-ens. | [29]       | ✓ | ✓ | DAG  | - | - | spec | ✓ | TD & BUP

ensemble method (HTD) algorithm is straightforward: for each gene $g$, starting from the set of nodes at the first level of the graph $G$ (denoted by $\mathrm{root}(G)$), the classifier associated with the node $i \in G$ computes whether the gene belongs to the class $c_i$. If yes, the classification process continues recursively on the nodes $j \in \mathrm{child}(i)$; otherwise it stops at node $i$, and the labels of all the nodes in the subtree rooted at $i$ are set to 0. To introduce the method, we use probabilistic classifiers as base learners, trained to predict the class $c_i$ associated with the node $i$ of the hierarchical taxonomy. Their estimates $\hat{p}_i(g)$ of $\mathbb{P}(V_i = 1 \mid V_{\mathrm{par}(i)} = 1, g)$ are used by the HTD ensemble to classify a gene $g$ as follows:

$$\hat{y}_i = \begin{cases} \{\hat{p}_i(g) > \frac{1}{2}\} & \text{if } i \in \mathrm{root}(G), \\ \{\hat{p}_i(g) > \frac{1}{2}\} & \text{if } i \notin \mathrm{root}(G) \text{ and } \hat{p}_{\mathrm{par}(i)}(g) > \frac{1}{2}, \\ 0 & \text{if } i \notin \mathrm{root}(G) \text{ and } \hat{p}_{\mathrm{par}(i)}(g) \le \frac{1}{2}, \end{cases} \quad (3)$$

where $\{x\} = 1$ if $x > 0$ and $\{x\} = 0$ otherwise, and $\hat{p}_{\mathrm{par}(i)}$ is the probability predicted for the parent of the term $i$. It is easy to see that this procedure ensures that the predicted multilabels $\hat{y} = (\hat{y}_1, \ldots, \hat{y}_m)$ are consistent with the hierarchy. The same top-down procedure can also be applied with nonprobabilistic classifiers, that is, base learners generating continuous scores or discrete decisions, by slightly modifying (3).

In [123] a cost-sensitive version of the basic top-down hierarchical ensemble method HTD has been proposed: by assigning $\hat{y}_i$ before the label of any $j$ in the subtree rooted at $i$, the following rule is used:

$$\hat{y}_i = \left\{\hat{p}_i \ge \tfrac{1}{2}\right\} \times \left\{\hat{y}_{\mathrm{par}(i)} = 1\right\} \quad (4)$$

for $i = 1, \ldots, m$ (note that the guessed label $\hat{y}_0$ of the root of $G$ is always 1). The cost-sensitive variant HTD-CS then introduces a single cost-sensitive parameter $\tau > 0$, which replaces the threshold $1/2$. The resulting rule for HTD-CS is then

$$\hat{y}_i = \left\{\hat{p}_i \ge \tau\right\} \times \left\{\hat{y}_{\mathrm{par}(i)} = 1\right\}. \quad (5)$$

By tuning $\tau$ we may obtain ensembles with different precision/recall characteristics. Despite the simplicity of hierarchical top-down methods, several works showed their effectiveness for AFP problems [28, 31].
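A minimal sketch of the top-down decision rules (4)-(5) follows (hypothetical helper and variable names; thresholding with τ = 1/2 reproduces the HTD behaviour of (3), while any other τ yields HTD-CS):

```python
def depth(i, parent):
    """Depth of node i in the tree defined by parent pointers."""
    d = 0
    while parent[i] is not None:
        i = parent[i]
        d += 1
    return d

def htd_predict(p_hat, parent, tau=0.5):
    """Top-down hierarchical prediction, rules (4)-(5).
    p_hat[i]: estimate of P(V_i = 1 | V_par(i) = 1, g) for node i;
    parent[i]: parent of i, or None for first-level nodes;
    tau = 0.5 gives HTD; other values give the cost-sensitive HTD-CS."""
    y_hat = {}
    for i in sorted(p_hat, key=lambda n: depth(n, parent)):  # parents first
        parent_positive = parent[i] is None or y_hat[parent[i]] == 1
        # a node is positive only if its own score passes the threshold
        # AND its parent has already been predicted positive
        y_hat[i] = int(p_hat[i] >= tau and parent_positive)
    return y_hat

# Toy FunCat-like tree: nodes 1 and 2 at the first level; 3, 4 children of 1.
parent = {1: None, 2: None, 3: 1, 4: 1}
p_hat = {1: 0.9, 2: 0.3, 3: 0.8, 4: 0.6}
print(htd_predict(p_hat, parent))           # {1: 1, 2: 0, 3: 1, 4: 1}
print(htd_predict(p_hat, parent, tau=0.7))  # stricter threshold: node 4 drops out
```

Note how the hierarchy consistency of the output is guaranteed by construction: a node can be positive only below a chain of positive ancestors.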

For instance, Cerri and De Carvalho experimented with different variants of top-down hierarchical ensemble methods for AFP [28, 124, 125]. The HMC-LMLP (hierarchical multilabel classification with local multilayer perceptron) method successively trains a local MLP network for each hierarchical level, using the classical backpropagation algorithm [126]. The output of the MLP for the first level is then used as input to train the MLP that learns the classes of the second level, and so on (Figure 3). A gene is annotated to a class if its corresponding output in the MLP is larger than a predefined threshold; then, in a postprocessing phase (the second step of the hierarchical classification), inconsistent predictions are removed (i.e., classes predicted without the prediction of their superclasses) [125]. In practice, instead of using a dichotomic classifier for each node, the HMC-LMLP algorithm applies a single multiclass multilayer perceptron to each level of the hierarchy.

[Figure 3: HMC-LMLP. Outputs of the MLP responsible for the predictions in the first level are used as input to another MLP for the predictions in the second level (adapted from [125]).]

A related approach adopts multiclass classifiers (HTD-MULTI) at each node, instead of a simple binary classifier, and tries to find the most likely path from the root to the leaves of the hierarchy, considering simple techniques such as the multiplication or the sum of the probabilities estimated at each node along the path [127]. The method has been applied to the cell cycle branch of the FunCat hierarchy with the yeast model organism, showing improvements with respect to classical hierarchical top-down methods, even if the proposed approach can only predict classes along a single "most likely path", thus not considering that in AFP we may have annotations involving multiple and partial paths.

Another method that introduces multiclass classifiers instead of simple dichotomic classifiers has been proposed by Paes et al. [128]: local, per-level multiclass classifiers (HTD-PERLEV) are trained to distinguish between the classes of a specific level of the hierarchy, and two different strategies to remove inconsistencies are introduced. The method has been applied to the hierarchical classification of enzymes using the EC taxonomy; unfortunately, this algorithm is not well suited to AFP, since leaf nodes are mandatory (that is, partial-path annotations are not allowed) and multilabel annotations along multiple paths are not allowed either.

Another interesting top-down hierarchical approach proposed by the same authors is HMC-LP (hierarchical multilabel classification label-powerset), a hierarchical variation of the label-powerset nonhierarchical multilabel method [129], which has been applied to the prediction of gene functions in the yeast model organism using 10 different data sets and the FunCat taxonomy [124]. According to the label-powerset approach, the method is based on a first label-combination step by which, for each example (gene), all the classes assigned to the example are combined into a new and unique class; this process is repeated for each level of the hierarchy. In this way the original problem is transformed into a hierarchical single-label problem. In both the training and test phases the top-down approach is applied, and at the end of the classification phase the original classes can easily be reconstructed [124]. In an experimental comparison using the FunCat taxonomy for S. cerevisiae, results showed that hierarchical top-down ensemble methods significantly outperform decision-tree-based hierarchical methods, but no significant difference between the different flavors of top-down hierarchical ensembles has been detected [28].

Top-down algorithms can also be conceived in the context of network-based methods (HTD-NET). For instance, in [26] a probabilistic model that combines relational protein-protein interaction data and the hierarchical structure of the GO to predict true-path-consistent function labels obeys the true path rule by setting the descendants of a node as negative whenever that node is set to negative. More precisely, the authors at first compute a local hierarchical conditional probability, in the sense that for any nonroot GO term only its parents affect its labeling. This probability is computed within a network-based framework, assuming that the labeling of a gene is independent of that of any other gene given those of its neighbors (a sort of Markov property with respect to the gene functional interaction network), and assuming also a binomial distribution for the number of neighbors labeled with child terms with respect to those labeled with the parent term. These assumptions are quite stringent but are necessary to make the model tractable. Then a global hierarchical conditional probability is computed by recursively applying the previously computed local hierarchical conditional probability over all the ancestors. More precisely, denoting by $\mathbb{P}(y_i = 1 \mid g, N(g))$ the probability that a gene $g$ is annotated to a node $i$, given the status of the annotations of its neighborhood $N(g)$ in the functional network, the global hierarchical conditional probability factorizes according to the GO graph as follows:

$$\mathbb{P}(y_i = 1 \mid g, N(g)) = \prod_{j \in \mathrm{anc}(i)} \mathbb{P}(y_j = 1 \mid y_{\mathrm{par}(j)} = 1, N_{\mathrm{loc}}(g)), \quad (6)$$

where $N_{\mathrm{loc}}(g)$ represents the local hierarchical neighborhood information on the parent-child GO term pair $\mathrm{par}(j)$ and $j$ [26]. This approach is guaranteed to produce GO term label assignments consistent with the hierarchy, without the need of a postprocessing step.
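The factorization in (6) can be sketched as follows (hypothetical data structures: for simplicity the hierarchy is a chain of single-parent terms, and the local conditionals are assumed given, whereas in [26] they are estimated from the interaction network):

```python
def ancestors_and_self(i, parent):
    """Return the path from node i up to the root (i included)."""
    path = []
    while i is not None:
        path.append(i)
        i = parent[i]
    return path

def global_conditional(i, parent, local_prob):
    """Hierarchy-consistent probability in the spirit of (6): product of the
    local parent-child conditionals P(y_j = 1 | y_par(j) = 1, N_loc(g))
    along the ancestor path of node i."""
    p = 1.0
    for j in ancestors_and_self(i, parent):
        p *= local_prob[j]
    return p

# Toy GO-like chain: term 3 below term 2 below term 1 (a root child).
parent = {1: None, 2: 1, 3: 2}
local_prob = {1: 0.9, 2: 0.8, 3: 0.5}  # local parent-child conditionals
print(global_conditional(3, parent, local_prob))  # 0.9 * 0.8 * 0.5 = 0.36
```

By construction the resulting probability of a term can never exceed that of any of its ancestors, so thresholding these values always yields hierarchy-consistent annotations.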

Finally, in [130] the author applied a hierarchical method to the classification of yeast FunCat categories. Despite its well-founded theoretical properties, based on large margin methods, this approach is conceived for one-path hierarchical classification, and hence it is unsuited for hierarchical AFP, where multiple paths in the hierarchy should usually be considered, since in most cases genes play different functional roles in the cell.

5. Ensemble-Based Bayesian Approaches for Hierarchical Classification

These methods introduce a Bayesian approach to the hierarchical classification of proteins, by using the classical Bayes theorem or Bayesian networks to obtain tractable factorizations of the joint conditional probabilities from the original "full Bayesian" setting of the hierarchical AFP problem [20, 30], or to achieve "Bayes-optimal" solutions with respect to loss functions well suited to hierarchical problems [27, 92].

5.1. The Solution Based on Bayesian Networks. One of the first approaches addressing the issue of inconsistent predictions in the Gene Ontology is the Bayesian approach proposed in [20] (BAYES NET-ENS). According to the general scheme of hierarchical ensemble methods, two main steps characterize the algorithm:

(1) flat prediction of each term/class (possibly inconsistent);

(2) a Bayesian hierarchical combination scheme allowing collaborative error-correction over all nodes.

After training a set of base classifiers on each of the considered GO terms (in their work the authors applied the method to 105 selected GO terms), we may have a set of (possibly inconsistent) predictions $\hat{y}$. The goal consists in finding a set of consistent predictions $y$ by maximizing the following equation, derived from the Bayes theorem:

$$\mathbb{P}(y_1, \ldots, y_n \mid \hat{y}_1, \ldots, \hat{y}_n) = \frac{\mathbb{P}(\hat{y}_1, \ldots, \hat{y}_n \mid y_1, \ldots, y_n)\, \mathbb{P}(y_1, \ldots, y_n)}{Z}, \quad (7)$$

where $n$ is the number of GO nodes/terms and $Z$ is a constant normalization factor.

Since the direct solution of (7) is too hard, that is, exponential in time with respect to the number of nodes, the authors proposed a Bayesian network structure to solve this difficult problem, in order to exploit the relationships between the GO terms. More precisely, to reduce the complexity of the problem, the authors imposed the following constraints:

(1) the $y_i$ nodes are conditioned on their children (GO structure constraints);

(2) the $\hat{y}_i$ nodes are conditioned on their label $y_i$ (the Bayes rule);

(3) the $\hat{y}_i$ are independent of both $y_j$, $i \neq j$, and $\hat{y}_j$, $i \neq j$, given $y_i$.

In other words, we can ensure that a label is 1 (positive) when any one of its children is 1, and the edges from $y_i$ to $\hat{y}_i$ ensure that a classifier output $\hat{y}_i$ is a random variable independent of all other classifier outputs $\hat{y}_j$ and labels $y_j$, given its true label $y_i$ (Figure 4).

[Figure 4: Bayesian network involved in the hierarchical classification (adapted from [20]).]

More precisely, from the previous constraints we can derive the following equations. From the first constraint:

$$\mathbb{P}(y_1, \ldots, y_n) = \prod_{i=1}^{n} \mathbb{P}(y_i \mid \mathrm{child}(y_i)). \quad (8)$$

From the last two constraints:

$$\mathbb{P}(\hat{y}_1, \ldots, \hat{y}_n \mid y_1, \ldots, y_n) = \prod_{i=1}^{n} \mathbb{P}(\hat{y}_i \mid y_i). \quad (9)$$

Note that (8) can be inferred from the training labels simply by counting, while (9) can be inferred by validation during training, by modeling the distribution of the $\hat{y}_i$ outputs over positive and negative examples with a parametric model (e.g., a Gaussian distribution; see Figure 5).
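As an illustration of these two estimation steps (hypothetical names, NumPy only; not the code of [20]), the prior of (8) can be estimated by counting and the likelihood of (9) by fitting one Gaussian per class to the classifier's validation outputs:

```python
import numpy as np

def estimate_prior(y_node, y_any_child_positive):
    """Estimate P(y_i = 1 | child(y_i)) of (8) by counting annotations,
    here conditioning simply on whether any child is positive (0/1)."""
    return {c: y_node[y_any_child_positive == c].mean() for c in (0, 1)}

def fit_gaussian_likelihood(scores, y_true):
    """Estimate P(y_hat_i | y_i) of (9): one Gaussian per class, fitted on
    the base classifier's outputs for validation genes."""
    return {c: (scores[y_true == c].mean(), scores[y_true == c].std() + 1e-8)
            for c in (0, 1)}

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Toy validation data for one GO term: SVM margins and true labels.
scores = np.array([-1.2, -0.8, -1.5, 0.9, 1.3, 0.4])
y_true = np.array([0, 0, 0, 1, 1, 1])
lik = fit_gaussian_likelihood(scores, y_true)
print(gaussian_pdf(0.5, *lik[1]))  # likelihood of output 0.5 given y_i = 1
```

These per-node quantities are then plugged into the Bayesian network inference that maximizes (7).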

For the implementation of their method, the authors adopted bagged ensembles of SVMs [131] to make the predictions more robust and reliable at each node of the GO hierarchy, and the median values of their outputs on out-of-bag examples were used to estimate the means and variances for each class. Finally, these means and variances were used as parameters of the Gaussian models employed to estimate the conditional probabilities of (9).

Results on the 105 terms/nodes of the GO BP ontology (model organism S. cerevisiae) showed substantial improvements with respect to nonhierarchical "flat" predictions: the hierarchical approach improves the AUC results on 93 of the 105 GO terms (Figure 6).

5.2. The Markov Blanket and Approximated Breadth-First Solution. In [30] the authors proposed an alternative approximated solution to the complex equation (7) by introducing the following two variants of the Bayesian integration:

(1) HIER-MB: hierarchical Bayesian combination involving the nodes in the Markov blanket;

(2) HIER-BFS: hierarchical Bayesian combination involving the first 30 nodes visited through a breadth-first search (BFS) in the GO graph.


[Figure 5: Distribution of negative (a) and positive (b) validation examples (a Gaussian distribution is assumed). Adapted from [20].]

[Figure 6: Improvements induced by the hierarchical prediction of the GO terms. Darker shades of blue indicate the largest improvements and darker shades of red the largest deterioration; white means no change (adapted from [20]).]

[Figure 7: Markov blanket surrounding the GO term Y1. Each GO term is represented as a blank node, while the SVM classifier output is represented as a gray node (adapted from [30]).]

The method has been applied to the prediction of more than 2000 GO terms for the mouse model organism and performed among the top methods in the MouseFunc challenge [2].

The first approach (HIER-MB) modifies the output of the base learners (SVMs in the Guan et al. paper) by taking into account the Bayesian network constructed using the Markov blanket surrounding the GO term of interest (Figure 7). In a Bayesian network, the Markov blanket of a node $i$ is represented by its parents $\mathrm{par}(i)$, its children $\mathrm{child}(i)$, and its children's other parents. The Bayesian network involving the Markov blanket of node $i$ is used to provide the prediction $\hat{y}_i$ of the ensemble, thus leveraging the local relationships of node $i$ and the predictions for the nodes included in its Markov blanket.

To enlarge the size of the Bayesian subnetwork involved in the prediction of the node of interest, a variant based on the Bayesian networks constructed by applying a classical breadth-first search is the basis of the HIER-BFS algorithm. To reduce the complexity, at most 30 terms are included (i.e., the first 30 nodes reached by the breadth-first algorithm; see Figure 8). In the implementation, ensembles of 25 SVMs have been trained for each node, using vector space integration techniques [132] to integrate multiple sources of data.

[Figure 8: The breadth-first subnetwork stemming from Y1. Each GO term is represented through a blank node, and the SVM outputs are represented as gray nodes (adapted from [30]).]

Note that with both the HIER-MB and HIER-BFS methods we do not take into account the overall topology of the GO network, but only the terms related to the node for which we perform the prediction. Even if this general approach is reasonable and achieves good results, its main drawback is the locality of the hierarchical integration (limited to the Markov blanket or to the first 30 BFS nodes). Moreover, previous works showed that the adopted integration strategy (vector space integration) is in most cases worse than kernel fusion [11] and ensemble methods for data integration [23].

[Figure 9: Integration of diverse methods and diverse sources of data in an ensemble framework for AFP prediction. The best classifier for each GO term is selected through held-out set validation (adapted from [30]).]

In the same work [30] the authors also propose a sort of "test and select" method [133], by which three different classification approaches ((a) single flat SVMs, (b) Bayesian hierarchical correction, and (c) naive Bayes combination) are applied and, for each GO term, the best one is selected by internal cross-validation (Figure 9).

It is worth noting that other approaches also adopted Bayesian networks to resolve the hierarchical constraints underlying the GO taxonomy. For instance, in the FALCON algorithm the GO is modeled as a Bayesian network and, for any given input, the algorithm returns the most probable GO term assignment in accordance with the GO structure, by using an evolutionary optimization algorithm [134].

5.3. HBAYES: An "Optimal" Hierarchical Bayesian Ensemble Approach. The HBAYES ensemble method [92, 135] is a general technique for solving hierarchical classification problems on generic taxonomies $G$ structured as forests of trees. The method consists in training a calibrated classifier at each node of the taxonomy. In principle, any algorithm (e.g., support vector machines or artificial neural networks) whose classifications are obtained by thresholding a real-valued prediction $\hat{p}$, for example $\hat{y} = \mathrm{SGN}(\hat{p})$, can be used as base learner. The real-valued outputs $\hat{p}_i(g)$ of the calibrated classifier for node $i$ on the gene $g$ are viewed as estimates of the probabilities $p_i(g) = \mathbb{P}(Y_i = 1 \mid Y_{\mathrm{par}(i)} = 1, g)$. The distribution of the random Boolean vector $Y$ is assumed to be

$$\mathbb{P}(Y = y) = \prod_{i=1}^{m} \mathbb{P}(Y_i = y_i \mid Y_{\mathrm{par}(i)} = 1, g), \quad \forall y \in \{0, 1\}^m, \quad (10)$$

where, in order to enforce that only multilabels $Y$ that respect the hierarchy have nonzero probability, it is imposed that $\mathbb{P}(Y_i = 1 \mid Y_{\mathrm{par}(i)} = 0, g) = 0$ for all nodes $i = 1, \ldots, m$ and all $g$. This implies that the base learner at node $i$ is only trained on the subset of the training set including all the examples $(g, y)$ such that $y_{\mathrm{par}(i)} = 1$.

5.3.1. HBAYES Ensembles for Protein Function Prediction. The H-loss is a measure of discrepancy between multilabels based on a simple intuition: if a parent class has been predicted wrongly, then errors in its descendants should not be taken into account. Given fixed cost coefficients $\theta_1, \ldots, \theta_m > 0$, the H-loss $\ell_H(\hat{y}, v)$ between multilabels $\hat{y}$ and $v$ is computed as follows: all paths in the taxonomy $T$ from the root down to each leaf are examined and, whenever a node $i \in \{1, \ldots, m\}$ is encountered such that $\hat{y}_i \neq v_i$, $\theta_i$ is added to the loss, while all the other loss contributions from the subtree rooted at $i$ are discarded. This method assumes that, given a gene $g$, the distribution of the labels $V = (V_1, \ldots, V_m)$ is $\mathbb{P}(V = v) = \prod_{i=1}^{m} p_i(g)$ for all $v \in \{0, 1\}^m$, where $p_i(g) = \mathbb{P}(V_i = v_i \mid V_{\mathrm{par}(i)} = 1, g)$. According to the true path rule, it is imposed that $\mathbb{P}(V_i = 1 \mid V_{\mathrm{par}(i)} = 0, g) = 0$ for all nodes $i$ and all genes $g$.
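The H-loss computation can be sketched as follows (a hypothetical implementation over a tree stored as children lists; not the authors' code): a top-down visit charges θ_i at the first mismatching node of each path and skips the whole subtree below it:

```python
def h_loss(y_pred, v_true, children, theta, roots):
    """H-loss: walk each root-to-leaf path top-down; the first node i
    where prediction and truth disagree contributes theta[i], and the
    subtree below i is discarded."""
    loss = 0.0
    stack = list(roots)
    while stack:
        i = stack.pop()
        if y_pred[i] != v_true[i]:
            loss += theta[i]                   # charge the mismatch, prune subtree
        else:
            stack.extend(children.get(i, []))  # descend only on agreement
    return loss

# Toy tree: 1 is a root with children 2 and 3; 4 is a child of 2.
children = {1: [2, 3], 2: [4]}
theta = {1: 1.0, 2: 0.5, 3: 0.5, 4: 0.25}
y_pred = {1: 1, 2: 0, 3: 1, 4: 1}
v_true = {1: 1, 2: 1, 3: 1, 4: 1}
print(h_loss(y_pred, v_true, children, theta, roots=[1]))  # 0.5: only node 2 counts
```

Note how the error at node 4 is never charged: its parent was already wrong, exactly as the intuition behind the H-loss prescribes.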

In the evaluation phase, HBAYES predicts the Bayes-optimal multilabel $\hat{y} \in \{0, 1\}^m$ for a gene $g$ based on the estimates $\hat{p}_i(g)$, $i = 1, \ldots, m$. By definition of Bayes-optimality, the optimal multilabel for $g$ is the one that minimizes the expected loss when the true multilabel $V$ is drawn from the joint distribution computed from the estimated conditionals $\hat{p}_i(g)$; that is,

$$\hat{y} = \underset{y \in \{0,1\}^m}{\arg\min}\; \mathbb{E}\left[\ell_H(y, V) \mid g\right]. \quad (11)$$

In other words, the ensemble method HBAYES provides an approximation of the optimal Bayesian classifier with respect to the H-loss [135]. More precisely, as shown in [27], the following theorem holds.


Theorem 1. For any tree $T$ and gene $g$, the multilabel generated according to the HBAYES prediction rule is the Bayes-optimal classification of $g$ for the H-loss.

In the evaluation phase, the uniform cost coefficients $\theta_i = 1$, for $i = 1, \dots, m$, are used. However, since with uniform coefficients the H-loss can be made small simply by predicting sparse multilabels (i.e., multilabels $\hat{\mathbf{y}}$ such that $\sum_i \hat{y}_i$ is small), in the training phase the cost coefficients are set to $\theta_i = 1/|\operatorname{root}(G)|$ if $i \in \operatorname{root}(G)$, and to $\theta_i = \theta_j/|\operatorname{child}(j)|$, with $j = \operatorname{par}(i)$, otherwise. This normalizes the H-loss, in the sense that the maximal H-loss contribution of all the nodes in a subtree, excluding its root, equals that of its root.

Let $\{E\}$ denote the indicator function of event $E$. Given $g$ and the estimates $\hat{p}_i = \hat{p}_i(g)$ for $i = 1, \dots, m$, the HBAYES prediction rule can be formulated as follows.

HBAYES Prediction Rule. Initially, set the label of each node $i$ to

$$
\hat{y}_i = \operatorname*{arg\,min}_{y \in \{0,1\}} \left( \theta_i \hat{p}_i (1 - y) + \theta_i (1 - \hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \operatorname{child}(i)} H_j(\hat{\mathbf{y}}) \right), \tag{12}
$$

where

$$
H_j(\hat{\mathbf{y}}) = \theta_j \hat{p}_j (1 - \hat{y}_j) + \theta_j (1 - \hat{p}_j)\, \hat{y}_j + \hat{p}_j \{\hat{y}_j = 1\} \sum_{k \in \operatorname{child}(j)} H_k(\hat{\mathbf{y}}) \tag{13}
$$

is recursively defined over the nodes $j$ in the subtree rooted at $i$, with each $\hat{y}_j$ set according to (12).

Then, if $\hat{y}_i$ is set to zero, set all the nodes in the subtree rooted at $i$ to zero as well.

It is worth noting that $\hat{\mathbf{y}}$ can be computed for a given $g$ via a simple bottom-up message-passing procedure. It can be shown that, if all the child nodes $k$ of $i$ have $\hat{p}_k$ close to one half, then the Bayes-optimal label of $i$ tends to be 0, irrespective of the value of $\hat{p}_i$. On the contrary, if the children of $i$ all have $\hat{p}_k$ close to either 0 or 1, then the Bayes-optimal label of $i$ is based on $\hat{p}_i$ only, ignoring the children. This behaviour can be intuitively explained in the following way: the estimate $\hat{p}_k$ is built only on the examples for which the parent $i$ of $k$ is positive; hence a “neutral” estimate $\hat{p}_k = 1/2$ signals that the current instance is a negative example for the parent $i$. Experimental results show that this approach achieves results comparable with the TPR method (Section 7), an ensemble approach based on the “true path rule” [136].
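The recursive definitions (12)-(13) translate into a bottom-up pass that computes the labels and the $H$ values, followed by a top-down pass that zeroes the subtrees rooted at negative nodes. The following sketch is one possible rendering of the rule for a single tree (a forest is handled by applying it to each root); the data structures and tie-breaking are illustrative assumptions, not the author's implementation:

```python
def hbayes_predict(p_hat, theta, children, root):
    """HBAYES prediction rule, eqs. (12)-(13): p_hat and theta map each
    node to its probability estimate and cost coefficient; children maps
    each node to its list of children. Returns a node -> {0,1} labeling."""
    y_hat, H = {}, {}

    def bottom_up(i):
        s = sum(bottom_up(j) for j in children.get(i, []))  # sum of H_j
        cost0 = theta[i] * p_hat[i]                         # loss term for y_i = 0
        cost1 = theta[i] * (1 - p_hat[i]) + p_hat[i] * s    # loss term for y_i = 1
        y_hat[i] = 1 if cost1 < cost0 else 0
        H[i] = min(cost0, cost1)
        return H[i]

    def top_down(i):
        for j in children.get(i, []):
            if y_hat[i] == 0:       # a negative node zeroes its whole subtree
                y_hat[j] = 0
            top_down(j)

    bottom_up(root)
    top_down(root)
    return y_hat
```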

5.3.2. HBAYES-CS: The Cost-Sensitive Version. HBAYES-CS is the cost-sensitive version of HBAYES proposed in [27]. In this approach, the misclassification cost coefficient $\theta_i$ for node $i$ is split into two terms, $\theta_i^+$ and $\theta_i^-$, taking into account the misclassification of positive and negative examples, respectively. By considering these two terms separately, (12) can be rewritten as

$$
\hat{y}_i = \operatorname*{arg\,min}_{y \in \{0,1\}} \left( \theta_i^- \hat{p}_i (1 - y) + \theta_i^+ (1 - \hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \operatorname{child}(i)} H_j(\hat{\mathbf{y}}) \right), \tag{14}
$$

where the expression of $H_j(\hat{\mathbf{y}})$ changes correspondingly. By introducing a factor $\alpha \geq 0$ such that $\theta_i^- = \alpha \theta_i^+$, while keeping $\theta_i^+ + \theta_i^- = 2\theta_i$, the relative costs of false positives and false negatives can be parameterized, thus allowing us to further rewrite the hierarchical Bayesian rule (Section 5.3.1) as follows:

$$
\hat{y}_i = 1 \iff \hat{p}_i \left( 2\theta_i - \sum_{j \in \operatorname{child}(i)} H_j \right) \geq \frac{2\theta_i}{1 + \alpha}. \tag{15}
$$

By setting $\alpha = 1$ we obtain the original version of the hierarchical Bayesian ensemble, and by incrementing $\alpha$ we introduce progressively lower costs for positive predictions. In this way the recall of the ensemble tends to increase, eventually at the expense of the precision, and by tuning the $\alpha$ parameter we can obtain different combinations of precision/recall values.

In principle, a cost factor $\alpha_i$ can be set for each node $i$, to explicitly take into account the unbalance between the number of positive ($n_i^+$) and negative ($n_i^-$) examples estimated from the training data:

$$
\alpha_i = \frac{n_i^-}{n_i^+} \implies \theta_i^+ = \frac{2}{n_i^-/n_i^+ + 1}\,\theta_i = \frac{2 n_i^+}{n_i^- + n_i^+}\,\theta_i. \tag{16}
$$

The decision rule (15) at each node then becomes

$$
\hat{y}_i = 1 \iff \hat{p}_i \left( 2\theta_i - \sum_{j \in \operatorname{child}(i)} H_j \right) \geq \frac{2\theta_i}{1 + \alpha_i} = \frac{2\theta_i\, n_i^+}{n_i^- + n_i^+}. \tag{17}
$$

Results obtained with the yeast model organism showed that HBAYES-CS significantly outperforms HTD methods [27, 136].
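For illustration, the per-node cost-sensitive decision of (15)-(17) can be sketched as follows; this is a hypothetical helper, assuming the local estimate, the $H$ values of the children, and the per-node counts of positive and negative training examples are already available:

```python
def hbayes_cs_decision(p_i, theta_i, H_children, n_pos, n_neg):
    """Cost-sensitive rule, eqs. (15)-(17): with alpha_i = n_neg / n_pos
    the threshold 2*theta_i/(1 + alpha_i) = 2*theta_i*n_pos/(n_neg + n_pos)
    is lowered for classes with few positive annotations, raising recall."""
    alpha_i = n_neg / n_pos
    lhs = p_i * (2 * theta_i - sum(H_children))
    return 1 if lhs >= 2 * theta_i / (1 + alpha_i) else 0
```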

6. Reconciliation Methods

Hierarchical ensemble methods are basically two-step methods, since they at first provide predictions for the single classes and then arrange these predictions to take into account the functional relationships between GO terms. Noble and colleagues name this general approach reconciliation methods [24]: they proposed methods for calibrating and combining independent predictions to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. They applied their ensemble methods to genome-wide and ontology-wide function prediction in M. musculus, involving about 3000 GO terms.


Figure 10: The overall scheme of reconciliation methods (adapted from [24]): (1) kernel computation for each data source (e.g., MGI, PPI, Pfam); (2) SVM learning, one SVM per GO term; (3) calibration of the SVM outputs into probability scores; (4) reconciliation of the calibrated scores according to the ontology.

Their goal consists in providing consistent predictions, that is, predictions whose confidence (e.g., posterior probability) increases as we ascend from more specific to more general terms in the GO. Moreover, another important feature of these methods is the availability of confidence values associated with the predictions, which can be interpreted as the probability that a protein has a certain function, given the information provided by the data.

The overall reconciliation approach can be summarized in the following four basic steps (Figure 10).

(1) Kernel computation: at first, a set of kernels is computed from the available data. We may choose kernels specific for each source of data (e.g., diffusion kernels for protein-protein interaction data [137], linear or Gaussian kernels for expression data, and string kernels for sequence data [138]). Multiple kernels for the same type of data can also be constructed [24].

(2) SVM learning: SVMs are used as base learners with the kernels selected at the previous step; the training is performed by internal cross-validation to avoid overfitting, and a local cost-sensitive strategy is applied by separately tuning the $C$ regularization factor for positive and negative examples. Note that the authors used SVMs as base learners in their experiments, but any meaningful classifier could be used at this step.

(3) Calibration: to produce individual probabilistic outputs from the set of SVM outputs corresponding to one GO term, a logistic regression approach is applied. In this way a calibration of the individual SVM outputs is obtained, resulting in a probabilistic prediction of the random variable $Y_i$ for each node/term $i$ of the hierarchy, given the outputs $\hat{y}_i$ of the SVM classifiers (a minimal sketch of steps (2) and (3) is given after this list).

(4) Reconciliation: the first three steps generate unreconciled outputs; that is, in practice a “flat” ensemble is applied, which may generate predictions inconsistent with the given taxonomy. In this step the outputs of step three are processed by a “reconciliation method”. The goal of this stage is to combine the predictions for each term so as to produce predictions that are consistent with the ontology, meaning that all the probabilities assigned to the ancestors of a GO term are larger than the probability assigned to that term.
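As a minimal sketch of steps (2) and (3) for a single GO term, assuming scikit-learn, a precomputed Gram matrix K for one data source, and a 0/1 annotation vector y (the local cost-sensitive tuning of $C$ is approximated here by per-class weights), one could write:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def calibrated_term_classifier(K, y, w_pos=1.0, w_neg=1.0):
    """Steps (2)-(3) for one GO term: a cost-sensitive SVM on a precomputed
    kernel, followed by logistic-regression calibration of its margins."""
    svm = SVC(kernel="precomputed", class_weight={1: w_pos, 0: w_neg})
    # out-of-fold margins avoid calibrating on the training predictions
    margins = cross_val_predict(svm, K, y, cv=3, method="decision_function")
    calibrator = LogisticRegression().fit(margins.reshape(-1, 1), y)
    svm.fit(K, y)
    return svm, calibrator
```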

The first three steps are basically the same for (or very similar in) each reconciliation ensemble method. The crucial step is the fourth, that is, the reconciliation step, and different ensemble algorithms can be designed to implement it. The authors proposed 11 different methods for the reconciliation of the base classifier outputs. Schematically, they can be subdivided into the following four main classes of ensembles:

(1) heuristic methods;
(2) Bayesian network-based methods;
(3) cascaded logistic regression;
(4) projection-based methods.

6.1. Heuristic Methods. These approaches preserve the “reconciliation property”

$$
\forall i, j \in G,\ (i, j) \in G \implies \bar{p}_i \geq \bar{p}_j, \tag{18}
$$

through simple heuristic modifications of the probabilities computed at step 3 of the overall reconciliation scheme (a sketch implementing the three heuristics is given after this list).

(i) The MAX method simply chooses the largest logistic regression value among node $i$ and all its descendants $\operatorname{desc}(i)$:

$$
\bar{p}_i = \max_{j \in \operatorname{desc}(i)} \hat{p}_j. \tag{19}
$$

(ii) The AND method estimates the probability that all the ancestral GO terms $\operatorname{anc}(i)$ of a given term/node $i$ are “on”, assuming that, conditional on the data, all predictions are independent:

$$
\bar{p}_i = \prod_{j \in \operatorname{anc}(i)} \hat{p}_j. \tag{20}
$$

(iii) The OR method estimates the probability that node $i$ is annotated for at least one of the descendant GO terms, assuming again that, conditional on the data, all predictions are independent:

$$
1 - \bar{p}_i = \prod_{j \in \operatorname{desc}(i)} \left(1 - \hat{p}_j\right). \tag{21}
$$
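A minimal sketch of the three heuristics, assuming the calibrated probabilities and the ancestor/descendant sets (taken here to include the node itself, as the MAX definition suggests) are available as plain dictionaries, could be the following:

```python
import numpy as np

def reconcile_heuristic(p_hat, desc, anc, method):
    """Heuristic reconciliation, eqs. (19)-(21): p_hat maps each term to its
    calibrated probability; desc[i] and anc[i] are the sets of descendants
    and ancestors of term i (here assumed to include i itself)."""
    p_bar = {}
    for i in p_hat:
        if method == "max":      # eq. (19)
            p_bar[i] = max(p_hat[j] for j in desc[i])
        elif method == "and":    # eq. (20)
            p_bar[i] = float(np.prod([p_hat[j] for j in anc[i]]))
        elif method == "or":     # eq. (21)
            p_bar[i] = 1.0 - float(np.prod([1.0 - p_hat[j] for j in desc[i]]))
    return p_bar
```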

6.2. Cascaded Logistic Regression. Instead of modeling class-conditional probabilities, as required by the Bayesian approach, logistic regression can be used to directly model posterior probabilities. Considering that modeling conditional densities is in most cases difficult (even using strong independence assumptions, as shown in Section 5.1),


the choice of logistic regression could be a reasonable one. In [24] the authors embedded the hierarchical dependencies between terms in the logistic regression setting. Assuming that a random variable $\mathbf{X}$, whose values represent the features of the gene $g$ of interest, is associated with $g$, and that $\mathbb{P}(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x})$ factorizes according to the GO graph, it follows that

$$
\mathbb{P}(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x}) = \prod_{i} \mathbb{P}\bigl(Y_i = y_i \mid \forall j \in \operatorname{par}(i),\ Y_j = y_j,\ X_i = x_i\bigr), \tag{22}
$$

with $\mathbb{P}(Y_i = 1 \mid \forall j \in \operatorname{par}(i),\ Y_j = 0,\ X_i = x_i) = 0$. The authors estimated $\mathbb{P}(Y_i = 1 \mid \forall j \in \operatorname{par}(i),\ Y_j = 1,\ X_i = x_i)$ with logistic regression. This approach is quite similar to fitting independent logistic regressions, but note that in this case only examples of proteins having all the parent GO terms are used to fit the model, thus implicitly taking into account the hierarchical relationships between GO terms.
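A minimal sketch of this cascaded fitting scheme, assuming scikit-learn, an annotation matrix Y (genes × terms), and a map from each term to the indices of its parents, could be the following (in practice each restricted subsample must contain both positive and negative examples for the fit to succeed):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_cascaded_lr(X, Y, parents):
    """Eq. (22): one logistic model per term, fitted only on the examples
    annotated to all the parent terms of that term (root terms, with no
    parents, use all the examples)."""
    models = {}
    for i in range(Y.shape[1]):
        mask = np.all(Y[:, parents[i]] == 1, axis=1)  # all parents positive
        models[i] = LogisticRegression().fit(X[mask], Y[mask, i])
    return models
```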

6.3. Bayesian Network-Based Methods. These methods are variants of the Bayesian network approach proposed in [20] (Section 5.1): the GO is viewed as a graphical model, where a joint Bayesian prior is put on the binary GO term variables $y_i$. The authors proposed four variants, which can be summarized as follows:

(i) BPAL is a belief propagation approach with asymmetric Laplace likelihoods. The graphical model has edges directed from more general to more specific terms. Differently from [20], the distribution of each SVM output is modeled as an asymmetric Laplace distribution, and a variational inference algorithm, which solves an optimization problem whose minimizer is the set of marginal probabilities of the distribution, is used to estimate the posterior probabilities of the ensemble [139];

(ii) the BPALF approach is similar to BPAL, but with the edges inverted and directed from more specific to more general terms;

(iii) BPLR is a heuristic variant of BPAL where, in the inference algorithm, the Bayesian log posterior ratio for $Y_i$ is replaced by the marginal log posterior ratio obtained from the logistic regression (LR);

(iv) BPLRF is equal to BPLR, but with reversed edges.

6.4. Projection-Based Methods. A different approach is represented by methods that directly use the calibrated values obtained from logistic regression (step 3 of the overall scheme of the reconciliation methods) to find the closest set of values that are consistent with the ontology. This approach leads to a constrained optimization problem. The main contribution of the work of Obozinski et al. [24] is the introduction of projection reconciliation techniques based on isotonic regression [140] and on the Kullback-Leibler divergence.

The isotonic regression method tries to find a set of marginal probabilities $\bar{p}_i$ that are close to the calibrated values $\hat{p}_i$ obtained from the logistic regression, with the Euclidean distance used as the measure of closeness. Hence, considering that the “reconciliation property” requires $\bar{p}_i \geq \bar{p}_j$ when $(i, j) \in E$, this approach yields the following quadratic program:

$$
\min_{\{\bar{p}_i,\ i \in I\}} \sum_{i \in I} \left(\bar{p}_i - \hat{p}_i\right)^2 \quad \text{s.t.} \quad \bar{p}_j \leq \bar{p}_i,\ (i, j) \in E. \tag{23}
$$

This is the classical isotonic regression problem, which can be solved using an interior point solver, or with approximate algorithms when the number of edges of the graph is too large [141].
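As an illustration, the quadratic program (23) can be written almost verbatim with a generic convex solver; the following sketch assumes the cvxpy package and an edge list of (parent, child) index pairs (the Kullback-Leibler variant (25) below would only change the objective):

```python
import cvxpy as cp

def isotonic_reconciliation(p_hat, edges):
    """Eq. (23): project the calibrated probabilities onto the ontology
    order constraints, i.e., child probability <= parent probability."""
    p = cp.Variable(len(p_hat))
    problem = cp.Problem(cp.Minimize(cp.sum_squares(p - p_hat)),
                         [p[j] <= p[i] for (i, j) in edges])
    problem.solve()
    return p.value
```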

Considering that we deal with probabilities, a natural measure of the distance between probability density functions $f(\mathbf{x})$ and $g(\mathbf{x})$, defined with respect to a random variable $\mathbf{x}$, is the Kullback-Leibler divergence $D_{f,g}$:

$$
D_{f,g} = \int_{-\infty}^{\infty} f(\mathbf{x}) \log\left(\frac{f(\mathbf{x})}{g(\mathbf{x})}\right) d\mathbf{x}. \tag{24}
$$

In the context of reconciliation methods we need to consider a discrete version of the Kullback-Leibler divergence, yielding the following optimization problem:

$$
\min_{\bar{\mathbf{p}}} D_{\bar{\mathbf{p}}, \hat{\mathbf{p}}} = \min_{\{\bar{p}_i,\ i \in I\}} \sum_{i \in I} \bar{p}_i \log\left(\frac{\bar{p}_i}{\hat{p}_i}\right) \quad \text{s.t.} \quad \bar{p}_j \leq \bar{p}_i,\ (i, j) \in E. \tag{25}
$$

The algorithm finds the probabilities $\bar{\mathbf{p}}$ closest, according to the Kullback-Leibler divergence, to the probabilities $\hat{\mathbf{p}}$ obtained from logistic regression, while obeying the constraint that probabilities cannot increase while descending the hierarchy underlying the ontology.

The extensive experiments reported in [24] show that, among the reconciliation methods, isotonic regression is the most generally useful: across a range of evaluation modes, term sizes, ontologies, and recall levels, it yields consistently high precision. On the other hand, isotonic regression is not always the “best method”, and a biologist with a particular goal in mind may apply other reconciliation methods. For instance, with small terms Kullback-Leibler projections usually achieve the best results, while, considering average “per term” results, heuristic methods yield precision at a given recall comparable with projection methods, and better than that achieved with Bayes-net methods.

This ensemble approach achieved excellent results in the prediction of protein function in the mouse model organism, demonstrating that hierarchical multilabel methods can play a crucial role in improving protein function prediction performances [24]. Nevertheless, the approach suffers from some drawbacks. Indeed, the paper focuses on the comparison of hierarchical multilabel methods, but it does not analyze the impact of the concurrent use of data integration and hierarchical multilabel methods on the overall classification performances. Moreover, potential improvements could


Figure 11: The asymmetric flow of information suggested by the true path rule: positive predictions propagate from bottom to top of the FunCat hierarchy, while negative decisions propagate from top to bottom.

be introduced by applying cost-sensitive variants of hierarchical multilabel predictors, able to effectively calibrate the precision/recall trade-off at the different levels of the functional ontology.

7. True Path Rule Hierarchical Ensembles

These ensemble methods exploit, at the same time, the downward and upward relationships between classes, thus considering both the parent-to-child and the child-to-parent functional links (Figure 2(b)).

The true path rule (TPR) ensemble method [31, 142] is directly inspired by the true path rule that governs both the GO and the FunCat taxonomies. Citing the curators of the Gene Ontology [143]: “An annotation for a class in the hierarchy is automatically transferred to its ancestors, while genes unannotated for a class cannot be annotated for its descendants”. Considering the parents of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$$
\begin{aligned}
\hat{y}_i = 1 &\implies \hat{y}_{\operatorname{par}(i)} = 1, \\
\hat{y}_i = 0 &\;\nRightarrow\; \hat{y}_{\operatorname{par}(i)} = 0.
\end{aligned} \tag{26}
$$

On the other hand, considering the children of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$$
\begin{aligned}
\hat{y}_i = 1 &\;\nRightarrow\; \hat{y}_{\operatorname{child}(i)} = 1, \\
\hat{y}_i = 0 &\implies \hat{y}_{\operatorname{child}(i)} = 0.
\end{aligned} \tag{27}
$$

From (26) and (27) we observe an asymmetry in the rules that govern the assignment of positive and negative labels. Indeed, we have a propagation of positive predictions from bottom to top of the hierarchy in (26), and a propagation of negative labels from top to bottom in (27). Conversely, negative labels cannot propagate from bottom to top, and positive predictions cannot propagate from top to bottom.

The “true path rule” thus suggests algorithms able to propagate “positive” decisions from bottom to top of the hierarchy, and negative decisions from top to bottom (Figure 11).

7.1. The True Path Rule Ensemble Algorithm. The TPR algorithm puts together the predictions made at each node by local “base” classifiers, to realize an ensemble that obeys the “true path rule”.

The basic ideas behind the method can be summarized as follows:

(1) training of the base learners: for each node of the hierarchy, a suitable learning algorithm (e.g., a multilayer perceptron or a support vector machine) provides a classifier for the associated functional class;

(2) in the evaluation phase, the trained classifiers associated with each class/node of the graph provide a local decision about the assignment of a given example to a given node;

(3) positive decisions, that is, annotations to a specific functional class, may propagate from bottom to top across the graph: they influence the decisions of the parent nodes and of their ancestors in a recursive way, by traversing the graph towards the higher level nodes/classes; conversely, negative decisions do not affect the decisions of the parent node, that is, they do not propagate from bottom to top (26);


(4) negative predictions for a given node (taking into account the local decisions of its descendants) are propagated to the descendants, to preserve the consistency of the hierarchy according to the true path rule, while positive decisions do not influence the decisions of the child nodes (27).

The ensemble combines the local predictions of the base learners associated with each node with the positive decisions coming from the bottom of the hierarchy and with the negative decisions springing from the higher level nodes. More precisely, the base classifiers estimate local probabilities $\hat{p}_i(g)$ that a given example $g$ belongs to class $\theta_i$, but the core of the algorithm is represented by the evaluation phase, where the ensemble provides an estimate of the “consensus” global probability $\bar{p}_i(g)$.

It is worth noting that, instead of a probability, $\hat{p}_i(g)$ may also represent a score associated with the likelihood that a given gene/gene product belongs to the functional class $i$.

Let us consider the set $\phi_i(g)$ of the children of node $i$ for which we have a positive prediction for a given gene $g$:

$$
\phi_i(g) = \left\{ j \in \operatorname{child}(i) : \hat{y}_j = 1 \right\}. \tag{28}
$$

The global consensus probability $\bar{p}_i(g)$ of the ensemble depends both on the local prediction $\hat{p}_i(g)$ and on the predictions of the nodes belonging to $\phi_i(g)$:

$$
\bar{p}_i(g) = \frac{1}{1 + \left|\phi_i(g)\right|} \left( \hat{p}_i(g) + \sum_{j \in \phi_i(g)} \bar{p}_j(g) \right). \tag{29}
$$

The decision $\hat{y}_i(g)$ at node/class $i$ is set to 1 if $\bar{p}_i(g) > t$, and to 0 otherwise (a natural choice for $t$ is 0.5), and only the children nodes for which we have a positive prediction can influence their parent. In the leaf nodes the sum of (29) disappears, and (29) becomes $\bar{p}_i(g) = \hat{p}_i(g)$. In this way positive predictions propagate from bottom to top, and negative decisions are propagated to the descendants when, for a given node, $\hat{y}_i(g) = 0$.

The bottom-up per-level traversal of the tree assures that all the offspring of a given node $i$ are taken into account for the ensemble prediction. For the same reason, we can safely set the classes belonging to the subtree rooted at $i$ to negative when $\hat{y}_i$ is set to 0. It is worth noting that we have a two-way asymmetric flow of information across the tree: positive predictions for a node influence its ancestors, while negative predictions influence its offspring.

The algorithm provides both the multilabels $\hat{y}_i$ and an estimate of the probabilities $\bar{p}_i$ that a given example $g$ belongs to the class $i = 1, \dots, m$.

7.2. The Cost-Sensitive Variant. Note that in the TPR algorithm there is no way to explicitly balance the local prediction $\hat{p}_i(g)$ at node $i$ with the positive predictions coming from its offspring (29). By balancing the local predictions with the positive predictions coming from the ensemble, we can explicitly modulate the interplay between local and descendant predictors. To this end, a weight $w$, $0 \leq w \leq 1$, is introduced, such that, if $w = 1$, the decision at node $i$ depends only on the local predictor; otherwise the prediction is shared proportionally to $w$ and $1 - w$ between, respectively, the local parent predictor and the set of its children:

$$
\bar{p}_i = w\, \hat{p}_i + \frac{1 - w}{\left|\phi_i\right|} \sum_{j \in \phi_i} \bar{p}_j. \tag{30}
$$

This variant of the TPR algorithm is the weighted true path rule (TPR-W) hierarchical ensemble algorithm; a compact sketch of the resulting bottom-up procedure is given below. By tuning the $w$ parameter we can modulate the precision/recall characteristics of the resulting ensemble. More precisely, for $w \to 0$ the weight of the parent local predictor is small, and the ensemble decision depends mainly on the positive predictions of the offspring nodes (classifiers). Conversely, $w \to 1$ corresponds to a higher weight of the parent predictor: less weight is given to the possible positive predictions of the children, and the decision depends mainly on the local/parent base classifier; in case of a negative decision, the whole subtree is set to 0, causing the precision to increase. Note that for $w \to 1$ the behaviour of TPR-W becomes similar to that of HTD (Section 4).
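As announced above, the following sketch renders the TPR-W bottom-up procedure for a tree-structured taxonomy. It uses a post-order recursion, equivalent on a tree to the per-level bottom-up traversal described above, and assumes the convention that a node with no positive children falls back on its local estimate, as in the leaves:

```python
def tpr_w_predict(p_hat, children, root, w=0.5, t=0.5):
    """TPR-W ensemble, eqs. (26)-(30): p_hat maps each node to the local
    probability estimate of its base classifier; children maps each node
    to its list of children. Returns consensus probabilities and labels."""
    p_bar, y_hat = {}, {}

    def zero_subtree(i):
        for j in children.get(i, []):   # negative decisions propagate down
            y_hat[j] = 0
            zero_subtree(j)

    def visit(i):
        for j in children.get(i, []):
            visit(j)                    # children first (bottom-up)
        phi = [j for j in children.get(i, []) if y_hat[j] == 1]
        if phi:
            # eq. (30); using (p_hat[i] + sum(...)) / (1 + len(phi)) here
            # instead would give the basic TPR rule, eq. (29)
            p_bar[i] = w * p_hat[i] + (1 - w) / len(phi) * sum(p_bar[j] for j in phi)
        else:
            p_bar[i] = p_hat[i]         # leaves / no positive children
        y_hat[i] = 1 if p_bar[i] > t else 0
        if y_hat[i] == 0:
            zero_subtree(i)

    visit(root)
    return p_bar, y_hat
```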

A specific advantage of TPR-W ensembles is the capability of tuning the precision and recall rates through the parameter $w$ (30): for small values of $w$, the weight of the decision of the parent local predictor is small, and the ensemble decision depends mainly on the positive predictions of the offspring nodes (classifiers), while higher values of $w$ correspond to a higher weight of the “parent” local predictor, with a resulting higher precision. In [31] the author shows that the $w$ parameter highly influences the precision/recall characteristics of the ensemble: low $w$ values yield a higher recall, while high values improve the precision of the TPR-W ensemble.

Recently, Chen and Hu proposed a method that applies the TPR-W hierarchical strategy, but uses composite kernel SVMs as base classifiers and a supervised clustering with oversampling strategy to solve the imbalanced data set learning problem; they showed that the proper selection of base learners and unbalance-aware learning strategies can further improve the results in terms of hierarchical precision and recall [144].

The same authors also proposed an enhanced version of the TPR-W strategy, to overcome a limitation of this bottom-up hierarchical method for AFP. Indeed, for some classes at the lower levels of the hierarchy the classifier performances are sometimes quite poor, due both to noisy data and to the relatively low number of available annotations. More precisely, in the basic TPR ensemble the probabilities $\bar{p}_j$ computed by the children of node $i$ (30) contribute in equal measure to the probability $\bar{p}_i$ computed by the ensemble at node $i$, independently of the accuracy of the predictions made by the children classifiers. This “unweighted” mechanism may propagate errors across the hierarchy: a poorly performing child classifier may, for instance, with high probability predict a negative example as positive, and this error may propagate to its parent node and, recursively, to its ancestor nodes. To alleviate this possible bottom-up error propagation, in [145] Chen and Hu proposed an improved TPR ensemble, weighted by classifier performance. To this end, they weighted the contribution


of each child classifier on the basis of its performance, evaluated on a validation data set, by adding to (30) another weight $\nu_j$:

$$
\bar{p}_i = w\, \hat{p}_i + \frac{1 - w}{\left|\phi_i\right|} \sum_{j \in \phi_i} \nu_j \cdot \bar{p}_j, \tag{31}
$$

where $\nu_j$ is computed on the basis of some accuracy metric $A_j$ (e.g., the F-score), estimated for the child classifier associated with node $j$, as follows:

$$
\nu_j = \frac{A_j}{\sum_{k \in \phi_i} A_k}. \tag{32}
$$

In this way the contribution of “poor” classifiers is reduced, while “good” classifiers weight more in the final computation of $\bar{p}_i$ (31). Experiments with the “Protein Fate” subtree of the FunCat taxonomy, with the yeast model organism, show that this approach improves the predictions with respect to the “vanilla” TPR-W hierarchical strategy [145].
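The performance-based weights of (32) amount to a simple normalization of per-child accuracy scores, as in the following sketch, where A is a hypothetical map from each node to a validation accuracy metric such as the F-score:

```python
def child_weights(A, phi_i):
    """Eq. (32): weight each positive child j in phi_i by its accuracy
    A[j], normalized so that the weights sum to one over phi_i."""
    total = sum(A[k] for k in phi_i)
    return {j: A[j] / total for j in phi_i}
```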

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward and common to the hierarchical top-down algorithm, the propagation of positive decisions from bottom to top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this item, see Appendix C.

Experimental results show that TPR-W achieves results equal to or better than the TPR and top-down hierarchical strategies, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per-level classification performances shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall by using a global strategy: large values of the $w$ parameter improve the precision, and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchical ensemble are influenced by the positive decisions of its descendants. More precisely, the analyses performed in [31] showed the following:

(i) the weights of the descendants decrease exponentially with respect to their depth; as a consequence, the influence of the descendant nodes decays quickly with their depth;

(ii) the parameter $w$ plays a central role in balancing the weight of the parent classifier associated with a given node with the weights of its positive offspring: small values of $w$ increase the weight of the descendant nodes, while large values increase the weight of the local parent predictor associated with that node;

(iii) the effect on the overall probability predicted by the ensemble results from the choice of the $w$ parameter, the strength of the prediction of the local learners, and that of their descendants.

These characteristics of TPR-W ensembles are well suited for the hierarchical classification of protein functions, considering that the annotations of deeper nodes are likely to have less experimental evidence than those of higher nodes. Moreover, by enforcing the strength of the descendant nodes through low $w$ values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured as a forest of trees (Section 1.1), while no applications have been performed using the GO, structured as a directed acyclic graph (Section 1.1).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147], or investigated a modified decision tree model in which the assignment to a node is propagated toward the parent nodes [9], by extending the classical C4.5 decision tree algorithm to multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in yeast [149].

More recent approaches, likewise based on modified decision trees, used distance measures derived from the hierarchy and significantly improved the previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all the classes at once, even when they are built taking into account the hierarchy.

Nevertheless, the previously proposed decision tree-based methods often achieve results not comparable with state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for DAG-structured prediction of protein function in the S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles that learn ensembles of predictive models, each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles that train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151], methods related to decision trees, in the sense that they produce interpretable classification rules to predict all functions at all levels of the GO hierarchy, have been proposed, using an


ant colony optimization classification algorithm to discover the classification rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues [1, 16, 18]. In particular, in [80] the authors showed that, even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results, if at the same time we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy between hierarchical classification, data integration methods, and cost-sensitive techniques has been shown, highlighting that hierarchical ensemble methods should be designed taking into account the different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the prediction of protein functions [16, 25, 154–156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into the following four groups:

(1) vector subspace integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed by [30], but each data source is separately normalized, in order to take into account the data distribution in each individual vector space.

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network, or probabilistic evidence integration schemes [45].

Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive-Bayes integration [157] and constrained linear regression [14], or by merging data taking into account the GO hierarchy [158], or by applying XML-based techniques [159].

Kernel Fusion. These techniques at first construct a separate Gram matrix for each available data source, using appropriate kernels representing the similarities between genes/gene products. Then, by exploiting the closure property of kernels with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a “consensus” global integrated matrix.

Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have recently been proposed [160, 161]. Kernel fusion methods, both with and without weighting of the data sources, have been successfully applied to the classification of protein functions [162–165]. Recently, a novel method proposed an enhanced kernel integration approach, by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
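In its simplest form (fixed, possibly uniform, coefficients) kernel fusion reduces to a weighted sum of the Gram matrices, as in the following sketch; the unweighted sum is, for instance, the kernel fusion setting used for the results reported in Table 2 below:

```python
import numpy as np

def fuse_kernels(grams, weights=None):
    """Combine per-source Gram matrices into a consensus kernel, exploiting
    the closure of kernels under (weighted) sums; grams is a list of (n, n)
    arrays, one per data source."""
    if weights is None:
        weights = np.ones(len(grams))   # unweighted sum of the Gram matrices
    return sum(w * K for w, K in zip(weights, grams))
```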

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different “views” of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include the “late integration” of kernels trained on different sources [22], the naive-Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the outputs of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171–173].

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques,


Table 2: Comparison of the results (average per class F-scores) achieved with single-source and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the numbers in parentheses are the percentage relative increments in F-score achieved with data fusion with respect to the best single source of evidence (BIOGRID).

Methods          | FLAT   | HTD    | HTD-CS | HB     | HB-CS  | TPR    | TPR-W
Single-source
  BIOGRID        | 0.2643 | 0.3759 | 0.4160 | 0.3385 | 0.4183 | 0.3902 | 0.4367
  STRING         | 0.2203 | 0.2677 | 0.3135 | 0.2138 | 0.3007 | 0.2801 | 0.3048
  PFAM BINARY    | 0.1756 | 0.2003 | 0.2482 | 0.1468 | 0.2407 | 0.2532 | 0.2738
  PFAM LOGE      | 0.2044 | 0.1567 | 0.2541 | 0.0997 | 0.2847 | 0.3005 | 0.3160
  EXPR           | 0.1884 | 0.2506 | 0.2889 | 0.2006 | 0.2781 | 0.2723 | 0.3053
  SEQ. SIM.      | 0.1870 | 0.2532 | 0.2899 | 0.2017 | 0.2825 | 0.2742 | 0.3088
Multisource (data fusion)
  Kernel fusion  | 0.3220 (+22%) | 0.5401 (+44%) | 0.5492 (+32%) | 0.5181 (+53%) | 0.5505 (+32%) | 0.5034 (+29%) | 0.5592 (+28%)

in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ. SIM.). Kernel fusion integration (sum of the Gram matrices) has been applied, and the preprocessing has been performed using the HCGene R package [178].

Table 2 summarizes the results of the comparison across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per class F-score in flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) with respect to FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical, data fusion (in particular kernel fusion), and cost-sensitive approaches (Figure 13) [123].

Per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion, the average F-score at levels 2, 3, and 4 is comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergic effects of hierarchical multilabel ensembles, cost-sensitive, and data fusion techniques significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insights into the functions of genes.

9.3. Different Strategies to Select “Negative” Genes. In both the GO and FunCat, only positive annotations are usually available, while negative annotations are much fewer. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy of choosing negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123], two strategies for choosing negative examples have been compared: the basic (B) and the parent only (PO) strategy.

According to the B strategy, the set of negative examples is simply the set of genes $g$ that are not annotated for class $c_i$, that is,

$$
N_B = \left\{ g : g \notin c_i \right\}. \tag{33}
$$

The PO selection strategy chooses as negatives for the class $c_i$ only those examples that are not annotated for $c_i$ but are annotated for a parent class. More precisely, for a given class $c_i$, corresponding to node $i$ in the taxonomy, the set of negative examples is

$$
N_{\mathrm{PO}} = \left\{ g : g \notin c_i,\ g \in \operatorname{par}(i) \right\}. \tag{34}
$$


Figure 12: FunCat trees comparing the F-scores achieved with data integration (KF) to those of the best single-source classifiers, trained on BIOGRID data. Black nodes depict the functional classes for which KF achieves better F-scores: (a) FLAT ensembles; (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical precision, recall, and F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, the PO strategy selects negative examples for training that are, in a certain sense, “close” to the positives. It is easy to see that $N_{\mathrm{PO}} \subseteq N_B$; hence the B strategy selects for training a large set of generic negative examples, possibly annotated with classes associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class, achieved by a specific method with the B (abscissa) and the PO (ordinate) strategy for the selection of negative examples. In Figure 15(a) most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods

Figure 14: Comparison of per-level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF). The performance of “single” is computed by averaging across all the single data sources.

gain in performance when using the PO strategy as opposed to the B strategy ($P$-value $= 2.2 \times 10^{-16}$, according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

These results can be explained by considering that the PO strategy takes into account the hierarchy to select negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of the classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from other base classifiers to prevent a local base learner from incorrectly classifying “distant” negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in this research area.


Figure 15: Comparison of the average per class F-score between the basic (B) and the PO strategies: (a) FLAT ensembles; (b) hierarchical cost-sensitive strategies, HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles). Abscissa: per class F-score with base learners trained according to the basic strategy; ordinate: per class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way, to assess the impact of the different strategies on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works also showed that cost-sensitive strategies are needed to significantly improve the predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical “balancing” strategies should be explored to improve the accuracy and the reliability of the base learners, and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied: the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly takes away some unannotated examples [182]. Other approaches could be considered, such as heuristic resampling methods [183], the embedding of resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185–187].

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems, and often drives the “in vitro” experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta_i^- / \theta_i^+$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$ we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed, in order to design ensemble methods with “tunable” precision/recall characteristics.

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; unfortunately, the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard, while, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are partially annotated. More precisely, two new algorithms have been proposed: ProWL, protein function prediction with weak-label learning, which


can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein; and ProWL-IF, protein function prediction with weak-label learning and knowledge of irrelevant functions, by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works on the hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation between protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms, to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the “strength” of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy to the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Section 7.3), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of the deep nodes of the hierarchy. Other enhancements could be introduced in the TPR-W algorithm (Section 7.2): indeed, we can note that the positive children of a node at level $i$ of the hierarchy have the same weight, independently of the size of their hanging subtree. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree: indeed, this witnesses for a positive annotation of the node at level $i$.

More in general, in the spirit of a recently proposed work [123], the analysis of the synergy between the issues introduced above could be of great interest, to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194], able to exploit the relationships between functional classes just during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, just during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performances.

A last issue is represented by multispecies predictions in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the prediction for a particular species by using predictions or data available for other species. This is a common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work. More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and other related issues is the key to improve AFP methods and to drive experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., mitochondrion or rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function, for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; an example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles, or physical places or compartments in which a specific gene product is located; an example is the endoplasmic reticulum or the ribosome.

Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a term of the GO}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are also categorized in the following three main groups:

(i) is-a (subtype relations): if we say the term/node $A$ is a $B$, we mean that node $A$ is a subtype of node $B$. For example, mitotic cell cycle is a cell cycle, or lyase activity is a catalytic activity.

(ii) part-of (part-whole relations): $A$ is part of $B$ means that whenever $A$ exists, it is part of $B$, and the presence of $A$ implies the presence of $B$. For instance, mitochondrion is part of cytoplasm.

(iii) regulates (control relations): if we say that $A$ regulates $B$, we mean that $A$ directly affects the manifestation of $B$, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting function similarity. For these reasons, regulates relations (which, however, are a minority of the existing relations) are usually not used in AFP.
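As a toy illustration (all term names below are invented), a GO-like DAG with typed relations can be stored as adjacency lists; the transitive closure of the is-a and part-of edges then yields the ancestor sets along which annotations propagate.

# Toy GO-like DAG: each child term maps to a list of (parent, relation) pairs.
go_dag = {
    "mitotic_cell_cycle": [("cell_cycle", "is_a")],
    "mitochondrion":      [("cytoplasm", "part_of")],
    "cell_cycle":         [("biological_process", "is_a")],
    "cytoplasm":          [("cellular_component", "is_a")],
}

def ancestors(term, dag, relations=("is_a", "part_of")):
    """Collect all ancestors reachable through transitive relations."""
    found = set()
    stack = [term]
    while stack:
        for parent, rel in dag.get(stack.pop(), []):
            if rel in relations and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

print(ancestors("mitotic_cell_cycle", go_dag))
# {'cell_cycle', 'biological_process'}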

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI), to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference, to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de). At the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content was extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and finally animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge amount of terms is very difficult for annotators to handle. Moreover, we may have a very large number of possible assignments that may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads to both simpler procedures for annotation and less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, but without being too granular and specific.

It is worth noting that both FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to change, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more in general, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess the classification performance. For this reason, the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the positive examples correctly predicted as positive, FN the positive examples incorrectly predicted as negative, and FP the negatives incorrectly predicted as positives, then the precision $P$ and the recall $R$ are

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{B.1}$$

The F-score $F$ is the harmonic mean between precision and recall:

$$F = \frac{2 \cdot P \cdot R}{P + R}. \tag{B.2}$$
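A minimal sketch of these per-class measures, assuming the counts TP, FP, and FN have already been obtained by comparing per-term predictions with the gold standard annotations:

def precision_recall_fscore(tp, fp, fn):
    """Precision, recall, and F-score from prediction counts (B.1, B.2)."""
    p = tp / (tp + fp) if tp + fp > 0 else 0.0
    r = tp / (tp + fn) if tp + fn > 0 else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f

print(precision_recall_fscore(tp=30, fp=10, fn=20))
# (0.75, 0.6, 0.666...)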

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to near-optimal rankings; that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.
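A small sketch of the ranking interpretation of the AUC (the fraction of annotated/unannotated pairs ranked correctly, i.e., the Wilcoxon-Mann-Whitney statistic), assuming higher scores mean stronger predicted annotation:

def auc(pos_scores, neg_scores):
    # fraction of (positive, negative) pairs where the positive is ranked
    # higher; ties count one half
    pairs = [(p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores]
    return sum(pairs) / len(pairs)

print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1]))  # 0.9166...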

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be considered "partially correct", in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$, consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $\ell(P)$ be the set of the leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let $\uparrow\! p$ be the set of ancestors of the node $p$ that belong to $P(x)$, and, given a leaf $c \in C(x)$, let $\uparrow\! c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

$$\mathrm{HP} = \frac{1}{|\ell(P(x))|} \sum_{p \in \ell(P(x))} \max_{c \in \ell(C(x))} \frac{|\uparrow\! c \,\cap \uparrow\! p|}{|\uparrow\! p|},$$

$$\mathrm{HR} = \frac{1}{|\ell(C(x))|} \sum_{c \in \ell(C(x))} \max_{p \in \ell(P(x))} \frac{|\uparrow\! c \,\cap \uparrow\! p|}{|\uparrow\! c|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.3}$$

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

$$\mathrm{HP} = \frac{1}{|\ell(P(x))|} \sum_{p \in \ell(P(x))} \frac{|C(x) \,\cap \uparrow\! p|}{|\uparrow\! p|},$$

$$\mathrm{HR} = \frac{1}{|\ell(C(x))|} \sum_{c \in \ell(C(x))} \frac{|\uparrow\! c \,\cap\, P(x)|}{|\uparrow\! c|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.4}$$

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictors are able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.
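A minimal sketch of the tree-structured variant (B.4); the child-to-parent map and class identifiers are invented for the example, and $\uparrow$ is assumed to include the node itself:

def up(node, parent):
    """Ancestors of a node (including the node itself) in a tree, given a
    child -> parent map; the root maps to None."""
    path = set()
    while node is not None:
        path.add(node)
        node = parent.get(node)
    return path

def hierarchical_prf(pred_leaves, true_classes, true_leaves, parent):
    """HP, HR, HF for a tree taxonomy as in (B.4); true_classes is C(x),
    assumed closed under the ancestor relation."""
    hp = sum(len(true_classes & up(p, parent)) / len(up(p, parent))
             for p in pred_leaves) / len(pred_leaves)
    # P(x): the union of the ancestor closures of the predicted leaves
    pred_classes = set().union(*(up(p, parent) for p in pred_leaves))
    hr = sum(len(up(c, parent) & pred_classes) / len(up(c, parent))
             for c in true_leaves) / len(true_leaves)
    hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
    return hp, hr, hf

# Tiny FunCat-like tree: root -> {01, 02}, 01 -> 01.01
parent = {"root": None, "01": "root", "02": "root", "01.01": "01"}
print(hierarchical_prf({"01.01"}, up("01", parent), {"01"}, parent))
# (0.666..., 1.0, 0.8): over-specific prediction costs precision, not recall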

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

$$\mathrm{HP}_K = \frac{\sum_x |P(x) \cap C(x)|}{\sum_x |P(x)|}, \qquad \mathrm{HR}_K = \frac{\sum_x |P(x) \cap C(x)|}{\sum_x |C(x)|}. \tag{B.5}$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true classes ($\mathrm{HR}_K$). The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \tag{B.6}$$
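A sketch of (B.5) and (B.6), reading the definitions as micro-averaged ratios of sums over all genes $x$ (the gene-keyed dictionaries are an assumption of this example):

def kiritchenko_prf(pred, true):
    """HP_K, HR_K, HF_K over a dataset (B.5, B.6).
    pred, true: dicts mapping each gene x to its set of predicted / true
    classes, both assumed closed under the ancestor relation."""
    common = sum(len(pred[x] & true[x]) for x in pred)
    hp = common / sum(len(pred[x]) for x in pred)
    hr = common / sum(len(true[x]) for x in true)
    hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
    return hp, hr, hf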

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $\bar{q}_k$; more precisely, $\bar{q}_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $\bar{q}^{\,j}_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \ge 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level $k$ of the hierarchy [31]:

$$\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g). \tag{C.1}$$

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children nodes of a generic node at level $k$:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1}, \tag{C.2}$$

and we can introduce similar notations for $a_{k+2}$ (the average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (descendants at distance $j$ from a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \le w \le 1$, balancing the weight between parent and children predictors, and with a number (larger than or equal to 1) of positive descendants at each of the $m$ lower levels below it, the following equality holds for each $m \ge 1$:

$$\bar{q}_k = w \hat{q}_k + \sum_{j=1}^{m-1} w (1 - w)^j a_{k+j} + (1 - w)^m a_{k+m}. \tag{C.3}$$

For the full proof, see [31].

Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter. To get more insights into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$ (Theorem 2), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of nodes above the leaves.

Figure 18: (a) Plot of $f(w) = (1 - w)^m$ while varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1 - w)^j$ while varying $j$ from 1 to 10. The integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.
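As a numeric illustration (a sketch, not code from the paper), the following prints the weights that (C.3) assigns to each descendant level for a few values of $w$; together with the local weight $w$, these weights sum to 1:

def level_weight(w, j, m):
    # leaves at depth m get weight (1 - w)^m; internal levels get w (1 - w)^j
    return (1 - w) ** j if j == m else w * (1 - w) ** j

for w in (0.1, 0.5, 0.8):
    weights = [round(level_weight(w, j, m=5), 4) for j in range(1, 6)]
    print(f"w={w}: local weight {w}, descendant weights {weights}")
# Larger w concentrates the decision on the local predictor; smaller w
# lets deep positive predictions propagate upward with more strength.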

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi", funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction—the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.
[2] L. Peña-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.
[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.
[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.
[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.
[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.
[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.
[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.
[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.
[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.
[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.
[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.
[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.
[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.
[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.
[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.
[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.
[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.
[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.
[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.
[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.
[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.
[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.
[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.
[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.
[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.
[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.
[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.
[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.
[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.
[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.
[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.
[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.
[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.
[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.
[38] K. Trohidis, G. Tsoumahas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.
[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.
[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.
[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.
[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.
[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.
[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.
[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.
[46] K. Tsuda, H. Shin, and B. Schölkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.
[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.

[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.
[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.
[50] K. Astikainen, L. Holm, E. Pitkänen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.
[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.
[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.
[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.
[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.
[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.
[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
[57] S. F. Altschul, T. L. Madden, A. A. Schäffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
[58] A. Conesa, S. Götz, J. M. García-Gómez, J. Terol, M. Talón, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.
[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.
[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.
[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.
[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.
[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.
[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.
[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.
[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.
[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.
[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.
[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.
[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.
[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.
[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.
[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.
[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.
[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.
[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.
[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.
[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.
[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.
[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.
[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Schölkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.
[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.
[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.
[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.
[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.
[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.
[88] G. Bakir, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.
[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.
[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.
[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.
[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.
[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.
[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.
[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.
[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.
[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.
[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.
[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.
[100] M. Dettling and P. Buhlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.
[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.
[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.
[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.
[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.
[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.
[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.
[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.
[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.
[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.
[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.
[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.
[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hüllermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.
[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.
[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.
[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.
[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.
[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.
[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.
[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.
[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.
[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.
[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.
[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using Top-Down label combination and Artificial Neural Networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.
[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.
[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.
[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.
[128] B. Paes, A. Plastion, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.
[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.
[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.
[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.
[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.
[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.
[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.
[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems, Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.
[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Schölkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.
[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.
[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.
[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.
[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An $O(n^2)$ algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.
[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.
[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.
[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.
[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.
[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.
[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.

[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.
[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.
[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.
[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.
[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics—From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.
[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.
[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.
[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.
[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.
[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.
[160] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.
[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.
[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.
[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.
[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.
[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.
[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.
[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.
[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.
[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.
[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.
[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.
[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.
[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7-9, pp. 1533–1537, 2010.
[174] R. D. Finn, J. Tate, J. Mistry et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.
[175] A. P. Gasch, P. T. Spellman, C. M. Kao et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.
[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.
[177] C. Von Mering, R. Krause, B. Snel et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.
[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.
[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.
[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.
[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.
[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.
[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.
[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.
[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.
[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.
[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.
[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.
[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium ldquoGene Ontology annotations andresourcesrdquoNucleic Acids Research vol 41 pp D530ndashD535 2013

[192] A A Freitas D CWieser and R Apweiler ldquoOn the importanceof comprehensible classification models for protein functionpredictionrdquo IEEEACM Transactions on Computational Biologyand Bioinformatics vol 7 no 1 pp 172ndash182 2010

[193] R Cerri R Barros A de Carvalho and A Freitas ldquoA grammat-ical evolution algorithm for generation of hierarchical multi-label classification rulesrdquo in Proceedings of the IEEE Congress onEvolutionary Computation 2013

[194] CWidmer andG Ratsch ldquoMultitask learning in computationalbiologyrdquo Journal of Machine Learning Research WampP vol 27pp 207ndash216 2012

[195] J Gonzalez Y Low H Gu D Bickson and C GuestrinldquoPowerGraph distributed graph-parallel computation on nat-ural graphsrdquo in Proceedings of the 10th USENIX conference onOperating Systems Design and Implementation (OSDI rsquo12) pp17ndash30 Hollywood Calif USA 2012

[196] M Mesiti M Re and G Valentini ldquoScalable network-basedlearning methods for automated function prediction based onthe neo4j graph-databaserdquo in Proceedings of the AutomatedFunction Prediction SIG 2013mdashISMB Special Interest GroupMeeting pp 6ndash7 Berlin Germany 2013

[197] H W Mewes K Albermann M Bahr et al ldquoOverview of theyeast genomerdquo Nature vol 387 no 6632 pp 7ndash8 1997

[198] H W Mewes D Frishman C Gruber et al ldquoMIPS a databasefor genomes and protein sequencesrdquoNucleic Acids Research vol28 no 1 pp 37ndash40 2000

[199] K R Christie S Weng R Balakrishnan et al ldquoSaccharomycesGenome Database (SGD) provides tools to identify and ana-lyze sequences from Saccharomyces cerevisiae and relatedsequences from other organismsrdquo Nucleic Acids Research vol32 pp D311ndashD314 2004

[200] K Verspoor DDvorkin K B Cohen and LHunter ldquoOntologyquality assurance through analysis of term transformationsrdquoBioinformatics vol 25 no 12 pp i77ndashi84 2009

[201] K Verspoor J Cohn S Mniszewski and C Joslyn ldquoA catego-rization approach to automated ontological function annota-tionrdquo Protein Science vol 15 no 6 pp 1544ndash1549 2006

[202] S Kiritchenko S Matwin and A Famili ldquoHierarchical textcategorization as a tool of associating geneswithGeneOntologycodesrdquo in Proceedings of the 2nd European Workshop on DataMining and Text Mining for Bioinformatics pp 26ndash30 2004


Structured output methods infer a label $\hat{y}$ by finding the maximum of a function $g$ that uses the previously defined joint kernel (1):

$$\hat{y} = \arg\max_{y \in \mathcal{Y}} g(x, y). \tag{2}$$

The GOstruct system implemented a structured perceptron and a variant of the structured support vector machine [85]. This approach has been successfully applied to the prediction of GO terms in mouse and other model organisms [19]. Structured output maximum-margin algorithms have also been applied to the tree-structured prediction of enzyme functions [50, 86].
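As a toy illustration of the argmax in (2), consider the brute-force sketch below; the scoring function `score` and the label set are placeholders, not the actual GOstruct machinery, and real systems restrict the search space (e.g., to multilabels observed in training) rather than enumerating all of $\{0,1\}^m$:

```python
import itertools

def predict_structured(x, score, num_terms):
    """Brute-force version of Eq. (2): return the multilabel y maximizing
    a joint scoring function score(x, y), which plays the role of g(x, y).
    Exhaustive search over {0,1}^m is feasible only for a few terms."""
    best_y, best_val = None, float("-inf")
    for y in itertools.product((0, 1), repeat=num_terms):
        val = score(x, y)
        if val > best_val:
            best_y, best_val = y, val
    return best_y

# toy usage: the score simply rewards agreement with a fixed prototype
proto = (1, 0, 1)
print(predict_structured(None, lambda x, y: sum(a == b for a, b in zip(y, proto)), 3))
# -> (1, 0, 1)
```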

2.4. Hierarchical Ensemble Methods. Other approaches explicitly take into account the hierarchical relationships between functional terms [26, 29, 53, 54, 89, 90]. Usually they modify the "flat" predictions (i.e., predictions made independently of the hierarchical structure of the classes) and correct them, improving the accuracy and the consistency of the multilabel annotations of proteins [24].

The flat approach makes predictions for each term independently; consequently, the predictor may assign to a single protein a set of terms that are inconsistent with one another. A possible solution to this problem is to train a classifier for each term of the reference ontology, to produce a set of predictions at each term, and finally to reconcile the predictions by taking into account the relationships between the classes of the ontology. Different ensemble-based algorithms have been proposed, ranging from methods restricted to multilabels with single and no partial paths [91] to methods extended to multiple and also partial paths [92]. Many recently published works clearly demonstrated that this approach ensures an increment in precision, but this comes at the expense of the overall recall [2, 30].

In the next section we discuss hierarchical ensemble methods in detail, since they constitute the main topic of this review.

3. Hierarchical Ensemble Methods: Exploiting the Hierarchy to Improve Protein Function Prediction

Ensemble methods are one of the main research areas of machine learning [52, 93-95]. From a general standpoint, ensembles of classifiers are sets of learning machines that work together to solve a classification problem (Figure 1). Empirical studies showed that in both classification and regression problems ensembles improve on single learning machines; moreover, large experimental studies compared the effectiveness of different ensemble methods on benchmark data sets [96-99], and ensembles have been successfully applied to several computational biology problems [100-104]. Ensemble methods have also been successfully applied in an unsupervised setting [105, 106]. Several theories have been proposed to explain the characteristics and the successful application of ensembles to different application domains. For instance, Allwein, Schapire, and Singer interpreted the improved generalization capabilities of ensembles of learning machines in the framework of large margin classifiers [107, 108], Kleinberg in the context of Stochastic Discrimination Theory [109], and Breiman and Friedman in the light of the bias-variance analysis borrowed from classical statistics [110, 111]. The interest in this research area is motivated also by the availability of very fast computers and networks of workstations at a relatively low cost, which allow the implementation and the experimentation of complex ensemble methods using off-the-shelf computer platforms.

[Figure 1: Ensemble of classifiers. Training data are fed to base classifiers 1, ..., L, whose outputs are combined by a combiner into the final prediction.]

Constraints between labels, and more generally the issue of label dependence, have been recognized to play a central role in multilabel learning [112]. Protein function prediction can be regarded as a paradigmatic multilabel classification problem, where the exploitation of a priori knowledge about the hierarchical relationships between the labels can dramatically improve classification performance [24, 27, 113].

In the context of AFP problems, ensemble methods reflect the hierarchy of functional terms in the structure of the ensemble itself: each base learner is associated with a node of the graph representing the functional hierarchy and learns a specific GO term or FunCat category. The predictions provided by the trained classifiers are then combined by exploiting the hierarchical relationships of the taxonomy.

In their more general form, hierarchical ensemble methods adopt a two-step learning strategy.

(1) In the first step, each base learner, separately or interacting with connected base learners, learns the protein functional category on a per-term basis. In most cases this yields a set of independent classification problems, where each base learning machine is trained to learn a specific functional term, independently of the other base learners.

(2) In the second step, the predictions provided by the trained classifiers are combined by considering the hierarchical relationships between the base classifiers, modeled according to the hierarchy of the functional classes.

Figure 2 depicts the two learning steps of hierarchical ensemble methods. In the first step, a learning algorithm (a square object in Figure 2(a)) is applied to train the base classifiers associated with each class (represented with the numbers from 1 to 9); then, in the prediction phase, the resulting base classifiers (circles) exploit the hierarchical relationships between classes to combine their predictions with those provided by the other base classifiers (Figure 2(b)). Note that a dummy node 0 is added to obtain a rooted hierarchy. Up and down arrows represent the possibility of combining predictions by exploiting those provided, respectively, by children and parent classifiers, according to a bottom-up or top-down learning strategy. Note that "local" combinations are possible (e.g., the prediction of node 5 may depend only on the prediction of node 1), but "global" combinations can also be considered, by taking into account the predictions across the overall structure of the graph (e.g., predictions for node 9 can depend on the predictions made by all the other base classifiers, from 1 to 8). Moreover, both top-down propagation of the predictions (down arrows in Figure 2(b)) and bottom-up propagation (up arrows) can be considered, depending on the specific design of the hierarchical ensemble algorithm.

[Figure 2: Schematic representation of the two main learning steps of hierarchical ensemble methods: (a) training of the base classifiers; (b) top-down and/or bottom-up propagation of the predictions.]

This ensemble approach is highly modular: in principle, any learning algorithm can be used to train the classifiers in the first step, and annotation decisions, probabilities, or whatever scores are provided by each base learner can be combined, depending on the characteristics of the specific hierarchical ensemble method.

In this section we provide some basic notation and a taxonomy of ensembles that will be used to introduce the different hierarchical ensemble methods for AFP.

3.1. Basic Notation. A gene/gene product $g$ can be represented through a vector $\mathbf{x} \in \mathbb{R}^d$ having $d$ different features (e.g., gene expression levels across $d$ different conditions, sequence similarities with other genes/proteins, presence or absence of a given domain in the corresponding protein, or genetic or physical interactions with other proteins). Note that, for the sake of simplicity and with a certain approximation, we refer in the same way to genes and proteins, even if it is well known that a given gene may correspond to multiple proteins. A gene $g$ is assigned to one or more functional classes in the set $C = \{c_1, c_2, \ldots, c_m\}$, structured according to a FunCat forest of trees $T$ or a directed acyclic graph $G$ of the Gene Ontology (usually a dummy root class $c_0$, to which every gene belongs, is added to $T$ or $G$ to facilitate the processing). The assignments are coded through a vector of multilabels $\mathbf{y} = (y_1, y_2, \ldots, y_m) \in \{0,1\}^m$, where $g$ belongs to class $c_i$ if and only if $y_i = 1$.

In both the Gene Ontology (GO) and FunCat taxonomies the functional classes are structured according to a hierarchy and can be represented by a directed graph, where nodes correspond to classes and edges correspond to relationships between classes. Hence the node corresponding to the class $c_i$ can be simply denoted by $i$. We represent the set of children nodes of $i$ by $\mathrm{child}(i)$ and the set of its parents by $\mathrm{par}(i)$. Moreover, $y_{\mathrm{child}(i)}$ denotes the labels of the children classes of node $i$ and, analogously, $y_{\mathrm{par}(i)}$ denotes the labels of the parent classes of $i$. Note that in FunCat only one parent is permitted, since the overall hierarchy is a forest of trees, while in the GO more parents are allowed, because the relationships are structured according to a directed acyclic graph.

Hierarchical ensemble methods train a set of calibrated classifiers, one for each node of the taxonomy $T$. These classifiers are used to derive estimates $\hat{p}_i(g)$ of the probabilities $p_i(g) = \mathbb{P}(V_i = 1 \mid V_{\mathrm{par}(i)} = 1, g)$, for all $g$ and $i$, where $(V_1, \ldots, V_m) \in \{0,1\}^m$ is the vector random variable modeling the unknown multilabel of a gene $g$, and $V_{\mathrm{par}(i)}$ denotes the random variable associated with the parents of node $i$. Note that the $p_i(g)$ are probabilities conditioned on $V_{\mathrm{par}(i)} = 1$, that is, the probability that a gene is annotated to a given term $i$, given that the gene is already annotated to its parent terms, thus respecting the true path rule. Ensemble methods infer a multilabel assignment $\hat{\mathbf{y}} = (\hat{y}_1, \ldots, \hat{y}_m) \in \{0,1\}^m$ based on the estimates $\hat{p}_1(g), \ldots, \hat{p}_m(g)$.
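The notation can be made concrete with a small sketch; the toy tree and multilabel below are purely illustrative:

```python
# Terms are indexed 1..m, node 0 is the dummy root, and the taxonomy is
# stored as a parent map (a single parent per node, as in a FunCat tree).
parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2}   # par(i)
children = {}
for i, p in parent.items():
    children.setdefault(p, []).append(i)   # child(i)

y = {1: 1, 2: 0, 3: 1, 4: 0, 5: 0}         # multilabel of one gene

def respects_true_path(y, parent):
    """True path rule: every positive term must have a positive parent
    (the dummy root, absent from y, counts as positive)."""
    return all(y.get(parent[i], 1) == 1 for i in y if y[i] == 1)

print(respects_true_path(y, parent))       # True
```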

3.2. A Taxonomy of Hierarchical Ensemble Methods. Hierarchical ensemble methods for AFP share several characteristics, from the two-step learning approach to the exploitation of the hierarchical relationships between classes. For these reasons it is quite difficult to clearly and univocally individuate a taxonomy of hierarchical ensemble methods. Here we show a taxonomy useful mainly to describe and discuss existing methods for AFP. For a recent review and taxonomy of hierarchical ensemble methods not specific to AFP problems, we refer the reader to the comprehensive review by Silla and colleagues [44].

In the following sections we discuss these groups of hierarchical ensemble methods:

(i) Top-down ensemble methods. These methods are characterized by a simple top-down approach: in the second step, only the output of the parent node/base classifier influences the output of the children, thus resulting in a top-down propagation of the decisions.

(ii) Bayesian ensemble methods. These methods are theoretically well founded, and in some cases they are optimal from a Bayesian standpoint.

(iii) Reconciliation methods. This is a heterogeneous class of heuristics by which the predictions of the base learners can be combined by adopting different "local" or "global" combination strategies.

(iv) True path rule ensembles. These methods adopt a heuristic approach based on the "true path rule" that governs both the GO and FunCat ontologies.

(v) Decision tree-based ensembles. These methods are characterized by the application of decision trees as base learners or by the adoption of decision tree-like learning strategies to combine the predictions of the base learners.

Despite this general characterization, several methods could be assigned to more than one group, and several hierarchical ensemble methods are difficult to assign to any of the introduced classes.

For instance, in [114-116] the authors used the hierarchy only to construct training sets, different for each term of the Gene Ontology, by determining positive and negative examples on the basis of the relationships between functional terms. In [89], for each classifier associated with a node, a gene is labeled as positive (i.e., belonging to the term associated with that node) if it actually belongs to that node, or as negative if it belongs neither to that node nor to the ancestors or descendants of the node.

Other approaches exploited the correlation between nearby classes [32, 53, 117]. Shahbaba and Neal [53] take into account the hierarchy to introduce correlations between functional classes, using a multinomial logit model with Bayesian priors, in the context of E. coli functional classification with Riley's hierarchies [118]. Bogdanov and Singh incorporated functional interrelationships between terms during the extraction of features based on the annotations of neighboring genes, and then applied a nearest-neighbor classifier to predict protein functions [117]. The HiBLADE method (hierarchical multilabel boosting with label dependency) [32] not only takes advantage of the preestablished hierarchical taxonomy of the classes but also effectively exploits the hidden correlation among the classes that is not shown through the class hierarchy, thereby improving the quality of the predictions. In particular, the dependencies of the children of each label in the hierarchy are captured and analyzed using the Bayes method and instance-based similarity. Experiments using the FunCat taxonomy and the yeast model organism show that the proposed method is competitive with the TPR-W (Section 7.2) and HBAYES-CS (Section 5.3) hierarchical ensemble methods.

A classical multiclass boosting algorithm [119] has also been adapted to fit the hierarchical structure of the FunCat taxonomy [120]: the method is relatively simple and straightforward to implement, and it achieves competitive results for AFP in the yeast model organism.

Finally, other hierarchical approaches have been proposed in the context of the competitive-network learning framework. Competitive networks are well-known unsupervised and supervised methods able to map the input space into a structured output space, where clusters or classes are usually arranged according to a grid topology and where learning adopts at the same time a competition, cooperation, and adaptation strategy [121]. Interestingly, in [122] the authors adopted this approach to predict the hierarchy of gene annotations in the yeast model organism by using a tree topology according to the FunCat taxonomy: each neuron is connected with its parent or with its children. Moreover, each neuron in the tree-structured output layer is connected to all the neurons of the input layer, which represent the instances, that is, the set of genomic features associated with each gene to be classified. Results obtained with the hierarchy of enzyme commission codes showed that this approach is competitive with hierarchical decision tree ensembles [29] (Section 8).

To provide a general picture of the methods discussed in the following sections, Table 1 summarizes their main characteristics. The first two columns report the name of and a reference to the method; the third, whether multiple or single paths across the taxonomy are allowed; and the next, whether partial paths are considered (i.e., paths that do not end at a leaf). The successive columns refer to the class structure (a tree or a DAG), to the adoption or not of cost-sensitive (i.e., unbalance-aware) classification approaches, and to the adoption of strategies to properly select negative examples in the training phase. Finally, the last three columns summarize the type of base learner used ("spec" means that only a specific type of base learner is allowed, and "any" means that any type of learner can be used within the method), whether the method improves or not with respect to the flat approach, and the mode of processing of the nodes ("TD": top-down approach; "TD & BUP": adopting both top-down and bottom-up strategies). Of course, methods having more checkmarks are more flexible; in general, methods that can process a DAG can also process tree-structured ontologies, but the opposite is not guaranteed, while the type of node processing relies on the way the information is propagated across the ontology. It is worth noting that all the considered methods improve on baseline "flat" classification methods.

Table 1: Characteristics of some of the main hierarchical ensemble methods for AFP.

| Method | References | Multipath | Partial path | Class structure | Cost sens. | Sel. neg. | Base learn. | Improves on flat | Node process. |
|---|---|---|---|---|---|---|---|---|---|
| HMC-LMLP | [124, 125] | ✓ | ✓ | TREE | | | any | ✓ | TD |
| HTD-CS | [27] | ✓ | ✓ | TREE | ✓ | | any | ✓ | TD |
| HTD-MULTI | [127] | | ✓ | TREE | | | any | ✓ | TD |
| HTD-PERLEV | [128] | | | TREE | | | spec | ✓ | TD |
| HTD-NET | [26] | ✓ | ✓ | DAG | | | any | ✓ | TD |
| BAYES NET-ENS | [20] | ✓ | ✓ | DAG | | ✓ | spec | ✓ | TD & BUP |
| HIER-MB and BFS | [30] | ✓ | ✓ | DAG | | | any | ✓ | TD & BUP |
| HBAYES | [92, 135] | ✓ | ✓ | TREE | | ✓ | any | ✓ | TD & BUP |
| HBAYES-CS | [123] | ✓ | ✓ | TREE | ✓ | ✓ | any | ✓ | TD & BUP |
| Reconc.-heuristic | [24] | ✓ | ✓ | DAG | | | any | ✓ | TD |
| Cascaded log. | [24] | ✓ | ✓ | DAG | | | any | ✓ | TD |
| Projection-based | [24] | ✓ | ✓ | DAG | | | any | ✓ | TD & BUP |
| TPR | [31, 142] | ✓ | ✓ | TREE | ✓ | | any | ✓ | TD & BUP |
| TPR-W | [31] | ✓ | ✓ | TREE | ✓ | ✓ | any | ✓ | TD & BUP |
| TPR-W weighted | [145] | ✓ | ✓ | TREE | ✓ | | any | ✓ | TD & BUP |
| Decision-tree-ens. | [29] | ✓ | ✓ | DAG | | | spec | ✓ | TD & BUP |

4. Hierarchical Top-Down (HTD) Ensembles

These ensemble methods exploit the hierarchical relationships between functional terms in a top-to-bottom fashion, that is, considering only the relationships denoted by the down arrows in Figure 2(b). The basic hierarchical top-down ensemble method (HTD) algorithm is straightforward: for each gene $g$, starting from the set of nodes at the first level of the graph $G$ (denoted by $\mathrm{root}(G)$), the classifier associated with the node $i \in G$ computes whether the gene belongs to the class $c_i$. If yes, the classification process continues recursively on the nodes $j \in \mathrm{child}(i)$; otherwise it stops at node $i$, and the nodes belonging to the subtree rooted at $i$ are all set to 0. To introduce the method, we use probabilistic classifiers as base learners, trained to predict the class $c_i$ associated with the node $i$ of the hierarchical taxonomy. Their estimates $\hat{p}_i(g)$ of $\mathbb{P}(V_i = 1 \mid V_{\mathrm{par}(i)} = 1, g)$ are used by the HTD ensemble to classify a gene $g$ as follows:

$$\hat{y}_i = \begin{cases} \{\hat{p}_i(g) > \frac{1}{2}\}, & \text{if } i \in \mathrm{root}(G), \\ \{\hat{p}_i(g) > \frac{1}{2}\}, & \text{if } i \notin \mathrm{root}(G) \text{ and } \hat{p}_{\mathrm{par}(i)}(g) > \frac{1}{2}, \\ 0, & \text{if } i \notin \mathrm{root}(G) \text{ and } \hat{p}_{\mathrm{par}(i)}(g) \leq \frac{1}{2}, \end{cases} \tag{3}$$

where $\{A\} = 1$ if the predicate $A$ is true and $\{A\} = 0$ otherwise, and $\hat{p}_{\mathrm{par}(i)}$ is the probability predicted for the parent of the term $i$. It is easy to see that this procedure ensures that the predicted multilabels $\hat{\mathbf{y}} = (\hat{y}_1, \ldots, \hat{y}_m)$ are consistent with the hierarchy. We can apply the same top-down procedure also using nonprobabilistic classifiers, that is, base learners generating continuous scores or discrete decisions, by slightly modifying (3).

In [123] a cost-sensitive version of the basic top-down hierarchical ensemble method HTD has been proposed. By assigning $\hat{y}_i$ before the label of any $j$ in the subtree rooted at $i$, the following rule is used:

$$\hat{y}_i = \left\{\hat{p}_i \geq \tfrac{1}{2}\right\} \times \left\{\hat{y}_{\mathrm{par}(i)} = 1\right\}, \tag{4}$$

for $i = 1, \ldots, m$ (note that the guessed label $\hat{y}_0$ of the root of $G$ is always 1). The cost-sensitive variant HTD-CS then introduces a single cost-sensitive parameter $\tau > 0$, which replaces the threshold $\frac{1}{2}$. The resulting rule for HTD-CS is

$$\hat{y}_i = \left\{\hat{p}_i \geq \tau\right\} \times \left\{\hat{y}_{\mathrm{par}(i)} = 1\right\}. \tag{5}$$

By tuning $\tau$ we may obtain ensembles with different precision/recall characteristics. Despite the simplicity of the hierarchical top-down methods, several works showed their effectiveness for AFP problems [28, 31].
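A minimal sketch of this top-down decision rule, assuming the taxonomy is given as a `children` map rooted at a dummy node 0 and the base-learner estimates as a dict `p_hat` (both names are illustrative):

```python
def htd_cs(p_hat, children, node=0, parent_positive=True, tau=0.5, y=None):
    """Top-down rule in the spirit of Eqs. (3) and (5): a node is labeled
    positive iff its parent is positive and its estimated probability
    exceeds the threshold; the whole subtree below a negative node is set
    to 0, so the predicted multilabel is consistent by construction.
    tau = 0.5 gives essentially plain HTD (Eq. (3) uses a strict ">",
    Eq. (5) a ">="); other values of tau give the HTD-CS variant."""
    if y is None:
        y = {}
    for i in children.get(node, []):
        y[i] = 1 if (parent_positive and p_hat[i] > tau) else 0
        htd_cs(p_hat, children, i, y[i] == 1, tau, y)
    return y

children = {0: [1, 2], 1: [3, 4]}              # dummy root 0, toy tree
p_hat = {1: 0.9, 2: 0.3, 3: 0.7, 4: 0.4}
print(htd_cs(p_hat, children))                 # {1: 1, 3: 1, 4: 0, 2: 0}
```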

For instance, Cerri and De Carvalho experimented with different variants of top-down hierarchical ensemble methods for AFP [28, 124, 125]. The HMC-LMLP (hierarchical multilabel classification with local multilayer perceptron) method successively trains a local MLP network for each hierarchical level, using the classical backpropagation algorithm [126]. Then the output of the MLP for the first level is used as input to train the MLP that learns the classes of the second level, and so on (Figure 3). A gene is annotated to a class if its corresponding output in the MLP is larger than a predefined threshold; then, in a postprocessing phase (the second step of the hierarchical classification), inconsistent predictions are removed (i.e., classes predicted without the prediction of their superclasses) [125]. In practice, instead of using a dichotomic classifier for each node, the HMC-LMLP algorithm applies a single multiclass multilayer perceptron to each level of the hierarchy.

[Figure 3: HMC-LMLP. The outputs of the MLP responsible for the predictions at the first hierarchical level are provided as inputs to another MLP for the predictions at the second level (adapted from [125]).]

A related approach adopts multiclass classifiers (HTD-MULTI) at each node, instead of a simple binary classifier, and tries to find the most likely path from the root to the leaves of the hierarchy, considering simple techniques such as the multiplication or the sum of the probabilities estimated at each node along the path [127]. The method has been applied to the cell cycle branch of the FunCat hierarchy with the yeast model organism, showing improvements with respect to classical hierarchical top-down methods, even if the proposed approach can only predict classes along a single "most likely path", thus not considering that in AFP we may have annotations involving multiple and partial paths.

Another method that introduces multiclass classifiers instead of simple dichotomic classifiers has been proposed by Paes et al. [128]: local per-level multiclass classifiers (HTD-PERLEV) are trained to distinguish between the classes of a specific level of the hierarchy, and two different strategies to remove inconsistencies are introduced. The method has been applied to the hierarchical classification of enzymes using the EC taxonomy; unfortunately, this algorithm is not well suited to AFP, since leaf-node annotations are mandatory (that is, partial-path annotations are not allowed) and multilabel annotations along multiple paths are not allowed.

Another interesting top-down hierarchical approach proposed by the same authors is HMC-LP (hierarchical multilabel classification label-powerset), a hierarchical variation of the label-powerset nonhierarchical multilabel method [129] that has been applied to the prediction of gene function in the yeast model organism, using 10 different data sets and the FunCat taxonomy [124]. According to the label-powerset approach, the method is based on a first label-combination step by which, for each example (gene), all classes assigned to the example are combined into a new and unique class, and this process is repeated for each level of the hierarchy. In this way the original problem is transformed into a hierarchical single-label problem. In both the training and test phases the top-down approach is applied, and at the end of the classification phase the original classes can be easily reconstructed [124]. In an experimental comparison using the FunCat taxonomy for S. cerevisiae, results showed that hierarchical top-down ensemble methods significantly outperform decision-tree-based hierarchical methods, but no significant difference between different flavors of top-down hierarchical ensembles was detected [28].

Top-down algorithms can also be conceived in the context of network-based methods (HTD-NET). For instance, in [26] a probabilistic model that combines relational protein-protein interaction data and the hierarchical structure of GO to predict true-path-consistent function labels obeys the true path rule by setting the descendants of a node as negative whenever that node is set to negative. More precisely, the authors at first compute a local hierarchical conditional probability, in the sense that for any nonroot GO term only the parents affect its labeling. This probability is computed within a network-based framework, assuming that the labeling of a gene is independent of that of any other gene given that of its neighbors (a sort of Markov property with respect to the gene functional interaction network), and assuming also a binomial distribution for the number of neighbors labeled with child terms with respect to those labeled with the parent term. These assumptions are quite stringent but are necessary to make the model tractable. Then a global hierarchical conditional probability is computed by recursively applying the previously computed local hierarchical conditional probabilities over all the ancestors. More precisely, denoting by $\mathbb{P}(y_i = 1 \mid g, N(g))$ the probability that a gene $g$ is annotated to a node $i$, given the status of the annotations of its neighborhood $N(g)$ in the functional network, the global hierarchical conditional probability factorizes according to the GO graph as follows:

$$\mathbb{P}(y_i = 1 \mid g, N(g)) = \prod_{j \in \mathrm{anc}(i)} \mathbb{P}(y_j = 1 \mid y_{\mathrm{par}(j)} = 1, N_{\mathrm{loc}}(g)), \tag{6}$$

where $N_{\mathrm{loc}}(g)$ represents the local hierarchical neighborhood information on the parent-child GO term pair $\mathrm{par}(j)$ and $j$ [26]. This approach is guaranteed to produce GO term label assignments consistent with the hierarchy, without the need of a postprocessing step.
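A sketch of the factorization in (6), assuming the local conditional probabilities have already been computed from the network and that the ancestor set is taken to include the term itself (a convention assumed here, not restated by the source):

```python
from math import prod

def global_conditional(i, local_p, ancestors):
    """Eq. (6): the probability that a gene is annotated with GO term i is
    the product of the local parent-child conditional probabilities over
    the ancestors of i. local_p[j] stands for the precomputed
    P(y_j = 1 | y_par(j) = 1, N_loc(g))."""
    return prod(local_p[j] for j in ancestors[i])

# toy usage: a chain root -> a -> b
local_p = {"a": 0.9, "b": 0.5}
ancestors = {"b": ["a", "b"]}
print(global_conditional("b", local_p, ancestors))   # 0.45
```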

Finally, in [130] the author applied a hierarchical method to the classification of yeast FunCat categories. Despite its well-founded theoretical properties based on large-margin methods, this approach is conceived for one-path hierarchical classification and hence is unsuited for hierarchical AFP, where usually multiple paths in the hierarchy should be considered, since in most cases genes can play different functional roles in the cell.

5. Ensemble-Based Bayesian Approaches for Hierarchical Classification

These methods introduce a Bayesian approach to the hierarchical classification of proteins, by using the classical Bayes theorem or Bayesian networks to obtain tractable factorizations of the joint conditional probabilities from the original "full Bayesian" setting of the hierarchical AFP problem [20, 30], or to achieve "Bayes-optimal" solutions with respect to loss functions well suited to hierarchical problems [27, 92].

5.1. The Solution Based on Bayesian Networks. One of the first approaches addressing the issue of inconsistent predictions in the Gene Ontology is represented by the Bayesian approach proposed in [20] (BAYES NET-ENS). According to the general scheme of hierarchical ensemble methods, two main steps characterize the algorithm:

(1) flat prediction of each term/class (possibly inconsistent);

(2) a Bayesian hierarchical combination scheme allowing collaborative error-correction over all nodes.

After training a set of base classifiers on each of the considered GO terms (in their work the authors applied the method to 105 selected GO terms), we may have a set of (possibly inconsistent) predictions $\hat{y}_1, \ldots, \hat{y}_n$. The goal consists in finding a set of consistent predictions $y_1, \ldots, y_n$ by maximizing the following equation, derived from the Bayes theorem:

$$\mathbb{P}(y_1, \ldots, y_n \mid \hat{y}_1, \ldots, \hat{y}_n) = \frac{\mathbb{P}(\hat{y}_1, \ldots, \hat{y}_n \mid y_1, \ldots, y_n)\, \mathbb{P}(y_1, \ldots, y_n)}{Z}, \tag{7}$$

where $n$ is the number of GO nodes/terms and $Z$ is a constant normalization factor.

Since the direct solution of (7) is too hard (that is, exponential in time with respect to the number of nodes), the authors proposed a Bayesian network structure to solve this difficult problem, in order to exploit the relationships between the GO terms. More precisely, to reduce the complexity of the problem, the authors imposed the following constraints:

(1) the $y_i$ nodes are conditioned on their children (GO structure constraints);

(2) the $\hat{y}_i$ nodes are conditioned on their label $y_i$ (the Bayes rule);

(3) the $\hat{y}_i$ are independent from both $y_j$, $i \neq j$, and $\hat{y}_j$, $i \neq j$, given $y_i$.

In other words, we can ensure that a label is 1 (positive) whenever any one of its children is 1, and the edges from $y_i$ to $\hat{y}_i$ ensure that a classifier output $\hat{y}_i$ is a random variable independent of all other classifier outputs $\hat{y}_j$ and labels $y_j$, given its true label $y_i$ (Figure 4).

[Figure 4: Bayesian network involved in the hierarchical classification (adapted from [20]).]

More precisely, from the previous constraints we can derive the following equations.

From the first constraint:

$$\mathbb{P}(y_1, \ldots, y_n) = \prod_{i=1}^{n} \mathbb{P}(y_i \mid \mathrm{child}(y_i)). \tag{8}$$

From the last two constraints:

$$\mathbb{P}(\hat{y}_1, \ldots, \hat{y}_n \mid y_1, \ldots, y_n) = \prod_{i=1}^{n} \mathbb{P}(\hat{y}_i \mid y_i). \tag{9}$$

Note that (8) can be inferred from the training labels simply by counting, while (9) can be inferred by validation during training, by modeling the distribution of the $\hat{y}_i$ outputs over positive and negative examples with a parametric model (e.g., a Gaussian distribution; see Figure 5).

For the implementation of their method, the authors adopted bagged ensembles of SVMs [131], to make the predictions more robust and reliable at each node of the GO hierarchy; the median values of the outputs on out-of-bag examples have been used to estimate means and variances for each class. Finally, the means and variances have been used as parameters of the Gaussian models that estimate the conditional probabilities of (9).
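A sketch of this Gaussian modeling step for one node, assuming the per-node validation scores and labels are available as NumPy arrays (names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def fit_output_distributions(scores, labels):
    """One node's term of Eq. (9): model the base classifier output over
    positive and negative validation examples with two Gaussians
    (cf. Figure 5), so that P(y_hat_i | y_i) can be read off the fitted
    densities."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    return {1: norm(pos.mean(), pos.std(ddof=1)),
            0: norm(neg.mean(), neg.std(ddof=1))}

# toy usage: likelihood of an observed output s under each label hypothesis
rng = np.random.default_rng(0)
model = fit_output_distributions(
    np.concatenate([rng.normal(1, 1, 50), rng.normal(-2, 1, 500)]),
    np.concatenate([np.ones(50), np.zeros(500)]))
s = 0.3
print(model[1].pdf(s), model[0].pdf(s))   # ~ p(s | positive), p(s | negative)
```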

Results with the 105 terms/nodes of the GO BP (model organism S. cerevisiae) showed substantial improvements with respect to nonhierarchical "flat" predictions: the hierarchical approach improves AUC results on 93 of the 105 GO terms (Figure 6).

5.2. The Markov Blanket and Approximated Breadth-First Solution. In [30] the authors proposed an alternative approximated solution to the complex equation (7), by introducing the following two variants of the Bayesian integration:

(1) HIER-MB: hierarchical Bayesian combination involving the nodes in the Markov blanket;

(2) HIER-BFS: hierarchical Bayesian combination involving the first 30 nodes visited through a breadth-first search (BFS) in the GO graph.

[Figure 5: Distribution of the base classifier outputs over (a) negative and (b) positive validation examples; a Gaussian distribution is assumed (adapted from [20]).]

[Figure 6: Improvements induced by the hierarchical prediction of the GO terms. Darker shades of blue indicate the largest improvements, darker shades of red the largest deterioration; white means no change (adapted from [20]).]

[Figure 7: Markov blanket surrounding the GO term $Y_1$. Each GO term is represented as a blank node, while the SVM classifier output is represented as a gray node (adapted from [30]).]

The method has been applied to the prediction of more than 2000 GO terms for the mouse model organism, and it performed among the top methods in the MouseFunc challenge [2].

The first approach (HIER-MB) modifies the output of the base learners (SVMs in the Guan et al. paper) by taking into account the Bayesian network constructed using the Markov blanket surrounding the GO term of interest (Figure 7). In a Bayesian network, the Markov blanket of a node $i$ is represented by its parents $\mathrm{par}(i)$, its children $\mathrm{child}(i)$, and its children's other parents. The Bayesian network involving the Markov blanket of node $i$ is used to provide the prediction $\hat{y}_i$ of the ensemble, thus leveraging the local relationships of node $i$ and the predictions for the nodes included in its Markov blanket.

To enlarge the size of the Bayesian subnetwork involved in the prediction of the node of interest, a variant based on the Bayesian networks constructed by applying a classical breadth-first search is the basis of the HIER-BFS algorithm. To reduce the complexity, at most 30 terms are included (i.e., the first 30 nodes reached by the breadth-first algorithm; see Figure 8). In the implementation, ensembles of 25 SVMs have been trained for each node, using vector space integration techniques [132] to integrate multiple sources of data.

[Figure 8: The breadth-first subnetwork stemming from $Y_1$. Each GO term is represented through a blank node, and the SVM outputs are represented as gray nodes (adapted from [30]).]

Note that with both the HIER-MB and HIER-BFS methods we do not take into account the overall topology of the GO network, but only the terms related to the node for which we perform the prediction. Even if this general approach is reasonable and achieves good results, its main drawback is represented by the locality of the hierarchical integration (limited to the Markov blanket and the first 30 BFS nodes). Moreover, in previous works it has been shown that the adopted integration strategy (vector space integration) is in most cases worse than kernel fusion [11] and ensemble methods for data integration [23].

[Figure 9: Integration of diverse methods (single SVMs for each GO term, hierarchical correction, and Bayesian integration of the individual data sets) and diverse sources of data (gene expression microarrays, protein sequence and domain information, protein-protein interactions, phenotype profiles, phylogenetic profiles, and disease profiles) in an ensemble framework for AFP; the best classifier for each GO term is selected through held-out set validation (adapted from [30]).]

In the same work [30], the authors also propose a sort of "test and select" method [133], by which three different classification approaches, (a) single flat SVMs, (b) Bayesian hierarchical correction, and (c) Naive Bayes combination, are applied, and for each GO term the best one is selected by internal cross-validation (Figure 9).

It is worth noting that other approaches also adopted Bayesian networks to resolve the hierarchical constraints underlying the GO taxonomy. For instance, in the FALCON algorithm the GO is modeled as a Bayesian network, and for any given input the algorithm returns the most probable GO term assignment in accordance with the GO structure, by using an evolutionary optimization algorithm [134].

5.3. HBAYES: An "Optimal" Hierarchical Bayesian Ensemble Approach. The HBAYES ensemble method [92, 135] is a general technique for solving hierarchical classification problems on generic taxonomies $G$ structured according to a forest of trees. The method consists in training a calibrated classifier at each node of the taxonomy. In principle, any algorithm (e.g., support vector machines or artificial neural networks) whose classifications are obtained by thresholding a real prediction $p$, for example, $\hat{y} = \mathrm{SGN}(p)$, can be used as base learner. The real-valued outputs $\hat{p}_i(g)$ of the calibrated classifier for node $i$ on the gene $g$ are viewed as estimates of the probabilities $p_i(g) = \mathbb{P}(V_i = 1 \mid V_{\mathrm{par}(i)} = 1, g)$. The distribution of the random Boolean vector $\mathbf{Y}$ is assumed to be

$$\mathbb{P}(\mathbf{Y} = \mathbf{y}) = \prod_{i=1}^{m} \mathbb{P}(Y_i = y_i \mid Y_{\mathrm{par}(i)} = 1, g), \quad \forall \mathbf{y} \in \{0,1\}^m, \tag{10}$$

where, in order to enforce that only multilabels $\mathbf{Y}$ that respect the hierarchy have nonzero probability, it is imposed that $\mathbb{P}(Y_i = 1 \mid Y_{\mathrm{par}(i)} = 0, g) = 0$ for all nodes $i = 1, \ldots, m$ and all $g$. This implies that the base learner at node $i$ is only trained on the subset of the training set including all examples $(g, \mathbf{y})$ such that $y_{\mathrm{par}(i)} = 1$.

5.3.1. HBAYES Ensembles for Protein Function Prediction. The H-loss is a measure of discrepancy between multilabels based on a simple intuition: if a parent class has been predicted wrongly, then errors in its descendants should not be taken into account. Given fixed cost coefficients $\theta_1, \ldots, \theta_m > 0$, the H-loss $\ell_H(\hat{\mathbf{y}}, \mathbf{v})$ between multilabels $\hat{\mathbf{y}}$ and $\mathbf{v}$ is computed as follows: all paths in the taxonomy $T$ from the root down to each leaf are examined, and whenever a node $i \in \{1, \ldots, m\}$ is encountered such that $\hat{y}_i \neq v_i$, $\theta_i$ is added to the loss, while all the other loss contributions from the subtree rooted at $i$ are discarded. This method assumes that, given a gene $g$, the distribution of the labels $\mathbf{V} = (V_1, \ldots, V_m)$ is $\mathbb{P}(\mathbf{V} = \mathbf{v}) = \prod_{i=1}^{m} p_i(g)$ for all $\mathbf{v} \in \{0,1\}^m$, where $p_i(g) = \mathbb{P}(V_i = v_i \mid V_{\mathrm{par}(i)} = 1, g)$. According to the true path rule, it is imposed that $\mathbb{P}(V_i = 1 \mid V_{\mathrm{par}(i)} = 0, g) = 0$ for all nodes $i$ and all genes $g$.
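The recursive "pay at the first disagreement, skip the subtree" structure of the H-loss translates directly into a small sketch (tree as a children map rooted at a dummy node 0; all names illustrative):

```python
def h_loss(y_hat, v, children, theta, node=0):
    """H-loss between a predicted multilabel y_hat and the true multilabel
    v (dicts term -> {0,1}): walking down from the dummy root, the first
    node of each path where y_hat and v disagree contributes theta[i],
    and the whole subtree below it is then discarded."""
    loss = 0.0
    for i in children.get(node, []):
        if y_hat[i] != v[i]:
            loss += theta[i]                 # pay here, skip descendants
        else:
            loss += h_loss(y_hat, v, children, theta, i)
    return loss

children = {0: [1], 1: [2, 3]}
theta = {1: 1.0, 2: 0.5, 3: 0.5}
print(h_loss({1: 0, 2: 1, 3: 0}, {1: 1, 2: 1, 3: 0}, children, theta))  # 1.0
```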

In the evaluation phase, HBAYES predicts the Bayes-optimal multilabel $\hat{\mathbf{y}} \in \{0,1\}^m$ for a gene $g$, based on the estimates $\hat{p}_i(g)$ for $i = 1, \ldots, m$. By definition of Bayes-optimality, the optimal multilabel for $g$ is the one that minimizes the loss when the true multilabel $\mathbf{V}$ is drawn from the joint distribution computed from the estimated conditionals $\hat{p}_i(g)$. That is,

$$\hat{\mathbf{y}} = \operatorname*{arg\,min}_{\mathbf{y} \in \{0,1\}^m} \mathbb{E}\left[\ell_H(\mathbf{y}, \mathbf{V}) \mid g\right]. \tag{11}$$

In other words, the ensemble method HBAYES provides an approximation of the optimal Bayesian classifier with respect to the H-loss [135]. More precisely, as shown in [27], the following theorem holds.

Theorem 1. For any tree $T$ and gene $g$, the multilabel generated according to the HBAYES prediction rule is the Bayes-optimal classification of $g$ for the H-loss.

In the evaluation phase, the uniform cost coefficients $\theta_i = 1$, for $i = 1, \ldots, m$, are used. However, since with uniform coefficients the H-loss can be made small simply by predicting sparse multilabels (i.e., multilabels $\mathbf{y}$ such that $\sum_i y_i$ is small), in the training phase the cost coefficients are set to $\theta_i = 1/|\mathrm{root}(G)|$ if $i \in \mathrm{root}(G)$, and to $\theta_i = \theta_j / |\mathrm{child}(j)|$, with $j = \mathrm{par}(i)$, otherwise. This normalizes the H-loss, in the sense that the maximal H-loss contribution of all nodes in a subtree, excluding its root, equals that of its root.

Let $\{E\}$ be the indicator function of the event $E$. Given $g$ and the estimates $\hat{p}_i = \hat{p}_i(g)$ for $i = 1, \ldots, m$, the HBAYES prediction rule can be formulated as follows.

HBAYES Prediction Rule. Initially set the label of each node $i$ to

$$\hat{y}_i = \operatorname*{arg\,min}_{y \in \{0,1\}} \Bigg( \theta_i \hat{p}_i (1-y) + \theta_i (1 - \hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{\mathbf{y}}) \Bigg), \tag{12}$$

where

$$H_j(\hat{\mathbf{y}}) = \theta_j \hat{p}_j (1 - \hat{y}_j) + \theta_j (1 - \hat{p}_j)\, \hat{y}_j + \hat{p}_j \{\hat{y}_j = 1\} \sum_{k \in \mathrm{child}(j)} H_k(\hat{\mathbf{y}}) \tag{13}$$

is recursively defined over the nodes $j$ in the subtree rooted at $i$, with each $\hat{y}_j$ set according to (12). Then, if $\hat{y}_i$ is set to zero, set all nodes in the subtree rooted at $i$ to zero as well.

It is worth noting that $\hat{\mathbf{y}}$ can be computed for a given $g$ via a simple bottom-up message-passing procedure. It can be shown that, if all child nodes $k$ of $i$ have $\hat{p}_k$ close to one half, then the Bayes-optimal label of $i$ tends to be 0, irrespective of the value of $\hat{p}_i$. On the contrary, if $i$'s children all have $\hat{p}_k$ close to either 0 or 1, then the Bayes-optimal label of $i$ is based on $\hat{p}_i$ only, ignoring the children. This behaviour can be intuitively explained in the following way: the estimate $\hat{p}_k$ is built based only on the examples on which the parent $i$ of $k$ is positive; hence a "neutral" estimate $\hat{p}_k = 1/2$ signals that the current instance is a negative example for the parent $i$. Experimental results show that this approach achieves results comparable with those of the TPR method (Section 7), an ensemble approach based on the "true path rule" [136].
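A minimal sketch of this bottom-up evaluation of (12)-(13), assuming a tree given as a children map rooted at a dummy node 0 and estimates and cost coefficients given as dicts; ties are broken in favor of the positive label, consistently with the "$\geq$" of (15) for $\alpha = 1$:

```python
def hbayes(node, p_hat, theta, children, y):
    """Bottom-up message passing for Eqs. (12)-(13): returns H_node, the
    minimal expected H-loss contribution of the subtree rooted at node,
    and stores the tentative label y[node]."""
    h_children = sum(hbayes(j, p_hat, theta, children, y)
                     for j in children.get(node, []))
    cost0 = theta[node] * p_hat[node]                           # predict 0
    cost1 = theta[node] * (1 - p_hat[node]) + p_hat[node] * h_children
    y[node] = 1 if cost1 <= cost0 else 0
    return min(cost0, cost1)

def zero_subtrees(node, children, y):
    """Final step of the rule: subtrees below a negative node are set to 0."""
    for j in children.get(node, []):
        if y[node] == 0:
            y[j] = 0
        zero_subtrees(j, children, y)

children = {0: [1], 1: [2, 3]}                  # dummy root 0
p_hat = {1: 0.8, 2: 0.6, 3: 0.2}
theta = {1: 1.0, 2: 0.5, 3: 0.5}
y = {0: 1}
for i in children[0]:
    hbayes(i, p_hat, theta, children, y)
zero_subtrees(0, children, y)
print(y)                                        # {0: 1, 2: 1, 3: 0, 1: 1}
```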

5.3.2. HBAYES-CS: The Cost-Sensitive Version. HBAYES-CS is the cost-sensitive version of HBAYES, proposed in [27]. In this approach the misclassification cost coefficient $\theta_i$ for node $i$ is split into two terms, $\theta_i^+$ and $\theta_i^-$, taking into account misclassifications of positive and negative examples, respectively. By considering these two terms separately, (12) can be rewritten as

$$\hat{y}_i = \operatorname*{arg\,min}_{y \in \{0,1\}} \Bigg( \theta_i^- \hat{p}_i (1-y) + \theta_i^+ (1 - \hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{\mathbf{y}}) \Bigg), \tag{14}$$

where the expression of $H_j(\hat{\mathbf{y}})$ gets changed correspondingly. By introducing a factor $\alpha \geq 0$ such that $\theta_i^- = \alpha \theta_i^+$, while keeping $\theta_i^+ + \theta_i^- = 2\theta_i$, the relative costs of false positives and false negatives can be parameterized, thus allowing us to further rewrite the hierarchical Bayesian rule (Section 5.3.1) as follows:

$$\hat{y}_i = 1 \Longleftrightarrow \hat{p}_i \Bigg( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \Bigg) \geq \frac{2\theta_i}{1 + \alpha}. \tag{15}$$

By setting $\alpha = 1$ we obtain the original version of the hierarchical Bayesian ensemble, and by incrementing $\alpha$ we introduce progressively lower costs for positive predictions. In this way the recall of the ensemble tends to increase, eventually at the expense of precision, and by tuning the $\alpha$ parameter we can obtain different combinations of precision/recall values.

In principle, a cost factor $\alpha_i$ can be set for each node $i$, to explicitly take into account the unbalance between the number of positive ($n_i^+$) and negative ($n_i^-$) examples estimated from the training data:

$$\alpha_i = \frac{n_i^-}{n_i^+} \Longrightarrow \theta_i^+ = \frac{2}{n_i^-/n_i^+ + 1}\, \theta_i = \frac{2 n_i^+}{n_i^- + n_i^+}\, \theta_i. \tag{16}$$

The decision rule (15) at each node then becomes

$$\hat{y}_i = 1 \Longleftrightarrow \hat{p}_i \Bigg( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \Bigg) \geq \frac{2\theta_i}{1 + \alpha_i} = \frac{2\theta_i n_i^+}{n_i^- + n_i^+}. \tag{17}$$

Results obtained with the yeast model organism showed that HBAYES-CS significantly outperforms HTD methods [27, 136].
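The cost-sensitive variant changes only the per-node decision of the previous sketch; a hedged one-node version of rule (15), with the H-values assumed to be computed bottom-up as before:

```python
def hbayes_cs_decision(p_i, theta_i, h_children, alpha):
    """Cost-sensitive decision of Eq. (15): alpha = 1 recovers the original
    HBAYES rule; larger alpha lowers the cost of positive predictions,
    increasing recall at the possible expense of precision. The per-node
    unbalance-aware setting of Eq. (16) corresponds to
    alpha = n_neg / n_pos estimated from the training data."""
    return 1 if p_i * (2 * theta_i - h_children) >= 2 * theta_i / (1 + alpha) else 0
```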

6. Reconciliation Methods

Hierarchical ensemble methods are basically two-step methods, since at first they provide predictions for the single classes and then arrange these predictions to take into account the functional relationships between GO terms. Noble and colleagues name this general approach reconciliation methods [24]: they proposed methods for calibrating and combining independent predictions to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. They applied their ensemble methods to genome-wide and ontology-wide function prediction in M. musculus, involving about 3000 GO terms.

[Figure 10: The overall scheme of reconciliation methods: (1) kernels computed for each data source (e.g., MGI, PPI, Pfam); (2) SVMs trained for each term; (3) calibration of the SVM outputs into probability scores; (4) reconciliation over the ontology (adapted from [24]).]

Their goal consists in providing consistent predictions, that is, predictions whose confidence (e.g., posterior probability) increases as we ascend from more specific to more general terms in the GO. Moreover, another important feature of these methods is the availability of confidence values associated with the predictions, which can be interpreted as the probability that a protein has a certain function, given the information provided by the data.

The overall reconciliation approach can be summarized in the following four basic steps (Figure 10).

(1) Kernel computation. At first, a set of kernels is computed from the available data. We may choose kernels specific for each source of data (e.g., diffusion kernels for protein-protein interaction data [137], linear or Gaussian kernels for expression data, and string kernels for sequence data [138]). Multiple kernels for the same type of data can also be constructed [24].

(2) SVM learning. SVMs are used as base learners, using the kernels selected at the previous step; the training is performed by internal cross-validation to avoid overfitting, and a local cost-sensitive strategy is applied by tuning the $C$ regularization factor separately for positive and negative examples. Note that the authors used SVMs as base learners in their experiments, but any meaningful classifier could be used at this step.

(3) Calibration. To produce individual probabilistic outputs from the set of SVM outputs corresponding to one GO term, a logistic regression approach is applied. In this way a calibration of the individual SVM outputs is obtained, resulting in a probabilistic prediction of the random variable $Y_i$ for each node/term $i$ of the hierarchy, given the outputs $\hat{y}_i$ of the SVM classifiers.

(4) Reconciliation. The first three steps generate unreconciled outputs; that is, in practice, a "flat" ensemble is applied that may generate inconsistent predictions with respect to the given taxonomy. In this step the outputs of step three are processed by a "reconciliation method". The goal of this stage is to combine the predictions for each term to produce predictions that are consistent with the ontology, meaning that all the probabilities assigned to the ancestors of a GO term are larger than the probability assigned to that term.

The first three steps are basically the same for (or very similar in) each reconciliation ensemble method. The crucial step is the fourth, that is, the reconciliation step, and different ensemble algorithms can be designed to implement it. The authors proposed 11 different ensemble methods for the reconciliation of the base classifier outputs. Schematically, they can be subdivided into the following four main classes of ensembles:

(1) heuristic methods;
(2) Bayesian network-based methods;
(3) cascaded logistic regression;
(4) projection-based methods.

6.1. Heuristic Methods. These approaches preserve the "reconciliation property"

$$\forall i, j \in G: \ (i, j) \in G \Longrightarrow \bar{p}_i \geq \bar{p}_j \tag{18}$$

through simple heuristic modifications of the probabilities computed at step 3 of the overall reconciliation scheme (here $\bar{p}_i$ denotes the reconciled probability of term $i$ and $\hat{p}_i$ the calibrated output of step 3); a small computational sketch of the three rules is given after the list below.

(i) The MAX method simply chooses the largest logistic regression value over the node $i$ and all its descendants $\mathrm{desc}(i)$:

$$\bar{p}_i = \max_{j \in \mathrm{desc}(i)} \hat{p}_j. \tag{19}$$

(ii) The AND method implements the notion that the probability of the term is large only if all the ancestral GO terms $\mathrm{anc}(i)$ of the given term/node $i$ are likely, assuming that, conditional on the data, all predictions are independent:

$$\bar{p}_i = \prod_{j \in \mathrm{anc}(i)} \hat{p}_j. \tag{20}$$

(iii) The OR method estimates the probability that the node $i$ is annotated with at least one of the descendant GO terms, assuming again that, conditional on the data, all predictions are independent:

$$1 - \bar{p}_i = \prod_{j \in \mathrm{desc}(i)} (1 - \hat{p}_j). \tag{21}$$
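A sketch of the three heuristics, assuming calibrated probabilities in a dict `p_hat` and precomputed ancestor/descendant sets that include the term itself (an assumption; the exact convention for whether $\mathrm{desc}(i)$ and $\mathrm{anc}(i)$ contain $i$ is not restated here):

```python
from math import prod

def reconcile_heuristics(p_hat, ancestors, descendants):
    """Heuristic reconciliation rules of Eqs. (19)-(21), applied to the
    calibrated per-term probabilities p_hat (dict term -> value)."""
    p_max = {i: max(p_hat[j] for j in descendants[i]) for i in p_hat}          # MAX
    p_and = {i: prod(p_hat[j] for j in ancestors[i]) for i in p_hat}           # AND
    p_or = {i: 1 - prod(1 - p_hat[j] for j in descendants[i]) for i in p_hat}  # OR
    return p_max, p_and, p_or
```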

62 Cascaded Logistic Regression Instead of modelingclass-conditional probabilities as required by the Bayesianapproach logistic regression can be used instead to directlymodel posterior probabilities Considering that modelingconditional densities are in most cases difficult (also usingstrong independence assumptions as shown in Section 51)


the choice of logistic regression could be a reasonable one. In [24] the authors embedded the hierarchical dependencies between terms in the logistic regression setting. Assuming that a random variable $X$, whose values represent the features of the gene $g$ of interest, is associated with a given gene $g$, and assuming that $P(Y = y \mid X = x)$ factorizes according to the GO graph, it follows that

$P(Y = y \mid X = x) = \prod_i P(Y_i = y_i \mid \forall j \in \mathrm{par}(i),\ Y_j = y_j,\ X_i = x_i)$  (22)

with $P(Y_i = 1 \mid \forall j \in \mathrm{par}(i),\ Y_j = 0,\ X_i = x_i) = 0$. The authors estimated $P(Y_i = 1 \mid \forall j \in \mathrm{par}(i),\ Y_j = 1,\ X_i = x_i)$ with logistic regression. This approach is quite similar to fitting independent logistic regressions, but note that in this case only examples of proteins annotated to all parent GO terms are used to fit the model, thus implicitly taking into account the hierarchical relationships between GO terms.
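The following sketch (my own toy illustration, with synthetic data and invented term names) conveys the key point of the cascaded approach: the logistic model for each term is fitted only on the examples annotated to all of its parents:

```python
# Minimal sketch of the cascaded logistic regression idea: for each term
# i, a model of P(Y_i = 1 | all parents positive, x) is fitted using only
# the examples annotated to every parent term. All data are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))             # protein features
Y = {"a": rng.integers(0, 2, 200)}         # annotations per term (0/1)
Y["b"] = Y["a"] * rng.integers(0, 2, 200)  # child consistent with parent
parents = {"a": [], "b": ["a"]}

models = {}
for term, pars in parents.items():
    # keep only the proteins annotated to every parent term
    mask = np.ones(len(X), dtype=bool)
    for p in pars:
        mask &= Y[p] == 1
    if mask.sum() > 0 and len(np.unique(Y[term][mask])) > 1:
        models[term] = LogisticRegression().fit(X[mask], Y[term][mask])
```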

6.3. Bayesian Network-Based Methods. These methods are variants of the Bayesian network approach proposed in [20] (Section 5.1): the GO is viewed as a graphical model where a joint Bayesian prior is put on the binary GO term variables $y_i$. The authors proposed four variants that can be summarized as follows.

(i) The BPAL is a belief propagation approach with asymmetric Laplace likelihoods. The graphical model has edges directed from more general terms to more specific terms. Differently from [20], the distribution of each SVM output is modeled as an asymmetric Laplace distribution, and a variational inference algorithm, which solves an optimization problem whose minimizer is the set of marginal probabilities of the distribution, is used to estimate the posterior probabilities of the ensemble [139].

(ii) The BPALF approach is similar to BPAL, but with edges inverted and directed from more specific terms to more general terms.

(iii) The BPLR is a heuristic variant of BPAL where, in the inference algorithm, the Bayesian log posterior ratio for $Y_i$ is replaced by the marginal log posterior ratio obtained from the logistic regression (LR).

(iv) The BPLRF is equal to BPLR, but with reversed edges.

6.4. Projection-Based Methods. A different approach is represented by methods that directly use the calibrated values obtained from logistic regression (step 3 of the overall scheme of the reconciliation methods) to find the closest set of values that are consistent with the ontology. This approach leads to a constrained optimization problem. The main contribution of the Obozinski et al. work [24] is represented by the introduction of projection reconciliation techniques, based on isotonic regression [140] and the Kullback-Leibler divergence.

The isotonic regression method tries to find a set of marginal probabilities $\bar{p}_i$ that are close to the set of calibrated values $\hat{p}_i$ obtained from the logistic regression. The Euclidean distance is used as a measure of closeness. Hence, considering that the "reconciliation property" requires that $\bar{p}_i \geq \bar{p}_j$ when $(i, j) \in E$, this approach yields the following quadratic program:

$\min_{\bar{p}_i,\, i \in I} \sum_{i \in I} (\bar{p}_i - \hat{p}_i)^2 \quad \text{s.t.} \quad \bar{p}_j \leq \bar{p}_i,\ (i, j) \in E$  (23)

This problem is the classical isotonic regression problem, which can be solved using an interior point solver or approximate algorithms when the number of edges of the graph is too large [141].
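As an illustration (not the solver used in [24]), the quadratic program (23) can be written almost verbatim with a convex optimization modeling tool such as cvxpy; the toy hierarchy and calibrated values below are placeholders:

```python
# Sketch of the isotonic-regression projection (23) on a toy hierarchy,
# solved as a quadratic program with cvxpy.
import cvxpy as cp
import numpy as np

p_hat = np.array([0.6, 0.8, 0.7])      # calibrated outputs for terms 0, 1, 2
edges = [(0, 1), (0, 2)]               # (parent, child) pairs

p_bar = cp.Variable(len(p_hat))
constraints = [p_bar >= 0, p_bar <= 1]
constraints += [p_bar[j] <= p_bar[i] for (i, j) in edges]
cp.Problem(cp.Minimize(cp.sum_squares(p_bar - p_hat)), constraints).solve()
print(p_bar.value)                     # reconciled: parents >= children
```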

Considering that we deal with probabilities, a natural measure of distance between probability density functions $f(\mathbf{x})$ and $g(\mathbf{x})$, defined with respect to a random variable $\mathbf{x}$, is represented by the Kullback-Leibler divergence $D_{f,g}$:

$D_{f,g} = \int_{-\infty}^{\infty} f(\mathbf{x}) \log\left(\frac{f(\mathbf{x})}{g(\mathbf{x})}\right) d\mathbf{x}$  (24)

In the context of reconciliation methods we need to consider a discrete version of the Kullback-Leibler divergence, yielding the following optimization problem:

$\min_{\bar{\mathbf{p}}} D_{\bar{\mathbf{p}},\hat{\mathbf{p}}} = \min_{\bar{p}_i,\, i \in I} \sum_{i \in I} \bar{p}_i \log\left(\frac{\bar{p}_i}{\hat{p}_i}\right) \quad \text{s.t.} \quad \bar{p}_j \leq \bar{p}_i,\ (i, j) \in E$  (25)

The algorithm finds the probabilities $\bar{\mathbf{p}}$ closest, according to the Kullback-Leibler divergence, to the probabilities $\hat{\mathbf{p}}$ obtained from logistic regression, obeying the constraint that probabilities cannot increase while descending the hierarchy underlying the ontology.
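A similar sketch for the Kullback-Leibler projection (25), here handed to a generic nonlinear solver (SLSQP from SciPy) rather than the specialized algorithms cited above; the small epsilon bound is my own device to keep the logarithms finite:

```python
# Sketch of the Kullback-Leibler projection (25) with a generic solver.
import numpy as np
from scipy.optimize import minimize

p_hat = np.array([0.6, 0.8, 0.7])
edges = [(0, 1), (0, 2)]               # (parent, child): p[child] <= p[parent]
eps = 1e-6

def kl(p):
    """Discrete KL divergence between candidate p and calibrated p_hat."""
    return float(np.sum(p * np.log(p / p_hat)))

cons = [{"type": "ineq", "fun": (lambda p, i=i, j=j: p[i] - p[j])}
        for (i, j) in edges]
res = minimize(kl, p_hat.clip(eps, 1 - eps), method="SLSQP",
               bounds=[(eps, 1 - eps)] * len(p_hat), constraints=cons)
print(res.x)
```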

The extensive experiments reported in [24] show that, among the reconciliation methods, isotonic regression is the most generally useful. Across a range of evaluation modes, term sizes, ontologies, and recall levels, isotonic regression yields consistently high precision. On the other hand, isotonic regression is not always the "best method", and a biologist with a particular goal in mind may apply other reconciliation methods. For instance, with small terms, Kullback-Leibler projections usually achieve the best results; but considering average "per term" results, heuristic methods yield a precision at a given recall comparable with projection methods and better than that achieved with Bayes-net methods.

This ensemble approach achieved excellent results in the prediction of protein function in the mouse model organism, demonstrating that hierarchical multilabel methods can play a crucial role in the improvement of protein function prediction performances [24]. Nevertheless, the approach suffers from some drawbacks. Indeed, the paper focuses on the comparison of hierarchical multilabel methods, but it does not analyze the impact of the concurrent use of data integration and hierarchical multilabel methods on the overall classification performance.


Figure 11: The asymmetric flow of information suggested by the true path rule.

Moreover, potential improvements could be introduced by applying cost-sensitive variants of hierarchical multilabel predictors, able to effectively calibrate the precision/recall trade-off at the different levels of the functional ontology.

7. True Path Rule Hierarchical Ensembles

These ensemble methods exploit at the same time the downward and upward relationships between classes, thus considering both the parent-to-child and child-to-parent functional links (Figure 2(b)).

The true path rule (TPR) ensemble method [31, 142] is directly inspired by the true path rule that governs both the GO and FunCat taxonomies. Citing the curators of the Gene Ontology [143]: "An annotation for a class in the hierarchy is automatically transferred to its ancestors, while genes unannotated for a class cannot be annotated for its descendants". Considering the parents of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$y_i = 1 \Longrightarrow y_{\mathrm{par}(i)} = 1, \qquad y_i = 0 \nRightarrow y_{\mathrm{par}(i)} = 0$  (26)

On the other hand, considering the children of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$y_i = 1 \nRightarrow y_{\mathrm{child}(i)} = 1, \qquad y_i = 0 \Longrightarrow y_{\mathrm{child}(i)} = 0$  (27)

From (26) and (27) we observe an asymmetry in the rules that govern the assignment of positive and negative labels. Indeed, we have a propagation of positive predictions from bottom to top of the hierarchy in (26) and a propagation of negative labels from top to bottom in (27). Conversely, negative labels cannot propagate from bottom to top, and positive predictions cannot propagate from top to bottom.

The "true path rule" suggests algorithms able to propagate "positive" decisions from bottom to top of the hierarchy and negative decisions from top to bottom (Figure 11).

7.1. The True Path Rule Ensemble Algorithm. The TPR algorithm puts together the predictions made at each node by local "base" classifiers to realize an ensemble that obeys the "true path rule".

The basic ideas behind the method can be summarized as follows.

(1) Training of the base learners: for each node of the hierarchy, a suitable learning algorithm (e.g., a multilayer perceptron or a support vector machine) provides a classifier for the associated functional class.

(2) In the evaluation phase, the trained classifiers associated with each class/node of the graph provide a local decision about the assignment of a given example to a given node.

(3) Positive decisions, that is, annotations to a specific functional class, may propagate from bottom to top across the graph: they influence the decisions of the parent nodes and of their ancestors in a recursive way, by traversing the graph towards higher level nodes/classes. Conversely, negative decisions do not affect decisions of the parent node; that is, they do not propagate from bottom to top (26).


(4) Negative predictions for a given node (taking into account the local decisions of its descendants) are propagated to the descendants, to preserve the consistency of the hierarchy according to the true path rule, while positive decisions do not influence decisions of child nodes (27).

The ensemble combines the local predictions of the base learners associated with each node with the positive decisions that come from the bottom of the hierarchy and with the negative decisions that spring from the higher level nodes. More precisely, base classifiers estimate local probabilities $p_i(g)$ that a given example $g$ belongs to class $\theta_i$, but the core of the algorithm is represented by the evaluation phase, where the ensemble provides an estimate of the "consensus" global probability $\bar{p}_i(g)$. It is worth noting that, instead of a probability, $p_i(g)$ may represent a score associated with the likelihood that a given gene/gene product belongs to the functional class $i$.

Let us consider the set $\phi_i(g)$ of the children of node $i$ for which we have a positive prediction for a given gene $g$:

$\phi_i(g) = \{ j \mid j \in \mathrm{child}(i),\ y_j = 1 \}$  (28)

The global consensus probability $\bar{p}_i(g)$ of the ensemble depends both on the local prediction $p_i(g)$ and on the predictions of the nodes belonging to $\phi_i(g)$:

$\bar{p}_i(g) = \frac{1}{1 + |\phi_i(g)|} \left( p_i(g) + \sum_{j \in \phi_i(g)} \bar{p}_j(g) \right)$  (29)

The decision $y_i(g)$ at node/class $i$ is set to 1 if $\bar{p}_i(g) > t$ and to 0 otherwise (a natural choice for $t$ is 0.5), and only children nodes for which we have a positive prediction can influence their parent. In the leaf nodes, the sum of (29) disappears, and (29) becomes $\bar{p}_i(g) = p_i(g)$. In this way, positive predictions propagate from bottom to top, and negative decisions are propagated to the descendants of a given node when $y_i(g) = 0$.

The bottom-up per level traversal of the tree assures that all the offspring of a given node $i$ are taken into account for the ensemble prediction. For the same reason, we can safely set the classes belonging to the subtree rooted at $i$ to negative when $y_i$ is set to 0. It is worth noting that we have a two-way asymmetric flow of information across the tree: positive predictions for a node influence its ancestors, while negative predictions influence its offspring.

The algorithm provides both the multilabels $y_i$ and an estimate of the probabilities $\bar{p}_i$ that a given example $g$ belongs to the class $i = 1, \ldots, m$.
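The following Python sketch (a toy illustration, not the reference implementation; the tree layout and local probabilities are made up) mirrors this two-way flow: a recursive bottom-up sweep implements (29), and a top-down pass switches off the subtrees of negative nodes:

```python
# Minimal sketch of the TPR evaluation phase on a toy tree.
children = {"r": ["a", "b"], "a": [], "b": ["c"], "c": []}
p_hat = {"r": 0.4, "a": 0.7, "b": 0.6, "c": 0.8}   # local predictions
t = 0.5                                             # decision threshold

def tpr(node):
    """Bottom-up: returns (p_bar, y) for the subtree rooted at node."""
    p_bar, y = {}, {}
    for c in children[node]:
        sub_p, sub_y = tpr(c)
        p_bar.update(sub_p)
        y.update(sub_y)
    # only positive children contribute to the parent's consensus (29)
    phi = [c for c in children[node] if y[c] == 1]
    p_bar[node] = (p_hat[node] + sum(p_bar[c] for c in phi)) / (1 + len(phi))
    y[node] = 1 if p_bar[node] > t else 0
    return p_bar, y

def propagate_negatives(node, y):
    """Top-down: a negative decision switches off the whole subtree."""
    for c in children[node]:
        if y[node] == 0:
            y[c] = 0
        propagate_negatives(c, y)

p_bar, y = tpr("r")
propagate_negatives("r", y)
print(p_bar, y)
```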

7.2. The Cost-Sensitive Variant. Note that in the TPR algorithm there is no way to explicitly balance the local prediction $p_i(g)$ at node $i$ with the positive predictions coming from its offspring (29). By balancing the local predictions with the positive predictions coming from the ensemble, we can explicitly modulate the interplay between local and descendant predictors. To this end, a weight $w$, $0 \leq w \leq 1$, is introduced, such that, if $w = 1$, the decision at node $i$ depends only on the local predictor; otherwise, the prediction is shared proportionally to $w$ and $1 - w$ between, respectively, the local parent predictor and the set of its children:

$\bar{p}_i = w\, p_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \bar{p}_j$  (30)

This variant of the TPR algorithm is the weighted true path rule (TPR-W) hierarchical ensemble algorithm. By tuning the $w$ parameter, we can modulate the precision/recall characteristics of the resulting ensemble. More precisely, for $w \rightarrow 0$ the weight of the parent local predictor is small, and the ensemble decision depends mainly on the positive predictions of the offspring nodes (classifiers). Conversely, $w \rightarrow 1$ corresponds to a higher weight of the parent predictor; then less weight is given to possible positive predictions of the children, and the decision depends mainly on the local/parent base classifier. In the case of a negative decision, the whole subtree is set to 0, causing the precision to increase. Note that for $w \rightarrow 1$ the behaviour of TPR-W becomes similar to that of HTD (Section 4).
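In the sketch above, the consensus line can be replaced by the weighted combination (30); a minimal illustrative helper (the function and parameter names are mine) could be:

```python
def tpr_w_update(p_i, positive_children, w=0.5):
    """Weighted consensus (30); with no positive children, the local
    prediction is returned unchanged, as for leaves in plain TPR."""
    if not positive_children:
        return p_i
    return w * p_i + (1 - w) * sum(positive_children) / len(positive_children)
```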

A specific advantage of TPR-W ensembles is the capability of tuning precision and recall rates through the parameter $w$ (30). For small values of $w$, the weight of the decision of the parent local predictor is small, and the ensemble decision depends mainly on the positive predictions of the offspring nodes (classifiers); higher values of $w$ correspond to a higher weight of the "parent" local predictor, with a resulting higher precision. In [31] the author shows that the $w$ parameter highly influences the precision/recall characteristics of the ensemble: low $w$ values yield a higher recall, while high values improve the precision of the TPR-W ensemble.

Recently, Chen and Hu proposed a method that applies the TPR-W hierarchical strategy but uses composite kernel SVMs as base classifiers and a supervised clustering with oversampling strategy to solve the imbalanced data set learning problem, showing that the proper selection of base learners and unbalance-aware learning strategies can further improve the results in terms of hierarchical precision and recall [144].

The same authors also proposed an enhanced version of the TPR-W strategy, to overcome a limitation of this bottom-up hierarchical method for AFP. Indeed, for some classes at the lower levels of the hierarchy the classifier performances are sometimes quite poor, due to both noisy data and the relatively low number of available annotations. More precisely, in the basic TPR ensemble the probabilities $\bar{p}_j$ computed by the children of the node $i$ (30) contribute in equal way to the probability $\bar{p}_i$ computed by the ensemble at node $i$, independently of the accuracy of the predictions made by its children classifiers. This "unweighted" mechanism may generate a propagation of errors across the hierarchy: a poorly performing child classifier may, for instance, predict with high probability a negative example as positive, and this error may propagate to its parent node and, recursively, to its ancestor nodes. To try to alleviate this possible bottom-up error propagation, in [145] Chen and Hu proposed an improved TPR ensemble (TPR-W weighted) based on classifier performance. To this end, they weighted the contribution of each child classifier on the basis of its performance, evaluated on a validation data set, by adding to (30) another weight $\nu_j$:

$\bar{p}_i = w\, p_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \nu_j \cdot \bar{p}_j$  (31)

where $\nu_j$ is computed on the basis of some accuracy metric $A_j$ (e.g., the F-score) estimated for the child classifier associated with node $j$, as follows:

$\nu_j = \frac{A_j}{\sum_{k \in \phi_i} A_k}$  (32)

In this way, the contribution of "poor" classifiers is reduced, while "good" classifiers weight more in the final computation of $\bar{p}_i$ (31). Experiments with the "Protein Fate" subtree of the FunCat taxonomy with the yeast model organism show that this approach improves predictions with respect to the "vanilla" TPR-W hierarchical strategy [145].
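A sketch of the performance-weighted combination (31)-(32), with hypothetical accuracy estimates standing in for validation F-scores:

```python
def tpr_w_weighted_update(p_i, child_p_bar, child_acc, w=0.5):
    """Each positive child contributes in proportion to its accuracy."""
    if not child_p_bar:
        return p_i
    nu = [a / sum(child_acc) for a in child_acc]             # eq. (32)
    weighted = sum(v * p for v, p in zip(nu, child_p_bar))
    return w * p_i + (1 - w) * weighted / len(child_p_bar)   # eq. (31)

# two positive children with consensus 0.9 and 0.6, F-scores 0.8 and 0.2
print(tpr_w_weighted_update(0.4, [0.9, 0.6], [0.8, 0.2]))    # -> 0.41
```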

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward and common to the hierarchical top-down algorithm, the propagation of positive decisions from bottom to top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this item, see Appendix C.

Experimental results show that TPR-W achieves equal or better results than the TPR and the top-down hierarchical strategy, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per level classification performances shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall by using a global strategy: large values of the $w$ parameter improve the precision, and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchical ensemble are influenced by the positive decisions of its descendants. More precisely, the analyses performed in [31] showed the following.

(i) Weights of descendants decrease exponentially with respect to their depth. As a consequence, the influence of descendant nodes decays quickly with their depth.

(ii) The parameter $w$ plays a central role in balancing the weight of the parent classifier associated with a given node with the weights of its positive offspring: small values of $w$ increase the weight of descendant nodes, and large values increase the weight of the local parent predictor associated with that node.

(iii) The effect on the overall probability predicted by the ensemble is the result of the choice of the $w$ parameter, the strength of the predictions of the local learner, and that of its descendants.

These characteristics of TPR-W ensembles are well suited for the hierarchical classification of protein functions, considering that annotations of deeper nodes are likely to have less experimental evidence than higher nodes. Moreover, by enforcing the strength of the descendant nodes through low $w$ values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured as a forest of trees (Appendix A), while no applications have been performed using the GO, structured as a directed acyclic graph (Appendix A).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147] or investigated a modified decision tree model, in which the assignment to a node is propagated toward the parent nodes [9], by extending the classical C4.5 decision tree algorithm for multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in the yeast [149].

More recent approaches, always based on modified decision trees, used distance measures derived from the hierarchy and significantly improved previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all classes at once, even when they are built taking into account the hierarchy.

Nevertheless, the previously proposed decision tree-based methods often achieve results not comparable with state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for DAG-structured prediction of protein function in the S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles that learn ensembles of predictive models, each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles that train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151], methods related to decision trees, in the sense that interpretable classification rules are generated to predict all functions at all levels of the GO hierarchy, have been proposed, using an ant colony optimization classification algorithm to discover the classification rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues [1, 16, 18]. In particular, in [80] the authors showed that, even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results if, at the same time, we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy between hierarchical classification, data integration methods, and cost-sensitive techniques has been shown, highlighting that hierarchical ensemble methods should be designed taking into account the different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge showed that data integration plays a central role in improving the predictions of protein functions [16, 25, 154-156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into the following four groups:

(1) vector subspace integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed by [30], but each data source is separately normalized, in order to take into account the data distribution in each individual vector space.

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network, or probabilistic evidence integration schemes [45].

Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive-Bayes integration [157] and constrained linear regression [14], or by merging data taking into account the GO hierarchy [158], or by applying XML-based techniques [159].

Kernel Fusion. These techniques at first construct a separate Gram matrix for each available data source, using appropriate kernels representing similarities between genes/gene products. Then, by exploiting the closure property with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix.

Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have been recently proposed [160, 161]. Kernel fusion methods, both with and without weighting the data sources, have been successfully applied to the classification of protein functions [162-165]. Recently, a novel method proposed an enhanced kernel integration approach, by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
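A minimal sketch of the kernel fusion idea, assuming two hypothetical feature "views" (the data here are random placeholders standing in for real biomolecular sources) and the simplest combination scheme, an unweighted sum of Gram matrices:

```python
# Kernel fusion by summation of Gram matrices, followed by an SVM
# with a precomputed kernel.
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_expr = rng.normal(size=(100, 20))     # e.g., expression profiles
X_ppi = rng.normal(size=(100, 50))      # e.g., interaction profiles
y = rng.integers(0, 2, 100)             # annotations for one term

# closure under addition: the sum of Gram matrices is itself a kernel
K = linear_kernel(X_expr) + rbf_kernel(X_ppi)
clf = SVC(kernel="precomputed").fit(K, y)
scores = clf.decision_function(K)       # in-sample scores (illustration)
```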

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different "views" of the data, then combining the outputs of the component learners. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include the "late integration" of kernels trained on different sources [22], the naive-Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the output of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171-173].

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques,


Table 2: Comparison of the results (average per class F-scores) achieved with single-source and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the numbers in parentheses refer to the percentage relative increment in F-score achieved with data fusion with respect to the best single source of evidence (BIOGRID).

Methods          FLAT          HTD           HTD-CS        HB            HB-CS         TPR           TPR-W
Single-source
BIOGRID          0.2643        0.3759        0.4160        0.3385        0.4183        0.3902        0.4367
STRING           0.2203        0.2677        0.3135        0.2138        0.3007        0.2801        0.3048
PFAM BINARY      0.1756        0.2003        0.2482        0.1468        0.2407        0.2532        0.2738
PFAM LOGE        0.2044        0.1567        0.2541        0.0997        0.2847        0.3005        0.3160
EXPR             0.1884        0.2506        0.2889        0.2006        0.2781        0.2723        0.3053
SEQ SIM          0.1870        0.2532        0.2899        0.2017        0.2825        0.2742        0.3088
Multisource (data fusion)
Kernel fusion    0.3220 (22%)  0.5401 (44%)  0.5492 (32%)  0.5181 (53%)  0.5505 (32%)  0.5034 (29%)  0.5592 (28%)

in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ SIM). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].

Table 2 summarizes the results of the comparison across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per class F-score in flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) with respect to FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical data fusion (in particular kernel fusion) and cost-sensitive approaches (Figure 13) [123].

Per level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion, the average F-score at levels 2, 3, and 4 is comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergic effects of hierarchical multilabel ensembles, cost-sensitive techniques, and data fusion techniques significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insights into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both GO and FunCat, only positive annotations are usually available, while negative annotations are much reduced. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy of choosing negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123], two strategies for choosing negative examples have been compared: the basic (B) and the parent only (PO) strategy.

According to the $B$ strategy, the set of negative examples is simply the set of genes $g$ that are not annotated for class $c_i$, that is:

$N_B = \{ g \mid g \notin c_i \}$  (33)

The PO selection strategy chooses as negatives for the class $c_i$ only those examples that are not annotated to $c_i$ but are annotated for a parent class. More precisely, for a given class $c_i$, corresponding to node $i$ in the taxonomy, the set of negative examples is

$N_{PO} = \{ g \mid g \notin c_i \wedge g \in \mathrm{par}(i) \}$  (34)
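The two selection strategies (33) and (34) reduce to simple set operations; in this toy sketch the annotation sets and gene names are invented:

```python
# B takes every gene not annotated to c_i; PO keeps only those genes
# that are additionally annotated to a parent term of c_i.
annotations = {                       # term -> set of annotated genes
    "c_parent": {"g1", "g2", "g3", "g4"},
    "c_i": {"g1", "g2"},
}
parents = {"c_i": ["c_parent"]}
all_genes = {"g1", "g2", "g3", "g4", "g5", "g6"}

N_B = all_genes - annotations["c_i"]                           # eq. (33)
N_PO = {g for g in all_genes - annotations["c_i"]
        if any(g in annotations[p] for p in parents["c_i"])}   # eq. (34)
print(N_B)    # g3, g4, g5, g6
print(N_PO)   # g3, g4 only: the genes "close" to the positives
```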

Figure 12: FunCat trees comparing the F-scores achieved with data integration (KF) to the best single-source classifiers trained on BIOGRID data. Black nodes depict functional classes for which KF achieves better F-scores: (a) FLAT and (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical precision, recall, and F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, the PO strategy selects negative examples for training that are, in a certain sense, "close" to positives. It is easy to see that $N_{PO} \subseteq N_B$; hence the $B$ strategy selects for training a larger set of generic negative examples, possibly annotated with classes that are associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.

The $B$ strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with $B$ to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the B (abscissa) and PO (ordinate) strategy for the selection of negative examples. In Figure 15(a) most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods

Figure 14: Comparison of per level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF). Performance of "single" is computed by averaging across all the single data sources.

gain in performance when using the PO strategy as opposed to the $B$ strategy ($P$-value $= 2.2 \times 10^{-16}$, according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

These results can be explained by considering that the PO strategy takes into account the hierarchy to select the negatives, while the $B$ strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of the classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from the other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in the context of this exciting research area.


Figure 15: Comparison of average per class F-score between the basic and PO strategies: (a) hierarchical cost-sensitive strategies HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles); (b) FLAT ensembles. Abscissa: per class F-score with base learners trained according to the basic strategy; ordinate: per class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way, to assess the impact of the different strategies on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works also showed that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be considered, both for applying and designing cost-sensitive base learners and for developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies should be explored to improve the accuracy and the reliability of the base learners and, hence, of the overall hierarchical classification process. For instance, random undersampling or oversampling techniques could be applied: the former randomly takes away some unannotated examples, whereas the latter augments the annotations by exactly duplicating the annotated proteins [182]. Other approaches could be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185-187].

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems, and it often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta_i^- / \theta_i^+$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$ we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed, in order to design ensemble methods with "tunable" precision/recall characteristics.

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; unfortunately, the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and that, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are partially annotated. More precisely, two new algorithms have been proposed: ProWL, protein function prediction with weak-label learning, which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein; and ProWL-IF, protein function prediction with weak-label learning and knowledge of irrelevant functions, by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works on hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually, only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy with the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Section 7.3), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of the deep nodes of the hierarchy. Other enhancements could be introduced in the TPR-W algorithm (Section 7.2): indeed, we can note that the positive children of a node at level $i$ of the hierarchy have the same weight, independently of the size of their hanging subtree. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree; indeed, this witnesses a positive annotation of the node at level $i$.

More in general, in the spirit of a work recently proposed [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194], able to exploit the relationships between functional classes during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, just during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not explored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performances.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the prediction for a particular species by using predictions or data available for other species. This is a common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and, in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion, since each source of biomolecular data may provide different and often complementary information about a protein: an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work.


Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and the other related issues is the key to improving AFP methods and to driving experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms, able to scale with millions of proteins, represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes, and it is set out in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., mitochondrion or rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began as a collaboration between three model organism databases, FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD), in 1998. Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components,


Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

and molecular functions, in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function, for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities. An example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles, or physical places or compartments in which a specific gene product is located. An example is the endoplasmic reticulum or the ribosome.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{ t \mid t \text{ is a term of the GO} \}$ and $E = \{ (t, u) \mid t, u \in V \}$. Relations between GO terms are also categorized in the following three main groups.

(i) is-a (subtype relations): if we say the term/node $A$ is a $B$, we mean that node $A$ is a subtype of node $B$. For example, mitotic cell cycle is a cell cycle, or lyase activity is a catalytic activity.

(ii) part-of (part-whole relations): $A$ is part of $B$ means that, whenever $A$ exists, it is part of $B$, and the presence of $A$ implies the presence of $B$. For instance, mitochondrion is part of cytoplasm.

(iii) regulates (control relations): if we say that $A$ regulates $B$, we mean that $A$ directly affects the manifestation of $B$; that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting function similarity. For these reasons, in AFP the regulates relations (which, however, are a minority of the existing relations) are usually not used.

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI); to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference; to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de): at the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content was extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and, furthermore, to cover prokaryotic organisms and, finally, animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-grained description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10,000 terms), and such a huge number of terms is very difficult for annotators to handle. Moreover, the very large number of possible assignments may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads to both simpler annotation procedures and less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to change, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more generally, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess the classification performance. For this reason, the classical F-score is used instead, to take into account the imbalance of the functional classes. If TP represents the number of positive examples correctly predicted as positive, FN the number of positive examples incorrectly predicted as negative, and FP the number of negatives incorrectly predicted as positive, then the precision $P$ and the recall $R$ are

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{B1}$$

The F-score $F$ is the harmonic mean of precision and recall:

$$F = \frac{2 \cdot P \cdot R}{P + R}. \tag{B2}$$

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to near-optimal rankings; that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.
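To make these definitions concrete, here is a minimal sketch (our own illustration, not code from the reviewed papers) that computes precision, recall, and F-score as in (B1)-(B2) for a single functional class, together with the AUC; scikit-learn's roc_auc_score is assumed to be available:

```python
from sklearn.metrics import roc_auc_score

def precision_recall_fscore(y_true, y_pred):
    """Compute P, R, and F of (B1)-(B2) from binary labels/predictions of one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    prec = tp / (tp + fp) if tp + fp > 0 else 0.0
    rec = tp / (tp + fn) if tp + fn > 0 else 0.0
    f = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
    return prec, rec, f

y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]          # continuous classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # thresholded decisions
print(precision_recall_fscore(y_true, y_pred))    # -> P = R = F = 2/3
print(roc_auc_score(y_true, y_score))             # ~0.89: close to an optimal ranking
```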

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be considered "partially correct," in the sense that we gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$ consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $l(P)$ be the set of the leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let ${\uparrow}p$ be the set of ancestors of the node $p$ that belong to $P(x)$, and, given a leaf $c \in C(x)$, let ${\uparrow}c$ be the set of ancestors of the node $c$ that belong to $C(x)$; the hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}p|}, \qquad
\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}c|},$$
$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B3}$$

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow}p|}{|{\uparrow}p|}, \qquad
\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|{\uparrow}c \cap P(x)|}{|{\uparrow}c|},$$
$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B4}$$

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus synthetically assessing the effectiveness of the structured hierarchical prediction.
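The following sketch (our own illustration) computes HP, HR, and HF as in (B3) for a single gene $x$; the mapping up encodes the ↑ operator, that is, for each leaf the set of its ancestors in the corresponding graph, and here we adopt the convention that a node is included among its own ancestors:

```python
def hierarchical_prf(pred_leaves, true_leaves, up):
    """HP, HR, HF of (B3); up[n] is the ancestor set of leaf n (node included)."""
    def best(ancestors, other_leaves):
        # max over the other graph's leaves of the ancestor-set overlap
        return max(len(ancestors & up[o]) for o in other_leaves)

    hp = sum(best(up[p], true_leaves) / len(up[p]) for p in pred_leaves) / len(pred_leaves)
    hr = sum(best(up[c], pred_leaves) / len(up[c]) for c in true_leaves) / len(true_leaves)
    hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
    return hp, hr, hf

# Toy tree rooted at 0: the true path reaches leaf 3 (0 -> 1 -> 3), while the
# prediction stops at the more general term 1 (0 -> 1).
up = {1: {0, 1}, 3: {0, 1, 3}}
print(hierarchical_prf(pred_leaves=[1], true_leaves=[3], up=up))
# -> HP = 1.0 (the predicted path is entirely correct),
#    HR = 2/3 (the most specific true term was missed), HF = 0.8
```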

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

$$\mathrm{HP}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|P(x)|}, \qquad \mathrm{HR}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|C(x)|}. \tag{B5}$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true ($\mathrm{HR}_K$) classes. The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \tag{B6}$$
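A corresponding sketch for the Kiritchenko-style measures (B5)-(B6), where P and C map each gene to its predicted and true class sets (both assumed to be already propagated to include ancestor terms), might look as follows:

```python
def kiritchenko_hf(P, C):
    """HF_K of (B6); P[x] and C[x] are the predicted and true class sets of gene x."""
    hp = sum(len(P[x] & C[x]) / len(P[x]) for x in P if P[x])  # HP_K, see (B5)
    hr = sum(len(P[x] & C[x]) / len(C[x]) for x in P if C[x])  # HR_K, see (B5)
    return 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0

P = {"g1": {"c0", "c1"}, "g2": {"c0", "c2", "c3"}}
C = {"g1": {"c0", "c1", "c4"}, "g2": {"c0", "c2"}}
print(kiritchenko_hf(P, C))  # -> ~1.667 for this toy example
```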

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $\bar{q}_k$; more precisely, $\bar{q}_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $\bar{q}^{\,j}_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \ge 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level $k$ of the hierarchy [31]:

$$\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g), \tag{C1}$$

where $\phi_k(g)$ denotes the set of positive children of the node, that is, the children for which the ensemble predicts $g$ as positive.
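A minimal recursive sketch of this bottom-up propagation is given below; the children map, the base-learner outputs q_hat, and the threshold defining "positive" children are illustrative assumptions, and here a node without positive children simply keeps its base-learner probability:

```python
def tpr_w(node, children, q_hat, w=0.5, threshold=0.5):
    """Ensemble probability q_bar(node) via the TPR-W combination (C1)."""
    child_probs = [tpr_w(c, children, q_hat, w, threshold)
                   for c in children.get(node, [])]
    positive = [q for q in child_probs if q > threshold]  # phi: positive children only
    if not positive:                  # leaf node, or no positive children
        return q_hat[node]
    return w * q_hat[node] + (1 - w) * sum(positive) / len(positive)

children = {"root": ["a", "b"], "a": ["a1"]}
q_hat = {"root": 0.7, "a": 0.8, "b": 0.3, "a1": 0.9}
print(tpr_w("root", children, q_hat, w=0.5))  # -> 0.775
```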

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children nodes of a generic node at level $k$:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1}, \tag{C2}$$

and we can introduce similar notations for $a_{k+2}$ (the average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (the average over the positive descendants $j$ levels below a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \le w \le 1$, balancing the weight between the parent and children predictors, and with a variable number (larger than or equal to 1) of positive descendants for each of the $m$ lower levels below, the following equality holds for each $m \ge 1$:

$$\bar{q}_k = w \hat{q}_k + \sum_{j=1}^{m-1} w (1 - w)^j a_{k+j} + (1 - w)^m a_{k+m}. \tag{C3}$$

For the full proof, see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter. To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$ (Theorem 1), we can study the functions plotted in Figure 18.


[Figure 18: (a) Plot of $f(w) = (1-w)^m$ while varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1-w)^j$ while varying $j$ from 1 to 10 (the values $w = 0.1$, $0.5$, $0.8$ are marked in panel (b)). The integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.]

Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of the internal nodes above the leaves.
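A quick numeric check of this decay, with an assumed value of $w$:

```python
# Numeric illustration of Theorem 2 with w = 0.5: the weight w(1-w)^j of the
# descendants j levels below halves at each level, and the weight (1-w)^m of
# the most distant positive descendants vanishes quickly.
w = 0.5
print([round(w * (1 - w) ** j, 4) for j in range(1, 6)])  # [0.25, 0.125, ..., 0.0156]
print([round((1 - w) ** m, 4) for m in range(1, 6)])      # [0.5, 0.25, ..., 0.0312]
```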

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction–the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.

[2] L. Pena-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.

[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.

[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.

[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.

[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene Ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.

[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.

[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.

[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.

[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.

[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.

[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.

[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.

[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.

[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.

[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.

[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.

[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.

[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.

[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.

[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.

[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.

[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.

[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.

[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.

[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.

[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multilabel decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.

[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.

[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.

[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.

[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.

[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.

[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.

[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.

[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.

[38] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.

[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.

[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.

[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.

[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.

[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.

[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.

[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.

[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.

[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.

[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.

[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.

[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.

[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.

[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.

[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.

[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.

[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.

[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.

[58] A. Conesa, S. Gotz, J. M. Garcia-Gomez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.

[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.

[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.

[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.

[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.

[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.

[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.

[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.

[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.

[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.

[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.

[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.

[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes in Artificial Intelligence, pp. 219–234, Springer, 2011.

[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.

[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.

[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.

[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.

[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.

[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.

[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.

[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.

[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.

[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.

[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.

[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.

[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.

[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.

[88] G. Bakir, T. Hofmann, B. Scholkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.

[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.

[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.

[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.

[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.

[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.

[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.

[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.

[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.

[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.

[100] M. Dettling and P. Buhlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.

[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.

[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.

[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.

[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.

[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.

[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.

[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.

[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.

[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.

[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.

[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.

[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.

[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.

[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.

[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.

[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.

[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using Top-Down label combination and Artificial Neural Networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastino, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems, Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An $O(n^2)$ algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.

[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics – From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, no. 12, article S7, 2009.

[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse a la Probabilite des Decisions Rendues a la Pluralite des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7-9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. Von Mering, R. Krause, B. Snel et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.

[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multilabel classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research, W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013 – ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bahr et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.



Figure 2: Schematic representation of the two main learning steps of hierarchical ensemble methods: (a) training of the base classifiers; (b) top-down and/or bottom-up propagation of the predictions.

In the prediction phase, the base classifiers (circles) exploit the hierarchical relationships between classes to combine their predictions with those provided by the other base classifiers (Figure 2(b)). Note that a dummy node 0 is added to obtain a rooted hierarchy. Up and down arrows represent the possibility of combining predictions by exploiting those provided, respectively, by the children and parent classifiers, according to a bottom-up or top-down learning strategy. Note that "local" combinations are possible (e.g., the prediction of node 5 may depend only on the prediction of node 1), but "global" combinations can also be considered, by taking into account the predictions across the overall structure of the graph (e.g., predictions for node 9 can depend on the predictions made by all the other base classifiers, from 1 to 8). Moreover, both top-down propagation of the predictions (down arrows, Figure 2(b)) and bottom-up propagation (up arrows) can be considered, depending on the specific design of the hierarchical ensemble algorithm.

This ensemble approach is highly modular: in principle, any learning algorithm can be used to train the classifiers in the first step, and annotation decisions, probabilities, or other scores provided by each base learner can be combined, depending on the characteristics of the specific hierarchical ensemble method.

In this section we provide some basic notation and an ensemble taxonomy that will be used to introduce the different hierarchical ensemble methods for AFP.

3.1. Basic Notation. A gene/gene product $g$ can be represented through a vector $x \in \mathbb{R}^d$ having $d$ different features (e.g., gene expression levels across $d$ different conditions, sequence similarities with other genes/proteins, presence or absence of a given domain in the corresponding protein, or genetic or physical interactions with other proteins). Note that, for the sake of simplicity and with a certain approximation, we refer in the same way to genes and proteins, even if it is well known that a given gene may correspond to multiple proteins. A gene $g$ is assigned to one or more functional classes in the set $C = \{c_1, c_2, \ldots, c_m\}$, structured according to a FunCat forest of trees $T$ or a directed acyclic graph $G$ of the Gene Ontology (usually a dummy root class $c_0$, which every gene belongs to, is added to $T$ or $G$ to facilitate the processing). The assignments are coded through a vector of multilabels $y = (y_1, y_2, \ldots, y_m) \in \{0,1\}^m$, where $g$ belongs to class $c_i$ if and only if $y_i = 1$.

In both the Gene Ontology (GO) and FunCat taxonomies, the functional classes are structured according to a hierarchy and can be represented by a directed graph, where nodes correspond to classes and edges correspond to relationships between classes. Hence the node corresponding to the class $c_i$ can be simply denoted by $i$. We represent the set of children nodes of $i$ by $\mathrm{child}(i)$ and the set of its parents by $\mathrm{par}(i)$. Moreover, $y_{\mathrm{child}(i)}$ denotes the labels of the children classes of node $i$ and, analogously, $y_{\mathrm{par}(i)}$ denotes the labels of the parent classes of $i$. Note that in FunCat only one parent is permitted, since the overall hierarchy is a forest of trees, while in the GO multiple parents are allowed, because the relationships are structured according to a directed acyclic graph.

Hierarchical ensemble methods train a set of calibrated classifiers, one for each node of the taxonomy $T$. These classifiers are used to derive estimates $\hat{p}_i(g)$ of the probabilities $p_i(g) = P(V_i = 1 \mid V_{\mathrm{par}(i)} = 1, g)$ for all $g$ and $i$, where $(V_1, \ldots, V_m) \in \{0,1\}^m$ is the vector random variable modeling the unknown multilabel of a gene $g$ and $V_{\mathrm{par}(i)}$ denotes the random variables associated with the parents of node $i$. Note that the $p_i(g)$ are probabilities conditioned on $V_{\mathrm{par}(i)} = 1$; that is, each is the probability that a gene is annotated to a given term $i$, given that the gene is already annotated to its parent terms, thus respecting the true path rule. Ensemble methods infer a multilabel assignment $\hat{\mathbf{y}} = (\hat{y}_1, \ldots, \hat{y}_m) \in \{0,1\}^m$ based on the estimates $\hat{p}_1(g), \ldots, \hat{p}_m(g)$.
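To make the notation concrete, the following minimal Python sketch (all names and the toy taxonomy are illustrative, not taken from the reviewed papers) represents a hierarchy through parent/child maps and checks whether a multilabel respects the true path rule:

```python
# Minimal sketch (illustrative names): a toy taxonomy with a dummy root 0,
# stored as a node -> parents map, plus a true-path-rule consistency check.
from collections import defaultdict

parents = {1: [0], 2: [0], 3: [1], 4: [1], 5: [1], 6: [2], 7: [2], 8: [6], 9: [6]}

children = defaultdict(list)
for node, pars in parents.items():
    for p in pars:
        children[p].append(node)

def is_consistent(y):
    """y maps nodes to 0/1; a positive node requires all its parents positive."""
    return all(
        all(y.get(p, 1) == 1 for p in parents[node])
        for node, label in y.items() if label == 1 and node in parents
    )

y = {1: 1, 3: 1, 2: 0, 6: 0}
print(is_consistent(y))  # True: node 3 is positive and its parent 1 is positive
```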

3.2. A Taxonomy of Hierarchical Ensemble Methods. Hierarchical ensemble methods for AFP share several characteristics, from the two-step learning approach to the exploitation of the hierarchical relationships between classes. For these reasons it is quite difficult to clearly and univocally define a taxonomy of hierarchical ensemble methods. Here we present a taxonomy useful mainly to describe and discuss existing methods for AFP. For a recent review and taxonomy of hierarchical ensemble methods not specific for AFP problems, we refer the reader to the comprehensive review by Silla and colleagues [44].

In the following sections we discuss the following groups of hierarchical ensemble methods:

(i) Top-down ensemble methods. These methods are characterized by a simple top-down approach: in the second step, only the output of the parent node/base classifier influences the output of the children, thus resulting in a top-down propagation of the decisions.

(ii) Bayesian ensemble methods. These are a class of methods theoretically well founded, and in some cases they are optimal from a Bayesian standpoint.

(iii) Reconciliation methods. This is a heterogeneous class of heuristics by which we can combine the predictions of the base learners by adopting different "local" or "global" combination strategies.

(iv) True path rule ensembles. These methods adopt a heuristic approach based on the "true path rule" that governs both the GO and FunCat ontologies.

(v) Decision tree-based ensembles. These methods are characterized by the application of decision trees as base learners or by the adoption of decision tree-like learning strategies to combine the predictions of the base learners.

Despite this general characterization, several methods could be assigned to different groups, and for several hierarchical ensemble methods it is difficult to assign them to any of the introduced classes of methods.

For instance, in [114-116] the authors used the hierarchy only to construct training sets different for each term of the Gene Ontology, by determining positive and negative examples on the basis of the relationships between functional terms. In [89], for each classifier associated with a node, a gene is labeled as positive (i.e., belonging to the term associated with that node) if it actually belongs to that node, or as negative if it does not belong to that node or to the ancestors or descendants of the node.

Other approaches exploited the correlation between nearby classes [32, 53, 117]. Shahbaba and Neal [53] take into account the hierarchy to introduce correlations between functional classes, using a multinomial logit model with Bayesian priors in the context of E. coli functional classification with Riley's hierarchies [118]. Bogdanov and Singh incorporated functional interrelationships between terms during the extraction of features based on annotations of neighboring genes and then applied a nearest-neighbor classifier to predict protein functions [117]. The HiBLADE method (hierarchical multilabel boosting with label dependency) [32] not only takes advantage of the preestablished hierarchical taxonomy of the classes but also effectively exploits the hidden correlation among the classes that is not shown through the class hierarchy, thereby improving the quality of the predictions. In particular, the dependencies of the children for each label in the hierarchy are captured and analyzed using the Bayes method and instance-based similarity. Experiments using the FunCat taxonomy and the yeast model organism show that the proposed method is competitive with the TPR-W (Section 7.2) and HBAYES-CS (Section 5.3) hierarchical ensemble methods.

A classical multiclass boosting algorithm [119] has also been adapted to fit the hierarchical structure of the FunCat taxonomy [120]: the method is relatively simple and straightforward to implement and achieves competitive results for AFP in the yeast model organism.

Finally, other hierarchical approaches have been proposed in the context of the competitive network learning framework. Competitive networks are well-known unsupervised and supervised methods able to map the input space into a structured output space, where clusters or classes are usually arranged according to a grid topology and where learning adopts at the same time a competition, cooperation, and adaptation strategy [121]. Interestingly enough, in [122] the authors adopted this approach to predict the hierarchy of gene annotations in the yeast model organism by using a tree topology structured according to the FunCat taxonomy: each neuron is connected with its parent or with its children. Moreover, each neuron in the tree-structured output layer is connected to all the neurons of the input layer representing the instances, that is, the set of genomic features associated with each gene to be classified. Results obtained with the hierarchy of enzyme commission codes showed that this approach is competitive with those obtained with hierarchical decision tree ensembles [29] (Section 8).

To provide a general picture of the methods discussed in the following sections, Table 1 summarizes their main characteristics. The first two columns report the name of and a reference to each method; the third reports whether multiple paths across the taxonomy are allowed; and the fourth whether partial paths are considered (i.e., paths that do not end at a leaf). The successive columns refer to the class structure (a tree or a DAG), to the adoption or not of cost-sensitive (i.e., unbalance-aware) classification approaches, and to the adoption of strategies to properly select negative examples in the training phase. Finally, the last three columns summarize the type of base learner used ("spec" means that only a specific type of base learner is allowed, while "any" means that any type of learner can be used within the method), whether the method improves or not with respect to the flat approach, and the mode of processing of the nodes ("TD": top-down approach; "TD & BUP": adopting both top-down and bottom-up strategies). Of course, methods having more checkmarks are more flexible, and in general methods that can process a DAG can also process tree-structured ontologies, but the opposite is not guaranteed, while the type of node processing relies on the way the information is propagated across the ontology. It is worth noting that all the considered methods improve on baseline "flat" classification methods.

Table 1: Characteristics of some of the main hierarchical ensemble methods for AFP.

Methods | Ref. | Multipath | Partial path | Class struct. | Cost-sens. | Sel. neg. | Base learn. | Improves on flat | Node proc.
--------|------|-----------|--------------|---------------|------------|-----------|-------------|------------------|-----------
HMC-LMLP | [124, 125] | ✓ | ✓ | TREE | | | any | ✓ | TD
HTD-CS | [27] | ✓ | ✓ | TREE | ✓ | | any | ✓ | TD
HTD-MULTI | [127] | | ✓ | TREE | | | any | ✓ | TD
HTD-PERLEV | [128] | | | TREE | | | spec | ✓ | TD
HTD-NET | [26] | ✓ | ✓ | DAG | | | any | ✓ | TD
BAYES NET-ENS | [20] | ✓ | ✓ | DAG | | ✓ | spec | ✓ | TD & BUP
HIER-MB and BFS | [30] | ✓ | ✓ | DAG | | | any | ✓ | TD & BUP
HBAYES | [92, 135] | ✓ | ✓ | TREE | ✓ | | any | ✓ | TD & BUP
HBAYES-CS | [123] | ✓ | ✓ | TREE | ✓ | ✓ | any | ✓ | TD & BUP
Reconc.-heuristic | [24] | ✓ | ✓ | DAG | | | any | ✓ | TD
Cascaded log. | [24] | ✓ | ✓ | DAG | | | any | ✓ | TD
Projection-based | [24] | ✓ | ✓ | DAG | | | any | ✓ | TD & BUP
TPR | [31, 142] | ✓ | ✓ | TREE | ✓ | | any | ✓ | TD & BUP
TPR-W | [31] | ✓ | ✓ | TREE | ✓ | ✓ | any | ✓ | TD & BUP
TPR-W weighted | [145] | ✓ | ✓ | TREE | ✓ | | any | ✓ | TD & BUP
Decision-tree-ens. | [29] | ✓ | ✓ | DAG | | | spec | ✓ | TD & BUP

4. Hierarchical Top-Down (HTD) Ensembles

These ensemble methods exploit the hierarchical relationships between functional terms in a top-to-bottom fashion, that is, considering only the relationships denoted by the down arrows in Figure 2(b). The basic hierarchical top-down

ensemble method (HTD) algorithm is straightforward: for each gene $g$, starting from the set of nodes at the first level of the graph $G$ (denoted by $\mathrm{root}(G)$), the classifier associated with the node $i \in G$ computes whether the gene belongs to the class $c_i$. If yes, the classification process continues recursively on the nodes $j \in \mathrm{child}(i)$; otherwise it stops at node $i$, and the predictions for the nodes belonging to the subtree rooted at $i$ are all set to 0. To introduce the method, we use probabilistic classifiers as base learners, trained to predict the class $c_i$ associated with the node $i$ of the hierarchical taxonomy. Their estimates $\hat{p}_i(g)$ of $P(V_i = 1 \mid V_{\mathrm{par}(i)} = 1, g)$ are used by the HTD ensemble to classify a gene $g$ as follows:

$$\hat{y}_i = \begin{cases} \{\hat{p}_i(g) > \frac{1}{2}\} & \text{if } i \in \mathrm{root}(G) \\ \{\hat{p}_i(g) > \frac{1}{2}\} & \text{if } i \notin \mathrm{root}(G) \text{ and } \hat{p}_{\mathrm{par}(i)}(g) > \frac{1}{2} \\ 0 & \text{if } i \notin \mathrm{root}(G) \text{ and } \hat{p}_{\mathrm{par}(i)}(g) \le \frac{1}{2} \end{cases} \quad (3)$$

where $\{A\} = 1$ if the predicate $A$ is true and $\{A\} = 0$ otherwise, and $\hat{p}_{\mathrm{par}(i)}$ is the probability predicted for the parent of the term $i$. It is easy to see that this procedure ensures that the predicted multilabels $\hat{\mathbf{y}} = (\hat{y}_1, \ldots, \hat{y}_m)$ are consistent with the hierarchy. We can apply the same top-down procedure also using nonprobabilistic classifiers, that is, base learners generating continuous scores or even discrete decisions, by slightly modifying (3).

In [123] a cost-sensitive version of the basic top-down hierarchical ensemble method HTD has been proposed: by assigning $\hat{y}_i$ before the label of any $j$ in the subtree rooted at $i$, the following rule is used:

$$\hat{y}_i = \left\{\hat{p}_i \ge \tfrac{1}{2}\right\} \times \left\{\hat{y}_{\mathrm{par}(i)} = 1\right\} \quad (4)$$

for $i = 1, \ldots, m$ (note that the guessed label $\hat{y}_0$ of the root of $G$ is always 1). Then the cost-sensitive variant HTD-CS introduces a single cost-sensitive parameter $\tau > 0$, which replaces the threshold $\frac{1}{2}$. The resulting rule for HTD-CS is then:

$$\hat{y}_i = \left\{\hat{p}_i \ge \tau\right\} \times \left\{\hat{y}_{\mathrm{par}(i)} = 1\right\} \quad (5)$$

By tuning $\tau$ we may obtain ensembles with different precision/recall characteristics. Despite the simplicity of the hierarchical top-down methods, several works showed their effectiveness for AFP problems [28, 31].
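As an illustration, the following sketch (hypothetical helper names, toy tree, and toy probabilities) implements the HTD-CS rule (5); with tau = 0.5 it reduces to the basic HTD rule:

```python
# Sketch of the HTD-CS top-down rule (5); with tau = 0.5 it reduces to HTD (4).
# 'children' maps each node to its children; 'p_hat' holds the base-learner
# estimates for one gene g; node 0 is the dummy root (always labeled 1).
def htd_cs(children, p_hat, tau=0.5, root=0):
    y_hat = {root: 1}
    stack = list(children.get(root, []))
    while stack:
        i = stack.pop()
        # A node can be positive only if its parent was predicted positive.
        y_hat[i] = 1 if p_hat[i] >= tau else 0
        if y_hat[i] == 1:
            stack.extend(children.get(i, []))
        else:
            # A negative decision propagates to the whole subtree rooted at i.
            sub = list(children.get(i, []))
            while sub:
                j = sub.pop()
                y_hat[j] = 0
                sub.extend(children.get(j, []))
    return y_hat

children = {0: [1, 2], 1: [3, 4], 2: [5]}
p_hat = {1: 0.8, 2: 0.4, 3: 0.7, 4: 0.2, 5: 0.9}
print(htd_cs(children, p_hat, tau=0.5))
# {0: 1, 2: 0, 5: 0, 1: 1, 4: 0, 3: 1} -- node 5 is forced to 0 by its parent
```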

For instance, Cerri and De Carvalho experimented with different variants of top-down hierarchical ensemble methods for AFP [28, 124, 125]. The HMC-LMLP (hierarchical multilabel classification with local multilayer perceptron) method successively trains a local MLP network for each hierarchical level, using the classical backpropagation algorithm [126]. Then the output of the MLP for the first level is used as input to train the MLP that learns the classes of the second level, and so on (Figure 3). A gene is annotated to a class if its corresponding output in the MLP is larger than a predefined threshold; then, in a postprocessing phase (the second step of the hierarchical classification), inconsistent predictions are removed (i.e., classes predicted without the prediction of their superclasses) [125]. In practice, instead of using a dichotomic classifier for each node, the HMC-LMLP algorithm applies a single multiclass multilayer perceptron for each level of the hierarchy.

Figure 3: HMC-LMLP: outputs of the MLP responsible for the predictions in the first level are used as input to another MLP for the predictions in the second level (adapted from [125]).

A related approach adopts multiclass classifiers (HTD-MULTI) for each node, instead of a simple binary classifier, and tries to find the most likely path from the root to the leaves of the hierarchy, considering simple techniques such as the multiplication or the sum of the probabilities estimated at each node along the path [127]. The method has been applied to the cell cycle branch of the FunCat hierarchy with the yeast model organism, showing improvements with respect to classical hierarchical top-down methods, even if the proposed approach can only predict classes along a single "most likely path," thus not considering that in AFP we may have annotations involving multiple and partial paths.

Another method that introduces multiclass classifiers instead of simple dichotomic classifiers has been proposed by Paes et al. [128]: local per-level multiclass classifiers (HTD-PERLEV) are trained to distinguish between the classes of a specific level of the hierarchy, and two different strategies to remove inconsistencies are introduced. The method has been applied to the hierarchical classification of enzymes using the EC taxonomy, but unfortunately this algorithm is not well suited to AFP, since leaf-node annotations are mandatory (that is, partial-path annotations are not allowed) and multilabel annotations along multiple paths are not allowed.

Another interesting top-down hierarchical approach proposed by the same authors is HMC-LP (hierarchical multilabel classification label-powerset), a hierarchical variation of the label-powerset nonhierarchical multilabel method [129], which has been applied to the prediction of gene functions of the yeast model organism, using 10 different data sets and the FunCat taxonomy [124]. According to the label-powerset approach, the method is based on a first label-combination step, by which, for each example (gene), all the classes assigned to the example are combined into a new and unique class, and this process is repeated for each level of the hierarchy. In this way the original problem is transformed into a hierarchical single-label problem. In both the training and test phases the top-down approach is applied, and at the end of the classification phase the original classes can easily be reconstructed [124]. In an experimental comparison using the FunCat taxonomy for S. cerevisiae, results showed that hierarchical top-down ensemble methods significantly outperform decision tree-based hierarchical methods, but no significant difference between the different flavors of top-down hierarchical ensembles has been detected [28].

Top-down algorithms can be conceived also in the context of network-based methods (HTD-NET). For instance, in [26] a probabilistic model, which combines relational protein-protein interaction data and the hierarchical structure of the GO to predict true-path-consistent function labels, obeys the true path rule by setting the descendants of a node as negative whenever that node is set to negative. More precisely, the authors at first compute a local hierarchical conditional probability, in the sense that for any nonroot GO term only the parents affect its labeling. This probability is computed within a network-based framework, assuming that the labeling of a gene is independent of that of any other gene, given that of its neighbors (a sort of Markov property with respect to the gene functional interaction network), and assuming also a binomial distribution for the number of neighbors labeled with child terms with respect to those labeled with the parent term. These assumptions are quite stringent but are necessary to make the model tractable. Then a global hierarchical conditional probability is computed by recursively applying the previously computed local hierarchical conditional probability, considering all the ancestors. More precisely, denoting by $P(y_i = 1 \mid g, N(g))$ the probability that a gene $g$ is annotated to a node $i$, given the status of the annotations of its neighborhood $N(g)$ in the functional network, the global hierarchical conditional probability factorizes according to the GO graph as follows:

$$P(y_i = 1 \mid g, N(g)) = \prod_{j \in \mathrm{anc}(i)} P(y_j = 1 \mid y_{\mathrm{par}(j)} = 1, N_{\mathrm{loc}}(g)) \quad (6)$$

where $N_{\mathrm{loc}}(g)$ represents the local hierarchical neighborhood information on the parent-child GO term pair $\mathrm{par}(j)$ and $j$ [26]. This approach guarantees producing GO term label assignments consistent with the hierarchy, without the need of a postprocessing step.
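A minimal sketch of the factorization (6) follows, assuming for simplicity a chain of single-parent terms (in the GO DAG the product runs over all the ancestors) and precomputed local conditional probabilities; all names are illustrative:

```python
# Sketch of (6): the global probability for term i factorizes as the product of
# the local parent-conditional probabilities over the ancestors of i (i included).
# 'parents' maps node -> single parent; 'p_local' holds the precomputed local
# hierarchical conditional probabilities for one gene g.
def global_probability(i, parents, p_local):
    prob, j = 1.0, i
    while j is not None:              # walk up to the root, multiplying
        prob *= p_local[j]
        j = parents.get(j)
    return prob

parents = {1: None, 2: 1, 3: 2}      # a simple chain 1 -> 2 -> 3
p_local = {1: 0.9, 2: 0.8, 3: 0.5}
print(global_probability(3, parents, p_local))  # 0.9 * 0.8 * 0.5 = 0.36
```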

Finally, in [130] the author applied a hierarchical method to the classification of yeast FunCat categories. Despite its well-founded theoretical properties based on large margin


methods, this approach is conceived for one-path hierarchical classification, and hence it is unsuited for hierarchical AFP, where usually multiple paths in the hierarchy should be considered, since in most cases genes can play different functional roles in the cell.

5. Ensemble-Based Bayesian Approaches for Hierarchical Classification

These methods introduce a Bayesian approach to the hierarchical classification of proteins, by using the classical Bayes theorem or Bayesian networks to obtain tractable factorizations of the joint conditional probabilities from the original "full Bayesian" setting of the hierarchical AFP problem [20, 30] or to achieve "Bayes-optimal" solutions with respect to loss functions well suited to hierarchical problems [27, 92].

5.1. The Solution Based on Bayesian Networks. One of the first approaches addressing the issue of inconsistent predictions in the Gene Ontology is represented by the Bayesian approach proposed in [20] (BAYES NET-ENS). According to the general scheme of hierarchical ensemble methods, two main steps characterize the algorithm:

(1) flat prediction of each term/class (possibly inconsistent);

(2) a Bayesian hierarchical combination scheme allowing collaborative error-correction over all the nodes.

After training a set of base classifiers on each of the considered GO terms (in their work the authors applied the method to 105 selected GO terms), we may have a set of (possibly inconsistent) $\hat{y}$ predictions. The goal consists in finding a set of consistent $y$ predictions by maximizing the following equation, derived from the Bayes theorem:

$$P(y_1, \ldots, y_n \mid \hat{y}_1, \ldots, \hat{y}_n) = \frac{P(\hat{y}_1, \ldots, \hat{y}_n \mid y_1, \ldots, y_n)\, P(y_1, \ldots, y_n)}{Z} \quad (7)$$

where $n$ is the number of GO nodes/terms and $Z$ is a constant normalization factor.

Since the direct solution of (7) is too hard, that is, exponential in time with respect to the number of nodes, the authors proposed a Bayesian network structure to solve this difficult problem, in order to exploit the relationships between the GO terms. More precisely, to reduce the complexity of the problem, the authors imposed the following constraints:

(1) $y_i$ nodes are conditioned on their children (GO structure constraints);

(2) $\hat{y}_i$ nodes are conditioned on their label $y_i$ (the Bayes rule);

(3) $\hat{y}_i$ are independent from both $y_j$, $i \ne j$, and $\hat{y}_j$, $i \ne j$, given $y_i$.

In other words, we can ensure that a label is 1 (positive) when any one of its children is 1, and the edges from $y_i$ to $\hat{y}_i$ assure that a classifier output $\hat{y}_i$ is a random variable independent of all the other classifier outputs $\hat{y}_j$ and labels $y_j$, given its true label $y_i$ (Figure 4).

Figure 4: Bayesian network involved in the hierarchical classification (adapted from [20]).

More precisely, from the previous constraints we can derive the following equations.

From the first constraint:

$$P(y_1, \ldots, y_n) = \prod_{i=1}^{n} P(y_i \mid \mathrm{child}(y_i)) \quad (8)$$

From the last two constraints:

$$P(\hat{y}_1, \ldots, \hat{y}_n \mid y_1, \ldots, y_n) = \prod_{i=1}^{n} P(\hat{y}_i \mid y_i) \quad (9)$$

Note that (8) can be inferred from the training labels simply by counting, while (9) can be inferred by validation during training, by modeling the distribution of the $\hat{y}_i$ outputs over positive and negative examples and assuming a parametric model (e.g., a Gaussian distribution; see Figure 5).
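The following sketch illustrates, on toy data and with illustrative names, how the two ingredients can be estimated: the terms of (8) by simple counting and the class-conditional densities of (9) with Gaussian models of the classifier outputs:

```python
# Sketch: estimating the terms of (8) by counting and of (9) with a Gaussian
# model of the classifier outputs (toy data; names are illustrative).
import numpy as np
from scipy.stats import norm

# (8): P(y_i = 1 | child configuration) estimated by counting co-occurrences.
y_i      = np.array([1, 1, 1, 0, 1, 0])   # true labels of node i
child_on = np.array([1, 0, 1, 0, 1, 0])   # 1 if at least one child is positive
p_given_child_on = y_i[child_on == 1].mean()   # here: 3/3 = 1.0

# (9): P(yhat_i | y_i) modeled as two Gaussians fitted on validation outputs.
scores_pos = np.array([1.2, 0.8, 1.5, 0.9])    # SVM outputs on positives
scores_neg = np.array([-1.1, -0.7, -1.6, 0.1]) # SVM outputs on negatives
lik_pos = norm(scores_pos.mean(), scores_pos.std()).pdf  # density of yhat | y=1
lik_neg = norm(scores_neg.mean(), scores_neg.std()).pdf  # density of yhat | y=0
print(p_given_child_on, lik_pos(1.0), lik_neg(1.0))
```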

For the implementation of their method the authors adopted bagged ensembles of SVMs [131], to make their predictions more robust and reliable at each node of the GO hierarchy, and the median values of their outputs on out-of-bag examples have been used to estimate the means and variances for each class. Finally, the means and variances have been used as parameters of the Gaussian models adopted to estimate the conditional probabilities of (9).

Results with the 105 terms/nodes of the GO BP ontology (model organism S. cerevisiae) showed substantial improvements with respect to nonhierarchical "flat" predictions: the hierarchical approach improves the AUC results on 93 of the 105 GO terms (Figure 6).

5.2. The Markov Blanket and Approximated Breadth-First Solution. In [30] the authors proposed an alternative approximated solution to the complex equation (7), by introducing the following two variants of the Bayesian integration:

(1) HIER-MB: hierarchical Bayesian combination involving the nodes in the Markov blanket;

(2) HIER-BFS: hierarchical Bayesian combination involving the first 30 nodes visited through a breadth-first search (BFS) in the GO graph.


Figure 5: Distribution of positive and negative validation examples (a Gaussian distribution is assumed). (a) Negative examples. (b) Positive examples. Adapted from [20].

Figure 6: Improvements induced by the hierarchical prediction of the GO terms. Darker shades of blue indicate the largest improvements and darker shades of red the largest deteriorations; white means no change (adapted from [20]).

Figure 7: Markov blanket surrounding the GO term $Y_1$. Each GO term is represented as a blank node, while the SVM classifier output is represented as a gray node (adapted from [30]).

The method has been applied to the prediction of more than 2000 GO terms for the mouse model organism and performed among the top methods in the MouseFunc challenge [2].

The first approach (HIER-MB) modifies the output of the base learners (SVMs in the Guan et al. paper), taking into account the Bayesian network constructed using the Markov blanket surrounding the GO term of interest (Figure 7). In a Bayesian network, the Markov blanket of a node $i$ is represented by its parents ($\mathrm{par}(i)$), its children ($\mathrm{child}(i)$), and its children's other parents. The Bayesian network involving the Markov blanket of node $i$ is used to provide the prediction $\hat{y}_i$ of the ensemble, thus leveraging the local relationships of node $i$ and the predictions for the nodes included in its Markov blanket.

To enlarge the size of the Bayesian subnetwork involved in the prediction of the node of interest, a variant based on the Bayesian networks constructed by applying a classical breadth-first search is the basis of the HIER-BFS algorithm. To reduce the complexity, at most 30 terms are included (i.e., the first 30 nodes reached by the breadth-first algorithm; see Figure 8). In the implementation, ensembles of 25 SVMs have been trained for each node, using vector space integration techniques [132] to integrate multiple sources of data.

Figure 8: The breadth-first subnetwork stemming from $Y_1$. Each GO term is represented through a blank node, and the SVM outputs are represented as gray nodes (adapted from [30]).

Note that with both the HIER-MB and HIER-BFS methods we do not take into account the overall topology of the GO network, but only the terms related to the node for which we perform the prediction. Even if this general approach is reasonable and achieves good results, its main drawback is represented by the locality of the hierarchical integration (limited to the Markov blanket or to the first 30 BFS nodes). Moreover, in previous works it has been shown that the adopted integration strategy (vector space integration) is in most cases worse than kernel fusion [11] and than ensemble methods for data integration [23].

Figure 9: Integration of diverse methods and diverse sources of data in an ensemble framework for AFP prediction. The best classifier for each GO term is selected through held-out set validation (adapted from [30]).

In the same work [30], the authors propose also a sort of "test and select" method [133], by which three different classification approaches, (a) single flat SVMs, (b) Bayesian hierarchical correction, and (c) naive Bayes combination, are applied, and for each GO term the best one is selected by internal cross-validation (Figure 9).

It is worth noting that other approaches adopted Bayesian networks to resolve the hierarchical constraints underlying the GO taxonomy. For instance, in the FALCON algorithm the GO is modeled as a Bayesian network, and for any given input the algorithm returns the most probable GO term assignment in accordance with the GO structure, by using an evolutionary-based optimization algorithm [134].

5.3. HBAYES: An "Optimal" Hierarchical Bayesian Ensemble Approach. The HBAYES ensemble method [92, 135] is a general technique for solving hierarchical classification problems on generic taxonomies $G$ structured as forests of trees. The method consists in training a calibrated classifier at each node of the taxonomy. In principle, any algorithm (e.g., support vector machines or artificial neural networks) whose classifications are obtained by thresholding a real prediction $\hat{p}$, for example, $\hat{y} = \mathrm{SGN}(\hat{p})$, can be used as base learner. The real-valued outputs $\hat{p}_i(g)$ of the calibrated classifier for node $i$ on the gene $g$ are viewed as estimates of the probabilities $p_i(g) = P(Y_i = 1 \mid Y_{\mathrm{par}(i)} = 1, g)$. The distribution of the random Boolean vector $\mathbf{Y}$ is assumed to be:

$$P(\mathbf{Y} = \mathbf{y}) = \prod_{i=1}^{m} P(Y_i = y_i \mid Y_{\mathrm{par}(i)} = 1, g) \quad \forall \mathbf{y} \in \{0,1\}^m \quad (10)$$

where, in order to enforce that only multilabels $\mathbf{Y}$ that respect the hierarchy have nonzero probability, it is imposed that $P(Y_i = 1 \mid Y_{\mathrm{par}(i)} = 0, g) = 0$ for all nodes $i = 1, \ldots, m$ and all $g$. This implies that the base learner at node $i$ is only trained on the subset of the training set including all the examples $(g, \mathbf{y})$ such that $y_{\mathrm{par}(i)} = 1$.

5.3.1. HBAYES Ensembles for Protein Function Prediction. The H-loss is a measure of discrepancy between multilabels based on a simple intuition: if a parent class has been predicted wrongly, then errors in its descendants should not be taken into account. Given fixed cost coefficients $\theta_1, \ldots, \theta_m > 0$, the H-loss $\ell_H(\hat{\mathbf{y}}, \mathbf{v})$ between multilabels $\hat{\mathbf{y}}$ and $\mathbf{v}$ is computed as follows: all paths in the taxonomy $T$ from the root down to each leaf are examined and, whenever a node $i \in \{1, \ldots, m\}$ is encountered such that $\hat{y}_i \ne v_i$, $\theta_i$ is added to the loss, while all the other loss contributions from the subtree rooted at $i$ are discarded. This method assumes that, given a gene $g$, the distribution of the labels $\mathbf{V} = (V_1, \ldots, V_m)$ is $P(\mathbf{V} = \mathbf{v}) = \prod_{i=1}^{m} p_i(g)$ for all $\mathbf{v} \in \{0,1\}^m$, where $p_i(g) = P(V_i = v_i \mid V_{\mathrm{par}(i)} = 1, g)$. According to the true path rule, it is imposed that $P(V_i = 1 \mid V_{\mathrm{par}(i)} = 0, g) = 0$ for all nodes $i$ and all genes $g$.

In the evaluation phase, HBAYES predicts the Bayes-optimal multilabel $\hat{\mathbf{y}} \in \{0,1\}^m$ for a gene $g$ based on the estimates $\hat{p}_i(g)$ for $i = 1, \ldots, m$. By definition of Bayes-optimality, the optimal multilabel for $g$ is the one that minimizes the expected loss when the true multilabel $\mathbf{V}$ is drawn from the joint distribution computed from the estimated conditionals $\hat{p}_i(g)$; that is:

$$\hat{\mathbf{y}} = \mathop{\arg\min}_{\mathbf{y} \in \{0,1\}^m} E\left[\ell_H(\mathbf{y}, \mathbf{V}) \mid g\right] \quad (11)$$

In other words, the ensemble method HBAYES provides an approximation of the optimal Bayesian classifier with respect to the H-loss [135]. More precisely, as shown in [27], the following theorem holds.


Theorem 1. For any tree $T$ and gene $g$, the multilabel generated according to the HBAYES prediction rule is the Bayes-optimal classification of $g$ for the H-loss.

In the evaluation phase the uniform cost coefficients $\theta_i = 1$, for $i = 1, \ldots, m$, are used. However, since with uniform coefficients the H-loss can be made small simply by predicting sparse multilabels (i.e., multilabels $\hat{\mathbf{y}}$ such that $\sum_i \hat{y}_i$ is small), in the training phase the cost coefficients are set to $\theta_i = 1/|\mathrm{root}(G)|$ if $i \in \mathrm{root}(G)$ and to $\theta_i = \theta_j/|\mathrm{child}(j)|$, with $j = \mathrm{par}(i)$, otherwise. This normalizes the H-loss, in the sense that the maximal H-loss contribution of all the nodes in a subtree, excluding its root, equals that of its root.

Let $\{E\}$ be the indicator function of the event $E$. Given $g$ and the estimates $\hat{p}_i = \hat{p}_i(g)$ for $i = 1, \ldots, m$, the HBAYES prediction rule can be formulated as follows.

HBAYES Prediction Rule. Initially, set the label of each node $i$ to

$$\hat{y}_i = \mathop{\arg\min}_{y \in \{0,1\}} \left( \theta_i \hat{p}_i (1 - y) + \theta_i (1 - \hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{\mathbf{y}}) \right) \quad (12)$$

where

$$H_j(\hat{\mathbf{y}}) = \theta_j \hat{p}_j (1 - \hat{y}_j) + \theta_j (1 - \hat{p}_j)\, \hat{y}_j + \hat{p}_j \{\hat{y}_j = 1\} \sum_{k \in \mathrm{child}(j)} H_k(\hat{\mathbf{y}}) \quad (13)$$

is recursively defined over the nodes $j$ in the subtree rooted at $i$, with each $\hat{y}_j$ set according to (12). Then, if $\hat{y}_i$ is set to zero, set all the nodes in the subtree rooted at $i$ to zero as well.

It is worth noting that $\hat{\mathbf{y}}$ can be computed for a given $g$ via a simple bottom-up message-passing procedure. It can be shown that, if all the child nodes $k$ of $i$ have $\hat{p}_k$ close to one half, then the Bayes-optimal label of $i$ tends to be 0, irrespective of the value of $\hat{p}_i$. On the contrary, if $i$'s children all have $\hat{p}_k$ close to either 0 or 1, then the Bayes-optimal label of $i$ is based on $\hat{p}_i$ only, ignoring the children. This behaviour can be intuitively explained in the following way: the estimate $\hat{p}_k$ is built based only on the examples on which the parent $i$ of $k$ is positive; hence a "neutral" estimate $\hat{p}_k = 1/2$ signals that the current instance is a negative example for the parent $i$. Experimental results show that this approach achieves results comparable with the TPR method (Section 7), an ensemble approach based on the "true path rule" [136].
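A minimal sketch of the message-passing evaluation of rules (12)-(13) on a toy tree follows (illustrative names and numbers; ties in the argmin are resolved towards 0):

```python
# Sketch of the HBAYES rule (12)-(13) via bottom-up message passing on a toy
# tree, followed by the top-down zeroing step (illustrative names and data).
def hbayes(children, p_hat, theta, root=0):
    y_hat, H = {}, {}

    def up(i):                       # bottom-up: compute H_i and a tentative label
        h_children = sum(up(j) for j in children.get(i, []))
        cost0 = theta[i] * p_hat[i]                                # predict 0
        cost1 = theta[i] * (1 - p_hat[i]) + p_hat[i] * h_children  # predict 1
        y_hat[i] = 1 if cost1 < cost0 else 0
        H[i] = min(cost0, cost1)
        return H[i]

    def down(i):                     # top-down: a 0 forces the whole subtree to 0
        for j in children.get(i, []):
            if y_hat[i] == 0:
                y_hat[j] = 0
            down(j)

    up(root)
    down(root)
    return y_hat

children = {0: [1, 2], 1: [3, 4]}
p_hat = {0: 0.9, 1: 0.6, 2: 0.3, 3: 0.9, 4: 0.1}
theta = {0: 1.0, 1: 0.5, 2: 0.5, 3: 0.25, 4: 0.25}
print(hbayes(children, p_hat, theta))  # {3: 1, 4: 0, 1: 1, 2: 0, 0: 1}
```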

5.3.2. HBAYES-CS: The Cost-Sensitive Version. HBAYES-CS is the cost-sensitive version of HBAYES, proposed in [27]. In this approach the misclassification cost coefficient $\theta_i$ for node $i$ is split into two terms, $\theta_i^+$ and $\theta_i^-$, taking into account misclassifications of positive and negative examples, respectively.

By considering these two terms separately, (12) can be rewritten as:

$$\hat{y}_i = \mathop{\arg\min}_{y \in \{0,1\}} \left( \theta_i^- \hat{p}_i (1 - y) + \theta_i^+ (1 - \hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{\mathbf{y}}) \right) \quad (14)$$

where the expression of $H_j(\hat{\mathbf{y}})$ changes correspondingly. By introducing a factor $\alpha \ge 0$ such that $\theta_i^- = \alpha \theta_i^+$, while keeping $\theta_i^+ + \theta_i^- = 2\theta_i$, the relative costs of false positives and false negatives can be parameterized, thus allowing us to further rewrite the hierarchical Bayesian rule (Section 5.3.1) as follows:

$$\hat{y}_i = 1 \iff \hat{p}_i \left( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \right) \ge \frac{2\theta_i}{1 + \alpha} \quad (15)$$

By setting $\alpha = 1$ we obtain the original version of the hierarchical Bayesian ensemble, and by incrementing $\alpha$ we introduce progressively lower costs for positive predictions. In this way the recall of the ensemble tends to increase, eventually at the expense of the precision, and by tuning the $\alpha$ parameter we can obtain different combinations of precision/recall values.

In principle, a cost factor $\alpha_i$ can be set for each node $i$, to explicitly take into account the unbalance between the number of positive ($n_i^+$) and negative ($n_i^-$) examples, estimated from the training data:

$$\alpha_i = \frac{n_i^-}{n_i^+} \implies \theta_i^+ = \frac{2}{n_i^-/n_i^+ + 1}\, \theta_i = \frac{2 n_i^+}{n_i^- + n_i^+}\, \theta_i \quad (16)$$

The decision rule (15) at each node then becomes:

$$\hat{y}_i = 1 \iff \hat{p}_i \left( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \right) \ge \frac{2\theta_i}{1 + \alpha_i} = \frac{2\theta_i n_i^+}{n_i^- + n_i^+} \quad (17)$$

Results obtained with the yeast model organism showed that HBAYES-CS significantly outperforms HTD methods [27, 136].
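The per-node decision rule (17) is easy to state in code; the following toy snippet (illustrative names and counts) shows how the class unbalance lowers the acceptance threshold for rare terms:

```python
# Sketch of the per-node cost-sensitive decision rule (17): alpha_i is set from
# the positive/negative training counts (toy numbers; names are illustrative).
def hbayes_cs_decision(p_hat_i, theta_i, h_children_sum, n_pos, n_neg):
    alpha_i = n_neg / n_pos                      # unbalance-aware cost factor
    threshold = 2 * theta_i / (1 + alpha_i)      # = 2*theta_i*n_pos/(n_pos+n_neg)
    return int(p_hat_i * (2 * theta_i - h_children_sum) >= threshold)

# A rare class (20 positives vs 980 negatives) gets a much lower threshold,
# boosting the recall for under-annotated terms deep in the hierarchy.
print(hbayes_cs_decision(p_hat_i=0.3, theta_i=1.0, h_children_sum=0.4,
                         n_pos=20, n_neg=980))  # 1, since 0.3*1.6 = 0.48 >= 0.04
```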

6. Reconciliation Methods

Hierarchical ensemble methods are basically two-step methods, since they at first provide predictions for the single classes and then arrange these predictions to take into account the functional relationships between GO terms. Noble and colleagues name this general approach reconciliation methods [24]: they proposed methods for calibrating and combining independent predictions, to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. They applied their ensemble methods to genome-wide and ontology-wide function prediction in M. musculus, involving about 3000 GO terms.


Figure 10: The overall scheme of reconciliation methods (adapted from [24]).

Their goal consists in providing consistent predictions, that is, predictions whose confidence (e.g., posterior probability) increases as we ascend from more specific to more general terms in the GO. Moreover, another important feature of these methods is the availability of confidence values associated with the predictions, which can be interpreted as probabilities that a protein has a certain function, given the information provided by the data.

The overall reconciliation approach can be summarized in the following four basic steps (Figure 10):

(1) Kernel computation: at first, a set of kernels is computed from the available data. We may choose kernels specific for each source of data (e.g., diffusion kernels for protein-protein interaction data [137], linear or Gaussian kernels for expression data, and string kernels for sequence data [138]). Multiple kernels for the same type of data can also be constructed [24].

(2) SVM learning: SVMs are used as base learners with the kernels selected at the previous step; the training is performed by internal cross-validation to avoid overfitting, and a local cost-sensitive strategy is applied by tuning the $C$ regularization factor separately for positive and negative examples. Note that the authors used SVMs as base learners in their experiments, but any meaningful classifier could be used at this step.

(3) Calibration: to produce individual probabilistic outputs from the set of SVM outputs corresponding to one GO term, a logistic regression approach is applied. In this way a calibration of the individual SVM outputs is obtained, resulting in a probabilistic prediction of the random variable $Y_i$ for each node/term $i$ of the hierarchy, given the outputs $\hat{y}_i$ of the SVM classifiers.

(4) Reconciliation: the first three steps generate unreconciled outputs; that is, in practice a "flat" ensemble is applied, which may generate predictions inconsistent with the given taxonomy. In this step, the outputs of step three are processed by a "reconciliation method". The goal of this stage is to combine the predictions for each term, to produce predictions that are consistent with the ontology, meaning that all the probabilities assigned to the ancestors of a GO term are larger than the probability assigned to that term.

The first three steps are basically the same for (or very similar to) each reconciliation ensemble method. The crucial step is the fourth, that is, the reconciliation step, and different ensemble algorithms can be designed to implement it. The authors proposed 11 different ensemble methods for the reconciliation of the base classifier outputs. Schematically, they can be subdivided into the following four main classes of ensembles:

(1) heuristic methods;
(2) Bayesian network-based methods;
(3) cascaded logistic regression;
(4) projection-based methods.

6.1. Heuristic Methods. These approaches preserve the "reconciliation property"

$$\forall i, j \in G: (i,j) \in E \implies \bar{p}_i \ge \bar{p}_j \quad (18)$$

through simple heuristic modifications of the probabilities $\hat{p}_i$ computed at step 3 of the overall reconciliation scheme.

(i) The MAX method simply chooses the largest logistic regression value among the node $i$ and all its descendants $\mathrm{desc}(i)$:

$$\bar{p}_i = \max_{j \in \mathrm{desc}(i)} \hat{p}_j \quad (19)$$

(ii) The AND method implements the notion that the probability that a term/node $i$ is positive is the probability that all its ancestral GO terms $\mathrm{anc}(i)$ are positive, assuming that, conditional on the data, all the predictions are independent:

$$\bar{p}_i = \prod_{j \in \mathrm{anc}(i)} \hat{p}_j \quad (20)$$

(iii) The OR method estimates the probability that the node $i$ is annotated for at least one of the descendant GO terms, assuming again that, conditional on the data, all the predictions are independent:

$$1 - \bar{p}_i = \prod_{j \in \mathrm{desc}(i)} (1 - \hat{p}_j) \quad (21)$$
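The three heuristics can be condensed into a few lines of Python; the sketch below uses a toy three-term chain, with illustrative names and under the assumption that $\mathrm{desc}(i)$ and $\mathrm{anc}(i)$ include the node $i$ itself:

```python
# Sketch of the MAX (19), AND (20), and OR (21) reconciliation heuristics on a
# toy chain 1 -> 2 -> 3 (illustrative names; desc/anc include the node itself).
from math import prod

anc  = {1: [1], 2: [2, 1], 3: [3, 2, 1]}       # ancestors, node included
desc = {1: [1, 2, 3], 2: [2, 3], 3: [3]}       # descendants, node included
p = {1: 0.6, 2: 0.8, 3: 0.4}                   # calibrated (unreconciled) outputs

p_max = {i: max(p[j] for j in desc[i]) for i in p}
p_and = {i: prod(p[j] for j in anc[i]) for i in p}
p_or  = {i: 1 - prod(1 - p[j] for j in desc[i]) for i in p}

print(p_max)  # {1: 0.8, 2: 0.8, 3: 0.4}: ancestors inherit the best descendant
print(p_and)  # {1: 0.6, 2: 0.48, 3: 0.192}: probabilities decrease going down
print(p_or)   # node 1: 1 - 0.4*0.2*0.6 = 0.952
```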

6.2. Cascaded Logistic Regression. Instead of modeling class-conditional probabilities, as required by the Bayesian approach, logistic regression can be used to directly model the posterior probabilities. Considering that modeling conditional densities is in most cases difficult (even using strong independence assumptions, as shown in Section 5.1), the choice of logistic regression can be a reasonable one. In [24] the authors embedded the hierarchical dependencies between terms in the logistic regression setting. By assuming that a random variable $X$, whose values represent the features of the gene $g$ of interest, is associated with a given gene $g$, and assuming that $P(\mathbf{Y} = \mathbf{y} \mid X = x)$ factorizes according to the GO graph, it follows that:

$$P(\mathbf{Y} = \mathbf{y} \mid X = x) = \prod_i P(Y_i = y_i \mid \forall j \in \mathrm{par}(i): Y_j = y_j, X_i = x_i) \quad (22)$$

with $P(Y_i = 1 \mid \forall j \in \mathrm{par}(i): Y_j = 0, X_i = x_i) = 0$. The authors estimated $P(Y_i = 1 \mid \forall j \in \mathrm{par}(i): Y_j = 1, X_i = x_i)$ with logistic regression. This approach is quite similar to fitting independent logistic regressions, but note that in this case only the examples of proteins annotated to all the parent GO terms are used to fit the model, thus implicitly taking into account the hierarchical relationships between GO terms.
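The following sketch (toy data and illustrative names; scikit-learn is used only for convenience) shows the key point of the cascaded approach: the model for a term is fitted only on the examples that are positive for all its parents:

```python
# Sketch of cascaded logistic regression: the model for term i is fitted only
# on examples positive for all parents of i (toy data; illustrative names).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                   # gene features
y_parent = (X[:, 0] > 0).astype(int)            # toy labels for the parent term
y_child = y_parent * (X[:, 1] > 0).astype(int)  # child positive only if parent is

mask = y_parent == 1                            # restrict to parent-positive genes
clf_child = LogisticRegression().fit(X[mask], y_child[mask])

# At prediction time the factorization (22) multiplies the parent posterior by
# this child conditional posterior, so the estimate cannot increase going down.
p_child_given_parent = clf_child.predict_proba(X)[:, 1]
```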

6.3. Bayesian Network-Based Methods. These methods are variants of the Bayesian network approach proposed in [20] (Section 5.1): the GO is viewed as a graphical model, where a joint Bayesian prior is put on the binary GO term variables $Y_i$. The authors proposed four variants, which can be summarized as follows:

(i) BPAL is a belief propagation approach with asymmetric Laplace likelihoods. The graphical model has edges directed from more general terms to more specific terms. Differently from [20], the distribution of each SVM output is modeled as an asymmetric Laplace distribution, and a variational inference algorithm, solving an optimization problem whose minimizer is the set of marginal probabilities of the distribution, is used to estimate the posterior probabilities of the ensemble [139].

(ii) BPALF is similar to BPAL, but with the edges inverted and directed from more specific terms to more general terms.

(iii) BPLR is a heuristic variant of BPAL where, in the inference algorithm, the Bayesian log posterior ratio for $Y_i$ is replaced by the marginal log posterior ratio obtained from the logistic regression (LR).

(iv) BPLRF is equal to BPLR, but with reversed edges.

6.4. Projection-Based Methods. A different approach is represented by methods that directly use the calibrated values obtained from the logistic regression (step 3 of the overall scheme of the reconciliation methods) to find the closest set of values that are consistent with the ontology. This approach leads to a constrained optimization problem. The main contribution of the Obozinski et al. work [24] is represented by the introduction of projection reconciliation techniques, based on isotonic regression [140] and the Kullback-Leibler divergence.

The isotonic regression method tries to find a set of marginal probabilities $\bar{p}_i$ that are close to the set of calibrated values $\hat{p}_i$ obtained from the logistic regression. The Euclidean distance is used as a measure of closeness. Hence, considering that the "reconciliation property" requires that $\bar{p}_i \ge \bar{p}_j$ when $(i,j) \in E$, this approach yields the following quadratic program:

$$\min_{\{\bar{p}_i\}_{i \in I}} \sum_{i \in I} (\bar{p}_i - \hat{p}_i)^2 \quad \text{s.t.} \quad \bar{p}_j \le \bar{p}_i, \; (i,j) \in E \quad (23)$$

This is the classical isotonic regression problem, which can be solved using an interior point solver, or with approximated algorithms when the number of edges of the graph is too large [141].
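As an illustration, the quadratic program (23) can be solved for a toy chain with a general-purpose constrained optimizer (SciPy here; names and data are illustrative, and a production implementation would rather use a dedicated isotonic regression solver [141]):

```python
# Sketch of the isotonic-regression reconciliation (23) solved as a small
# constrained quadratic program with SciPy (toy chain; illustrative names).
import numpy as np
from scipy.optimize import minimize

p_hat = np.array([0.5, 0.9, 0.7])   # calibrated outputs for a chain 0 -> 1 -> 2
edges = [(0, 1), (1, 2)]            # (parent, child): p_child <= p_parent

objective = lambda p: np.sum((p - p_hat) ** 2)
constraints = [{"type": "ineq", "fun": lambda p, i=i, j=j: p[i] - p[j]}
               for i, j in edges]
bounds = [(0.0, 1.0)] * len(p_hat)

res = minimize(objective, p_hat, bounds=bounds, constraints=constraints)
print(np.round(res.x, 3))  # approximately [0.7, 0.7, 0.7]: the violating
                           # parent-child pair (0.5, 0.9) is pooled together
```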

Considering that we deal with probabilities, a natural measure of distance between probability density functions $f(\mathbf{x})$ and $g(\mathbf{x})$, defined with respect to a random variable $\mathbf{x}$, is represented by the Kullback-Leibler divergence $D_{f,g}$:

$$D_{f,g} = \int_{-\infty}^{\infty} f(\mathbf{x}) \log\left(\frac{f(\mathbf{x})}{g(\mathbf{x})}\right) d\mathbf{x} \quad (24)$$

In the context of reconciliation methods we need to consider a discrete version of the Kullback-Leibler divergence, yielding the following optimization problem:

$$\min_{\bar{\mathbf{p}}} D_{\bar{\mathbf{p}}, \hat{\mathbf{p}}} = \min_{\{\bar{p}_i\}_{i \in I}} \sum_{i \in I} \bar{p}_i \log\left(\frac{\bar{p}_i}{\hat{p}_i}\right) \quad \text{s.t.} \quad \bar{p}_j \le \bar{p}_i, \; (i,j) \in E \quad (25)$$

The algorithm finds the probabilities $\bar{\mathbf{p}}$ closest, according to the Kullback-Leibler divergence, to the probabilities $\hat{\mathbf{p}}$ obtained from the logistic regression, obeying the constraint that probabilities cannot increase while descending the hierarchy underlying the ontology.

The extensive experiments reported in [24] show that, among the reconciliation methods, isotonic regression is the most generally useful. Across a range of evaluation modes, term sizes, ontologies, and recall levels, isotonic regression yields consistently high precision. On the other hand, isotonic regression is not always the "best method", and a biologist with a particular goal in mind may apply other reconciliation methods. For instance, with small terms the Kullback-Leibler projections usually achieve the best results; but, considering average "per term" results, heuristic methods yield a precision at a given recall comparable with projection methods and better than that achieved with Bayes-net methods.

This ensemble approach achieved excellent results in the prediction of protein functions in the mouse model organism, demonstrating that hierarchical multilabel methods can play a crucial role in the improvement of protein function prediction performances [24]. Nevertheless, the approach suffers from some drawbacks. Indeed, the paper focuses on the comparison of hierarchical multilabel methods, but it does not analyze the impact of the concurrent use of data integration and hierarchical multilabel methods on the overall classification performances. Moreover, potential improvements could



be introduced by applying cost-sensitive variants of hierarchical multilabel predictors, able to effectively calibrate the precision/recall trade-off at the different levels of the functional ontology.

Figure 11: The asymmetric flow of information suggested by the true path rule.

7. True Path Rule Hierarchical Ensembles

These ensemble methods exploit at the same time the downward and upward relationships between classes, thus considering both the parent-to-child and child-to-parent functional links (Figure 2(b)).

The true path rule (TPR) ensemble method [31, 142] is directly inspired by the true path rule that governs both the GO and FunCat taxonomies. Citing the curators of the Gene Ontology [143]: "An annotation for a class in the hierarchy is automatically transferred to its ancestors, while genes unannotated for a class cannot be annotated for its descendants." Considering the parents of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$$\hat{y}_i = 1 \Longrightarrow \hat{y}_{\mathrm{par}(i)} = 1, \qquad \hat{y}_i = 0 \nRightarrow \hat{y}_{\mathrm{par}(i)} = 0 \quad (26)$$

On the other hand, considering the children of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$$\hat{y}_i = 1 \nRightarrow \hat{y}_{\mathrm{child}(i)} = 1, \qquad \hat{y}_i = 0 \Longrightarrow \hat{y}_{\mathrm{child}(i)} = 0 \quad (27)$$

From (26) and (27) we observe an asymmetry in the rules that govern the assignments of positive and negative labels. Indeed, we have a propagation of positive predictions from bottom to top of the hierarchy in (26) and a propagation of negative labels from top to bottom in (27). Conversely, negative labels cannot propagate from bottom to top, and positive predictions cannot propagate from top to bottom.

The "true path rule" suggests algorithms able to propagate "positive" decisions from bottom to top of the hierarchy and negative decisions from top to bottom (Figure 11).

7.1. The True Path Rule Ensemble Algorithm. The TPR algorithm puts together the predictions made at each node by the local "base" classifiers, to realize an ensemble that obeys the "true path rule".

The basic ideas behind the method can be summarized as follows:

(1) Training of the base learners: for each node of the hierarchy, a suitable learning algorithm (e.g., a multilayer perceptron or a support vector machine) provides a classifier for the associated functional class.

(2) In the evaluation phase, the trained classifiers associated with each class/node of the graph provide a local decision about the assignment of a given example to a given node.

(3) Positive decisions, that is, annotations to a specific functional class, may propagate from bottom to top across the graph: they influence the decisions of the parent nodes and of their ancestors, in a recursive way, by traversing the graph towards the higher-level nodes/classes. Conversely, negative decisions do not affect the decisions of the parent node; that is, they do not propagate from bottom to top (26).

(4) Negative predictions for a given node (taking into account the local decisions of its descendants) are propagated to the descendants, to preserve the consistency of the hierarchy according to the true path rule, while positive decisions do not influence the decisions of the child nodes (27).

The ensemble combines the local predictions of the base learners associated with each node with the positive decisions that come from the bottom of the hierarchy and with the negative decisions that spring from the higher-level nodes. More precisely, the base classifiers estimate local probabilities $\hat{p}_i(g)$ that a given example $g$ belongs to the class $c_i$, but the core of the algorithm is represented by the evaluation phase, where the ensemble provides an estimate of the "consensus" global probability $\bar{p}_i(g)$.

It is worth noting that, instead of a probability, $\hat{p}_i(g)$ may represent a score associated with the likelihood that a given gene/gene product belongs to the functional class $i$.

Let us consider the set $\phi_i(g)$ of the children of node $i$ for which we have a positive prediction for a given gene $g$:

$$\phi_i(g) = \{ j \in \mathrm{child}(i) : \hat{y}_j = 1 \} \quad (28)$$

The global consensus probability $\bar{p}_i(g)$ of the ensemble depends both on the local prediction $\hat{p}_i(g)$ and on the predictions of the nodes belonging to $\phi_i(g)$:

$$\bar{p}_i(g) = \frac{1}{1 + |\phi_i(g)|} \left( \hat{p}_i(g) + \sum_{j \in \phi_i(g)} \bar{p}_j(g) \right) \quad (29)$$

The decision $\hat{y}_i(g)$ at node/class $i$ is set to 1 if $\bar{p}_i(g) > t$ and to 0 otherwise (a natural choice for $t$ is 0.5), and only the children nodes for which we have a positive prediction can influence their parent. In the leaf nodes the sum of (29) disappears, and (29) becomes $\bar{p}_i(g) = \hat{p}_i(g)$. In this way positive predictions propagate from bottom to top, and negative decisions are propagated to the descendants when, for a given node, $\hat{y}_i(g) = 0$.

The bottom-up per-level traversal of the tree assures that all the offspring of a given node $i$ are taken into account for the ensemble prediction. For the same reason, we can safely set the classes belonging to the subtree rooted at $i$ to negative when $\hat{y}_i$ is set to 0. It is worth noting that we have a two-way asymmetric flow of information across the tree: positive predictions for a node influence its ancestors, while negative predictions influence its offspring.

The algorithm provides both the multilabels $\hat{y}_i$ and an estimate of the probabilities $\bar{p}_i$ that a given example $g$ belongs to the class $i = 1, \ldots, m$.
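The following sketch (toy tree and illustrative names) summarizes the TPR evaluation phase: a bottom-up sweep computing (28)-(29), followed by the top-down propagation of the negative decisions:

```python
# Sketch of the TPR evaluation phase (28)-(29) on a toy tree: a bottom-up
# sweep computes the consensus probabilities, then negative decisions are
# propagated downward (illustrative names and data).
def tpr(children, p_hat, t=0.5, root=0):
    p_bar, y_hat = {}, {}

    def up(i):                                   # bottom-up consensus (29)
        for j in children.get(i, []):
            up(j)
        phi = [j for j in children.get(i, []) if y_hat[j] == 1]   # (28)
        p_bar[i] = (p_hat[i] + sum(p_bar[j] for j in phi)) / (1 + len(phi))
        y_hat[i] = 1 if p_bar[i] > t else 0

    def down(i):                                 # negatives propagate downward
        for j in children.get(i, []):
            if y_hat[i] == 0:
                y_hat[j] = 0
            down(j)

    up(root)
    down(root)
    return y_hat, p_bar

children = {0: [1, 2], 1: [3, 4]}
p_hat = {0: 1.0, 1: 0.45, 2: 0.3, 3: 0.8, 4: 0.2}
y, p = tpr(children, p_hat)
print(y[1], round(p[1], 3))  # 1 0.625: the positive child 3 rescues node 1
```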

7.2. The Cost-Sensitive Variant. Note that in the TPR algorithm there is no way to explicitly balance the local prediction $\hat{p}_i(g)$ at node $i$ with the positive predictions coming from its offspring (29). By balancing the local predictions with the positive predictions coming from the ensemble, we can explicitly modulate the interplay between local and descendant predictors. To this end, a weight $w$, $0 \le w \le 1$, is introduced, such that, if $w = 1$, the decision at node $i$ depends only on the local predictor; otherwise the prediction is shared proportionally to $w$ and $1 - w$ between, respectively, the local parent predictor and the set of its children:

$$\bar{p}_i = w\, \hat{p}_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \bar{p}_j \quad (30)$$

This variant of the TPR algorithm is the weighted true path rule (TPR-W) hierarchical ensemble algorithm. By tuning the $w$ parameter we can modulate the precision/recall characteristics of the resulting ensemble. More precisely, for $w \to 0$ the weight of the parent local predictor is small, and the ensemble decision mainly depends on the positive predictions of the offspring nodes (classifiers); conversely, $w \to 1$ corresponds to a higher weight of the parent predictor: less weight is given to the possible positive predictions of the children, and the decision depends mainly on the local/parent base classifier. In case of a negative decision, all the subtree is set to 0, causing the precision to increase. Note that for $w \to 1$ the behaviour of TPR-W becomes similar to that of HTD (Section 4).

A specific advantage of TPR-W ensembles is the capability of tuning the precision and recall rates through the parameter $w$ (30). In [31] the author shows that the $w$ parameter highly influences the precision/recall characteristics of the ensemble: low $w$ values yield a higher recall, while high values improve the precision of the TPR-W ensemble.

Recently, Chen and Hu proposed a method that applies the TPR-W hierarchical strategy, but using composite-kernel SVMs as base classifiers and a supervised clustering with oversampling strategy to solve the imbalanced data set learning problem; they showed that the proper selection of base learners and unbalance-aware learning strategies can further improve the results in terms of hierarchical precision and recall [144].

The same authors proposed also an enhanced version of the TPR-W strategy, to overcome a limitation of this bottom-up hierarchical method for AFP. Indeed, for some classes at the lower levels of the hierarchy, the classifier performances are sometimes quite poor, due to both noisy data and the relatively low number of available annotations. More precisely, in the basic TPR ensemble the probabilities $\bar{p}_j$ computed by the children of the node $i$ (30) contribute in equal way to the probability $\bar{p}_i$ computed by the ensemble at node $i$, independently of the accuracy of the predictions made by the children classifiers. This "unweighted" mechanism may generate a propagation of the errors across the hierarchy: a poorly performing child classifier may, for instance, with high probability predict a negative example as positive, and this error may propagate to its parent node and, recursively, to its ancestor nodes. To try to alleviate this possible bottom-up error propagation, in [145] Chen and Hu proposed an improved TPR ensemble (TPR-W weighted), based on classifier performance. To this end, they weighted the contribution

ISRN Bioinformatics 17

of each child classifier on the basis of their performanceevaluated on a validation data set by adding to (30) anotherweight ]119895

\[
p_i = w \hat{p}_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \nu_j \cdot p_j, \quad (31)
\]

where \(\nu_j\) is computed on the basis of some accuracy metric \(A_j\) (e.g., the F-score) estimated for the child classifier associated with node j, as follows:

\[
\nu_j = \frac{A_j}{\sum_{k \in \phi_i} A_k}. \quad (32)
\]

In this way the contribution of "poor" classifiers is reduced, while "good" classifiers weigh more in the final computation of \(p_i\) (31). Experiments with the "Protein Fate" subtree of the FunCat taxonomy with the yeast model organism show that this approach improves prediction with respect to the "vanilla" TPR-W hierarchical strategy [145].
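A minimal sketch of the performance-weighted child contribution of (31)-(32) follows; the validation F-scores and node names are made-up values used only for illustration:

    # Normalize validation F-scores into the weights nu_j of equation (32),
    # then compute the weighted children term of equation (31).
    def child_weights(f_scores):
        total = sum(f_scores.values())
        return {node: score / total for node, score in f_scores.items()}

    def weighted_children_term(child_probs, nu):
        # Average of nu_j * p_j over the (positive) children, as in (31).
        return sum(nu[c] * p for c, p in child_probs.items()) / len(child_probs)

    f_scores = {"01": 0.8, "02": 0.2}      # hypothetical validation F-scores
    nu = child_weights(f_scores)           # {'01': 0.8, '02': 0.2}
    print(weighted_children_term({"01": 0.9, "02": 0.7}, nu))  # 0.43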

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward and common to the hierarchical top-down algorithm, the propagation of positive decisions from bottom to top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this item see Appendix C.

Experimental results show that TPR-W achieves equal or better results than the TPR and top-down hierarchical strategies, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per-level classification performances shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall by using a global strategy: large values of the w parameter improve the precision and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchical ensemble are influenced by the positive decisions of its descendants. More precisely, the analyses performed in [31] showed the following.

(i) Weights of descendants decrease exponentially with respect to their depth. As a consequence, the influence of descendant nodes decays quickly with their depth.

(ii) The parameter w plays a central role in balancing the weight of the parent classifier associated with a given node with the weights of its positive offspring: small values of w increase the weight of descendant nodes, and large values increase the weight of the local parent predictor associated with that node.

(iii) The effect on the overall probability predicted by the ensemble is the result of the choice of the w parameter, the strength of the prediction of the local learners, and of its descendants.

These characteristics of TPR-W ensembles are well suited for the hierarchical classification of protein functions, considering that annotations of deeper nodes are likely to have less experimental evidence than higher nodes. Moreover, by enforcing the strength of the descendant nodes through low w values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured according to a tree forest (Appendix A), while no applications have been performed using the GO, structured according to a directed acyclic graph (Appendix A).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147] or investigated a modified decision tree model, in which the assignment to a node is propagated toward the parent nodes [9], by extending the classical C4.5 decision tree algorithm for multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in yeast [149].

More recent approaches, always based on modified decision trees, used distance measures derived from the hierarchy and significantly improved previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all classes at once, even when they are built taking into account the hierarchy.

Nevertheless, the previously proposed decision tree-based methods often achieve results not comparable with state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for DAG-structured prediction of protein function in the S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles that learn ensembles of predictive models, each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles that train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151], methods related to decision trees, in the sense that they produce interpretable classification rules to predict all functions at all levels of the GO hierarchy, have been proposed, using an ant colony optimization classification algorithm to discover the classification rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues [1, 16, 18]. In particular, in [80] the authors showed that even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results if, at the same time, we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy has been shown between hierarchical classification, data integration methods, and cost-sensitive techniques, highlighting that hierarchical ensemble methods should be designed taking into account different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the predictions of protein functions [16, 25, 154–156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into four groups, as follows:

(1) vector subspace integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed by [30], but each data source is separately normalized in order to take into account the data distribution in each individual vector space.
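For illustration, here is a minimal sketch of this approach: each source is normalized separately and then concatenated into a single feature vector per gene. The matrices and names are toy values, not data from the cited works:

    import numpy as np

    rng = np.random.default_rng(1)
    X_expr = rng.normal(size=(100, 30))                          # expression features
    X_dom = rng.integers(0, 2, size=(100, 200)).astype(float)    # protein domain features

    def unit_norm_rows(X):
        # Scale each gene's vector to unit length within its own data source.
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        return X / np.where(norms == 0, 1.0, norms)

    # Concatenate the separately normalized sources into one vector per gene,
    # ready to train a standard (e.g., SVM) classifier.
    X_concat = np.hstack([unit_norm_rows(X_expr), unit_norm_rows(X_dom)])
    print(X_concat.shape)  # (100, 230)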

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network, or probabilistic evidence integration schemes [45].

Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive-Bayes integration [157] and constrained linear regression [14], or by merging data taking into account the GO hierarchy [158], or by applying XML-based techniques [159].

Kernel Fusion. These techniques first construct a separate Gram matrix for each available data source, using appropriate kernels representing similarities between genes/gene products. Then, by exploiting the closure property with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix.

Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have been recently proposed [160, 161]. Kernel fusion methods, both with and without weighting of the data sources, have been successfully applied to the classification of protein functions [162–165]. Recently, a novel method proposed an enhanced kernel integration approach, by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
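A minimal sketch of the basic idea, with toy data and fixed (not learned) weights, is the following:

    import numpy as np

    def linear_gram(X):
        # Gram matrix of the linear kernel for a data matrix X (genes x features).
        return X @ X.T

    rng = np.random.default_rng(0)
    X_expr = rng.normal(size=(5, 20))    # e.g., gene expression features
    X_ppi = rng.normal(size=(5, 50))     # e.g., protein-protein interaction features

    grams = [linear_gram(X_expr), linear_gram(X_ppi)]
    weights = [0.5, 0.5]                 # fixed coefficients; could be learned (MKL)

    # "Consensus" integrated Gram matrix: weighted sum of the per-source kernels,
    # valid because kernels are closed under nonnegatively weighted sums.
    K_fused = sum(w * K for w, K in zip(weights, grams))
    print(K_fused.shape)                 # (5, 5)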

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different "views" of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include "late integration" of kernels trained on different sources [22], naive-Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the output of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171–173].

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques,

Table 2: Comparison of the results (average per-class F-scores) achieved with single-source and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the number in parentheses is the percentage relative increment in F-score achieved with data fusion with respect to the best single source of evidence (BIOGRID).

Methods                      FLAT          HTD           HTD-CS        HB            HB-CS         TPR           TPR-W
Single-source:
  BIOGRID                    0.2643        0.3759        0.4160        0.3385        0.4183        0.3902        0.4367
  STRING                     0.2203        0.2677        0.3135        0.2138        0.3007        0.2801        0.3048
  PFAM BINARY                0.1756        0.2003        0.2482        0.1468        0.2407        0.2532        0.2738
  PFAM LOGE                  0.2044        0.1567        0.2541        0.0997        0.2847        0.3005        0.3160
  EXPR                       0.1884        0.2506        0.2889        0.2006        0.2781        0.2723        0.3053
  SEQ. SIM.                  0.1870        0.2532        0.2899        0.2017        0.2825        0.2742        0.3088
Multisource (data fusion):
  Kernel fusion              0.3220 (22%)  0.5401 (44%)  0.5492 (32%)  0.5181 (53%)  0.5505 (32%)  0.5034 (29%)  0.5592 (28%)

in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ. SIM.). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].

Table 2 summarizes the results of the comparison across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per-class F-score in flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) with respect to FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical data fusion (in particular kernel fusion) and cost-sensitive approaches (Figure 13) [123].

Per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion, the average F-score at levels 2, 3, and 4 is comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergistic effects of hierarchical multilabel ensembles, cost-sensitive, and data fusion techniques significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insights into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both GO and FunCat, only positive annotations are usually available, while negative annotations are much fewer. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy of choosing negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123] two strategies for choosing negative examples have been compared: the basic (B) and the parent only (PO) strategy.

According to the B strategy, the set of negative examples consists simply of those genes g that are not annotated for class \(c_i\), that is,

\[
N_B = \{ g \mid g \notin c_i \}. \quad (33)
\]

The PO selection strategy chooses as negatives for the class \(c_i\) only those examples that are not annotated to \(c_i\) but are annotated for a parent class. More precisely, for a given class \(c_i\), corresponding to node i in the taxonomy, the set of negative examples is

\[
N_{\mathrm{PO}} = \{ g \mid g \notin c_i,\; g \in \mathrm{par}(i) \}. \quad (34)
\]

Figure 12: FunCat trees comparing the F-scores achieved with data integration (KF) to the best single-source classifiers trained on BIOGRID data. Black nodes depict functional classes for which KF achieves better F-scores: (a) FLAT and (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical precision, recall, and F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, the PO strategy selects negative examples for training that are in a certain sense "close" to positives. It is easy to see that \(N_{\mathrm{PO}} \subseteq N_B\); hence the B strategy selects for training a larger set of generic negative examples, possibly annotated with classes that are associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.
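The following is a minimal sketch of the two selection rules (33)-(34); the annotation sets, class identifiers, and helper names are toy values for illustration, assuming annotations have already been propagated to ancestor classes:

    def negatives_basic(genes, annotations, c):
        # B strategy (33): every gene not annotated with class c.
        return {g for g in genes if c not in annotations[g]}

    def negatives_parent_only(genes, annotations, c, parent):
        # PO strategy (34): genes not annotated with c but annotated with par(c).
        return {g for g in genes
                if c not in annotations[g] and parent[c] in annotations[g]}

    genes = {"g1", "g2", "g3"}
    annotations = {"g1": {"01", "01.01"}, "g2": {"01"}, "g3": {"02"}}
    parent = {"01.01": "01"}

    print(negatives_basic(genes, annotations, "01.01"))                 # {'g2', 'g3'}
    print(negatives_parent_only(genes, annotations, "01.01", parent))   # {'g2'}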

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the B (abscissa) and PO (ordinate) strategy for the selection of negative examples. In Figure 15(a) most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods gain in performance when using the PO strategy as opposed to the B strategy (P value = \(2.2 \times 10^{-16}\) according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

Figure 14: Comparison of per-level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF). Performance of "single" is computed by averaging across all the single data sources.

These results can be explained by considering that the PO strategy takes into account the hierarchy to select negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of the classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate; hierarchical methods, which know the taxonomy, can instead use the information coming from the other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons we try to delineate some of the open problems and research trends in the context of this research area.

Figure 15: Comparison of average per-class F-score between the basic and PO strategies: (a) hierarchical cost-sensitive strategies HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles); (b) FLAT ensembles. Abscissa: per-class F-score with base learners trained according to the basic strategy; ordinate: per-class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of the different strategies should be performed in a systematic way to assess their impact on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works also showed that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be considered both for applying and designing cost-sensitive base learners and for developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies should be explored to improve the accuracy and the reliability of the base learners, and hence of the overall hierarchical classification process. For instance, random undersampling or oversampling techniques could be applied: the former randomly takes away some unannotated examples, whereas the latter augments the annotations by exactly duplicating the annotated proteins [182]. Other approaches could be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185–187].

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems and often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor \(\alpha = \theta_i^- / \theta_i^+\) we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing w we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed in order to design ensemble methods with "tunable" precision/recall characteristics.

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; unfortunately, the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and that, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are partially annotated. More precisely, two new algorithms have been proposed: ProWL (protein function prediction with weak-label learning), which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein; and ProWL-IF (protein function prediction with weak-label learning and knowledge of irrelevant functions), by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works for hierarchical multilabel predictions of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy with the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Appendix C), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of deep nodes of the hierarchy. Other enhancements could be introduced in the TPR-W algorithm (Section 7.2): indeed, positive children of a node at level i of the hierarchy have the same weight, independently of the size of their hanging subtree. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree; indeed, this witnesses for a positive annotation of the node at level i.

More generally, in the spirit of a recently proposed work [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

First, an important issue could be represented by the design and development of multitask learning strategies [194] able to exploit the relationships between functional classes during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, already during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge unexplored, learning issue is represented by the meta-integration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a meta-combination of the hierarchical predictions could be explored to enhance the overall performance.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the prediction for a particular species by using predictions or data available for other species. This is a common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary-memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the meta-integration of hierarchical predictions, have been only partially addressed in previous work. More generally, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and other related issues is the key to improve AFP methods and to drive experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured as a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., the mitochondrion or the rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function, for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; an example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles or physical places or compartments in which a specific gene product is located; an example is the endoplasmic reticulum or the ribosome.

Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

The ontologies of the GO are structured as a directed acyclic graph (DAG) \(G = \langle V, E \rangle\), where \(V = \{t \mid t \text{ is a term of the GO}\}\) and \(E = \{(t, u) \mid t, u \in V\}\). Relations between GO terms are also categorized in the following three main groups.

(i) is-a (subtype relations): if we say the term/node A is-a B, we mean that node A is a subtype of node B. For example, mitotic cell cycle is-a cell cycle, or lyase activity is-a catalytic activity.

(ii) part-of (part-whole relations): A is part-of B means that whenever A exists, it is part of B, and the presence of A implies the presence of B. For instance, mitochondrion is part-of cytoplasm.

(iii) regulates (control relations): if we say that A regulates B, we mean that A directly affects the manifestation of B; that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting functional similarity. For these reasons, in AFP the regulates relations (which, however, are a minority of the existing relations) are usually not used.

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI); to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference; to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de). At the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content has been extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and, finally, animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge amount of terms is very difficult to handle for annotators. Moreover, we may have a very large number of possible assignments, which may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads both to simpler procedures for annotation and to less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, without being too granular and specific.

It is worth noting that both FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to changes, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more generally, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually far outnumber positive ones, accuracy is not a reliable measure to assess the classification performance. For this reason the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the positive examples correctly predicted as positive, FN the positive examples incorrectly predicted as negative, and FP the negatives incorrectly predicted as positive, then the precision P and the recall R are

\[
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}. \quad (B.1)
\]

The F-score F is the harmonic mean between precision and recall:

\[
F = \frac{2 \cdot P \cdot R}{P + R}. \quad (B.2)
\]

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to near-optimal rankings; that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be considered "partially correct", in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy G representing the graph of the functional classes, for a given gene/gene product x consider the graph \(P(x) \subset G\) of the predicted classes and the graph \(C(x)\) of the correct classes associated with x, and let \(l(P)\) be the set of the leaves (nodes without children) of the graph P. Given a leaf \(p \in P(x)\), let \({\uparrow}p\) be the set of ancestors of the node p that belong to \(P(x)\), and given a leaf \(c \in C(x)\), let \({\uparrow}c\) be the set of ancestors of the node c that belong to \(C(x)\). The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

\[
\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}p|},
\]
\[
\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}c|},
\]
\[
\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \quad (B.3)
\]

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

\[
\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow}p|}{|{\uparrow}p|},
\]
\[
\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|{\uparrow}c \cap P(x)|}{|{\uparrow}c|},
\]
\[
\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \quad (B.4)
\]

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.
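As an illustration, here is a minimal sketch of the tree-case measures (B.4) for a single gene; the toy taxonomy, the class identifiers, and the helper names are invented for the example, and the predicted and true class sets are assumed to be closed under ancestors:

    # 'parent' maps each class to its parent (root classes map to None);
    # P and C are the sets of predicted and true classes for one gene.

    def ancestors(c, parent):
        # Set containing c and all its ancestors (the 'up-arrow' set).
        out = set()
        while c is not None:
            out.add(c)
            c = parent.get(c)
        return out

    def leaves(S, parent):
        # Classes of S that are not the parent of any other class in S.
        inner = {parent.get(c) for c in S}
        return [c for c in S if c not in inner]

    def hierarchical_f(P, C, parent):
        hp = sum(len(C & ancestors(p, parent)) / len(ancestors(p, parent))
                 for p in leaves(P, parent)) / len(leaves(P, parent))
        hr = sum(len(ancestors(c, parent) & P) / len(ancestors(c, parent))
                 for c in leaves(C, parent)) / len(leaves(C, parent))
        return hp, hr, 2 * hp * hr / (hp + hr)

    parent = {"01": None, "01.01": "01", "01.02": "01"}
    P = {"01", "01.01"}            # predicted path
    C = {"01", "01.02"}            # true path
    print(hierarchical_f(P, C, parent))  # (0.5, 0.5, 0.5)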

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let \(P(x)\) be the set of classes predicted in the overall hierarchy for a given gene/gene product x, and let \(C(x)\) be the corresponding set of "true" classes. Then the hierarchical precision \(\mathrm{HP}_K\) and the hierarchical recall \(\mathrm{HR}_K\) according to Kiritchenko are defined as

\[
\mathrm{HP}_K = \frac{\sum_x |P(x) \cap C(x)|}{\sum_x |P(x)|}, \qquad
\mathrm{HR}_K = \frac{\sum_x |P(x) \cap C(x)|}{\sum_x |C(x)|}. \quad (B.5)
\]

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs but simply the ratio between the number of common classes and, respectively, the predicted (\(\mathrm{HP}_K\)) and the true classes (\(\mathrm{HR}_K\)). The hierarchical F-measure \(\mathrm{HF}_K\) is the harmonic mean between \(\mathrm{HP}_K\) and \(\mathrm{HR}_K\):

\[
\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \quad (B.6)
\]

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level k is any node whose distance from the root is equal to k. The posterior probability computed by the ensemble for a generic node at level k is denoted by \(\bar{q}_k\); more precisely, \(\bar{q}_k\) denotes the probability computed by the ensemble, and \(\hat{q}_k\) denotes the probability computed by the base learner local to a node at level k. Moreover, we define \(\bar{q}^{\,j}_{k+1}\) as the probability of a child j of a node at level k, where the index j ≥ 1 refers to the different children of a node at level k. From (30) we can derive the following expression for the probability \(\bar{q}_k\) computed for a generic node at level k of the hierarchy [31]:

\[
\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g). \quad (C.1)
\]

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children nodes of a generic node at level k:

\[
a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1}, \quad (C.2)
\]

and we can introduce similar notations for \(a_{k+2}\) (average of the probabilities of the grandchildren) and, more in general, for \(a_{k+j}\) (descendants at level j of a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level k, with a given parameter w, 0 ≤ w ≤ 1, balancing the weight between parent and children predictors, and having a variable number, larger than or equal to 1, of positive descendants for each of the m lower levels below, the following equality holds for each m ≥ 1:

\[
\bar{q}_k = w \hat{q}_k + \sum_{j=1}^{m-1} w (1 - w)^j a_{k+j} + (1 - w)^m a_{k+m}. \quad (C.3)
\]

For the full proof see [31].

Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the w parameter. To get more insight into the relationship between w and its effect on the influence of positive decisions on a generic node at level k (Theorem 1), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths m, and Figure 18(b) shows the function responsible for the influence of nodes above the leaves.

Figure 18: (a) Plot of \(f(w) = (1 - w)^m\) while varying m from 1 to 10; (b) plot of \(g(w) = w(1 - w)^j\) while varying j from 1 to 10. The integers j refer to internal nodes at distance j from the reference node at level k.
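As a small numeric check of equation (C.3), using toy parameter values, the coefficients that multiply the averaged descendant probabilities can be computed as follows:

    def descendant_coefficients(w, m):
        # Coefficients multiplying a_{k+1} ... a_{k+m} in equation (C.3):
        # w(1-w)^j for the intermediate levels, (1-w)^m for the deepest one.
        coeffs = [w * (1 - w) ** j for j in range(1, m)]
        coeffs.append((1 - w) ** m)
        return coeffs

    for w in (0.2, 0.5, 0.8):
        print(w, [round(c, 4) for c in descendant_coefficients(w, m=4)])
    # Larger w shrinks every coefficient: descendants matter less and less,
    # and together with the local weight w the coefficients always sum to 1.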

Conflict of Interests

The author has no direct financial relations that might lead toa conflict of interests associated with this paper

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi", funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction—the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.
[2] L. Peña-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.
[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.
[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.
[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.
[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.
[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.
[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.
[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.
[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.
[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.
[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.
[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.
[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.

[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.

[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.

[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.

[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.

[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.

[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.

[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.

[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.

[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.

[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.

[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.

[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.

[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.

[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.

[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.

[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.

[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.

[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.

[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.

[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.

[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.

[38] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.

[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.

[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.

[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.

[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.

[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.

[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.

[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.

[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.

[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.


[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.

[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.

[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.

[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.

[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.

[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.

[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.

[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.

[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.

[58] A. Conesa, S. Gotz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.

[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.

[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.

[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.

[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.

[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.

[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.

[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.

[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.

[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.

[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.

[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.

[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.

[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.

[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.

[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.

[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.

[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.

[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.

[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.

[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.

[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.

[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.

[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.

[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.

[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.

[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.

[88] G. Bakir, T. Hofmann, B. Scholkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.

[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.

[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.

[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.

[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.

[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.

[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.

[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.

[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.

[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.

[100] M. Dettling and P. Buhlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.

[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.

[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1-4, pp. 461–466, 2004.

[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.

[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.

[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.

[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.

[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.

[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.

[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.

[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.

[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.

[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.

[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.

[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4-6, pp. 718–725, 2007.

[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.

[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.

[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1-3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using Top-Down label combination and Artificial Neural Networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastino, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems, Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n²) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.


[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics—from Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.

[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7-9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. Von Mering, R. Krause, B. Snel et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.


[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research, W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013—ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bahr et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


AFP problems, we refer the reader to the comprehensive review by Silla and Freitas [44].

In the following sections we discuss the following groups of hierarchical ensemble methods.

(i) Top-down ensemble methods. These methods are characterized by a simple top-down approach: in the second step, only the output of the parent node's base classifier influences the output of its children, thus resulting in a top-down propagation of the decisions.

(ii) Bayesian ensemble methods. These are a class of theoretically well-founded methods, and in some cases they are optimal from a Bayesian standpoint.

(iii) Reconciliation methods. This is a heterogeneous class of heuristics by which we can combine the predictions of the base learners by adopting different "local" or "global" combination strategies.

(iv) True path rule ensembles. These methods adopt a heuristic approach based on the "true path rule" that governs both the GO and FunCat ontologies.

(v) Decision tree-based ensembles. These methods are characterized by the application of decision trees as base learners or by the adoption of decision tree-like learning strategies to combine the predictions of the base learners.

Despite this general characterization, several methods could be assigned to different groups, and for several hierarchical ensemble methods it is difficult to assign them to any of the introduced classes.

For instance, in [114–116] the authors used the hierarchy only to construct a different training set for each term of the Gene Ontology, by determining positive and negative examples on the basis of the relationships between functional terms. In [89], for each classifier associated with a node, a gene is labeled as positive (i.e., belonging to the term associated with that node) if it actually belongs to that node, or as negative if it belongs neither to that node nor to the ancestors or descendants of the node.

Other approaches exploited the correlation between nearby classes [32, 53, 117]. Shahbaba and Neal [53] take into account the hierarchy to introduce correlations between functional classes, using a multinomial logit model with Bayesian priors in the context of E. coli functional classification with Riley's hierarchies [118]. Bogdanov and Singh incorporated functional interrelationships between terms during the extraction of features based on the annotations of neighboring genes, and then applied a nearest-neighbor classifier to predict protein functions [117]. The HiBLADE method (hierarchical multilabel boosting with label dependency) [32] not only takes advantage of the preestablished hierarchical taxonomy of the classes but also effectively exploits the hidden correlation among the classes that is not shown through the class hierarchy, thereby improving the quality of the predictions. In particular, the dependencies of the children of each label in the hierarchy are captured and analyzed using the Bayes method and instance-based similarity. Experiments using the FunCat taxonomy and the yeast model organism show that the proposed method is competitive with the TPR-W (Section 7.2) and HBAYES-CS (Section 5.3) hierarchical ensemble methods.

A classical multiclass boosting algorithm [119] has also been adapted to fit the hierarchical structure of the FunCat taxonomy [120]: the method is relatively simple and straightforward to implement, and it achieves competitive results for AFP in the yeast model organism.

Finally, other hierarchical approaches have been proposed in the context of the competitive network learning framework. Competitive networks are well-known unsupervised and supervised methods able to map the input space into a structured output space, where clusters or classes are usually arranged according to a grid topology and where learning adopts competition, cooperation, and adaptation strategies [121]. Interestingly, in [122] the authors adopted this approach to predict the hierarchy of gene annotations in the yeast model organism by using a tree topology according to the FunCat taxonomy: each neuron is connected with its parent or with its children. Moreover, each neuron in the tree-structured output layer is connected to all neurons of the input layer representing the instances, that is, the set of genomic features associated with each gene to be classified. Results obtained with the hierarchy of Enzyme Commission codes showed that this approach is competitive with hierarchical decision tree ensembles [29] (Section 8).

To provide a general picture of the methods discussed in the following sections, Table 1 summarizes their main characteristics. The first two columns report the name of each method and a reference to it; the third indicates whether multiple paths across the taxonomy are allowed, and the next whether partial paths are considered (i.e., paths that do not end at a leaf). The successive columns refer to the class structure (a tree or a DAG), to the adoption or not of cost-sensitive (i.e., unbalance-aware) classification approaches, and to the adoption of strategies to properly select negative examples in the training phase. Finally, the last three columns summarize the type of base learner used ("spec." means that only a specific type of base learner is allowed, while "any" means that any type of learner can be used within the method), whether the method improves with respect to the flat approach, and the mode of processing of the nodes ("TD": top-down approach; "TD & BUP": adopting both top-down and bottom-up strategies). Of course, methods having more checkmarks are more flexible; in general, methods that can process a DAG can also process tree-structured ontologies, while the opposite is not guaranteed, and the type of node processing reflects the way information is propagated across the ontology. It is worth noting that all the considered methods improve on baseline "flat" classification methods.

4. Hierarchical Top-Down (HTD) Ensembles

These ensemble methods exploit the hierarchical relationships between functional terms in a top-to-bottom fashion, that is, considering only the relationships denoted by the down arrows in Figure 2(b).


Table 1: Characteristics of some of the main hierarchical ensemble methods for AFP.

Methods | References | Multipath | Partial path | Class structure | Cost sens. | Sel. neg. | Base learn. | Improves on flat | Node process.
HMC-LMLP | [124, 125] | √ | √ | TREE | | | any | √ | TD
HTD-CS | [27] | √ | √ | TREE | √ | | any | √ | TD
HTD-MULTI | [127] | | √ | TREE | | | any | √ | TD
HTD-PERLEV | [128] | | | TREE | | | spec. | √ | TD
HTD-NET | [26] | √ | √ | DAG | | | any | √ | TD
BAYES NET-ENS | [20] | √ | √ | DAG | √ | | spec. | √ | TD & BUP
HIER-MB and BFS | [30] | √ | √ | DAG | | | any | √ | TD & BUP
HBAYES | [92, 135] | √ | √ | TREE | √ | | any | √ | TD & BUP
HBAYES-CS | [123] | √ | √ | TREE | √ | √ | any | √ | TD & BUP
Reconc.-heuristic | [24] | √ | √ | DAG | | | any | √ | TD
Cascaded log. | [24] | √ | √ | DAG | | | any | √ | TD
Projection-based | [24] | √ | √ | DAG | | | any | √ | TD & BUP
TPR | [31, 142] | √ | √ | TREE | √ | | any | √ | TD & BUP
TPR-W | [31] | √ | √ | TREE | √ | √ | any | √ | TD & BUP
TPR-W weighted | [145] | √ | √ | TREE | √ | | any | √ | TD & BUP
Decision-tree-ens. | [29] | √ | √ | DAG | | | spec. | √ | TD & BUP

The basic hierarchical top-down ensemble method (HTD) algorithm is straightforward: for each gene $g$, starting from the set of nodes at the first level of the graph $G$ (denoted by $\mathrm{root}(G)$), the classifier associated with node $i \in G$ computes whether the gene belongs to the class $c_i$. If yes, the classification process continues recursively on the nodes $j \in \mathrm{child}(i)$; otherwise it stops at node $i$, and the labels of all the descendants rooted at $i$ are set to 0. To introduce the method, we use probabilistic classifiers as base learners, trained to predict the class $c_i$ associated with node $i$ of the hierarchical taxonomy. Their estimates $\hat{p}_i(g)$ of $P(V_i = 1 \mid V_{\mathrm{par}(i)} = 1, g)$ are used by the HTD ensemble to classify a gene $g$ as follows:

$$
\hat{y}_i =
\begin{cases}
\left\{\hat{p}_i(g) > \frac{1}{2}\right\} & \text{if } i \in \mathrm{root}(G),\\
\left\{\hat{p}_i(g) > \frac{1}{2}\right\} & \text{if } i \notin \mathrm{root}(G) \text{ and } \hat{p}_{\mathrm{par}(i)}(g) > \frac{1}{2},\\
0 & \text{if } i \notin \mathrm{root}(G) \text{ and } \hat{p}_{\mathrm{par}(i)}(g) \le \frac{1}{2},
\end{cases}
\tag{3}
$$

where $\{x\} = 1$ if the predicate $x$ is true and $\{x\} = 0$ otherwise, and $\hat{p}_{\mathrm{par}(i)}$ is the probability predicted for the parent of term $i$. It is easy to see that this procedure ensures that the predicted multilabels $\hat{\mathbf{y}} = (\hat{y}_1, \ldots, \hat{y}_m)$ are consistent with the hierarchy. The same top-down procedure can also be applied with nonprobabilistic classifiers, that is, base learners generating continuous scores or discrete decisions, by slightly modifying (3).

In [123] a cost-sensitive version of the basic top-down hierarchical ensemble method HTD has been proposed: by assigning $\hat{y}_i$ before the label of any $j$ in the subtree rooted at $i$, the following rule is used:

$$
\hat{y}_i = \left\{\hat{p}_i \ge \tfrac{1}{2}\right\} \times \left\{\hat{y}_{\mathrm{par}(i)} = 1\right\}
\tag{4}
$$

for $i = 1, \ldots, m$ (note that the guessed label $\hat{y}_0$ of the root of $G$ is always 1). The cost-sensitive variant HTD-CS then introduces a single cost-sensitive parameter $\tau > 0$, which replaces the threshold $\frac{1}{2}$. The resulting rule for HTD-CS is

$$
\hat{y}_i = \left\{\hat{p}_i \ge \tau\right\} \times \left\{\hat{y}_{\mathrm{par}(i)} = 1\right\}.
\tag{5}
$$

By tuning $\tau$ we may obtain ensembles with different precision/recall characteristics. Despite the simplicity of hierarchical top-down methods, several works showed their effectiveness for AFP problems [28, 31].
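As a concrete illustration, the following is a minimal sketch of the HTD-CS rule (5); with $\tau = 1/2$ it essentially reduces to the HTD rule (3), up to the strict versus nonstrict threshold. The dictionary-based tree encoding and all names are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the HTD/HTD-CS top-down decision rule (hypothetical names).
# `children` maps each term to its child terms; `p_hat` holds the flat
# probabilistic predictions for a single gene; `tau` is the HTD-CS threshold.

def htd_cs_predict(children, roots, p_hat, tau=0.5):
    """Top-down propagation: a node can be positive only if its parent is."""
    y_hat = {}

    def visit(i, parent_positive):
        # A negative parent forces all of its descendants to be negative.
        y_hat[i] = 1 if (parent_positive and p_hat[i] >= tau) else 0
        for j in children.get(i, []):
            visit(j, y_hat[i] == 1)

    for r in roots:
        visit(r, True)  # the guessed label of the (dummy) root is always 1
    return y_hat

# Toy FunCat-like tree: 'A' has children 'A.1' and 'A.2'; 'A.1' has 'A.1.1'.
children = {'A': ['A.1', 'A.2'], 'A.1': ['A.1.1']}
p_hat = {'A': 0.9, 'A.1': 0.6, 'A.2': 0.3, 'A.1.1': 0.8}
print(htd_cs_predict(children, ['A'], p_hat, tau=0.5))
# {'A': 1, 'A.1': 1, 'A.1.1': 1, 'A.2': 0} -- consistent with the hierarchy
```

Lowering or raising tau trades recall against precision, which is exactly the role of the cost-sensitive parameter in HTD-CS.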

For instance, Cerri and De Carvalho experimented with different variants of top-down hierarchical ensemble methods for AFP [28, 124, 125]. HMC-LMLP (hierarchical multilabel classification with local multilayer perceptrons) successively trains a local MLP network for each hierarchical level using the classical backpropagation algorithm [126]: the output of the MLP for the first level is used as input to train the MLP that learns the classes of the second level, and so on (Figure 3). A gene is annotated to a class if its corresponding output in the MLP is larger than a predefined threshold; then, in a postprocessing phase (the second step of the hierarchical classification), inconsistent predictions are removed (i.e., classes predicted without the prediction of their superclasses) [125]. In practice, instead of using a dichotomic classifier for each node, the HMC-LMLP algorithm applies a single multiclass multilayer perceptron to each level of the hierarchy.
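To make the level-wise chaining concrete, the following is a minimal sketch (not the original implementation of [124, 125]) based on scikit-learn's multilabel MLPClassifier; the random data, the per-level label matrices, and the network sizes are illustrative assumptions.

```python
# Sketch of HMC-LMLP-style chaining: one multilabel MLP per hierarchical
# level, with the outputs of one level used as the inputs of the next.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 20)                    # gene features (toy data)
Y_level1 = rng.randint(0, 2, (100, 3))   # binary labels of the 1st level
Y_level2 = rng.randint(0, 2, (100, 5))   # binary labels of the 2nd level

mlp1 = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
mlp1.fit(X, Y_level1)
out1 = mlp1.predict_proba(X)             # level-1 class outputs ...

mlp2 = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
mlp2.fit(out1, Y_level2)                 # ... feed the level-2 network
```

A thresholding step on each level's outputs, followed by the removal of classes predicted without their superclasses, would complete the two-step scheme described above.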

A related approach adopts multiclass classifiers (HTD-MULTI) at each node, instead of a simple binary classifier, and tries to find the most likely path from the root to the leaves of the hierarchy, considering simple techniques such as the multiplication or the sum of the probabilities estimated at each node along the path [127]. The method has been applied to the cell cycle branch of the FunCat hierarchy with the yeast model organism, showing improvements with respect to classical hierarchical top-down methods, even if the proposed approach can only predict classes along a single "most likely path," thus not considering that in AFP we may have annotations involving multiple and partial paths.

Figure 3: HMC-LMLP. The outputs of the MLP responsible for the predictions at the first hierarchical level are provided as inputs to the MLP that makes the predictions at the second level (adapted from [125]).

Another method that introduces multiclass classifiers instead of simple dichotomic classifiers has been proposed by Paes et al. [128]: local per-level multiclass classifiers (HTD-PERLEV) are trained to distinguish between the classes of a specific level of the hierarchy, and two different strategies to remove inconsistencies are introduced. The method has been applied to the hierarchical classification of enzymes using the EC taxonomy; unfortunately, this algorithm is not well suited to AFP, since leaf-node annotations are mandatory (that is, partial-path annotations are not allowed) and multilabel annotations along multiple paths are not allowed.

Another interesting top-down hierarchical approach proposed by the same authors is HMC-LP (hierarchical multilabel classification label-powerset), a hierarchical variation of the label-powerset nonhierarchical multilabel method [129], which has been applied to the prediction of gene functions in the yeast model organism using 10 different data sets and the FunCat taxonomy [124]. According to the label-powerset approach, the method is based on a first label-combination step by which, for each example (gene), all the classes assigned to the example are combined into a new and unique class, and this process is repeated for each level of the hierarchy. In this way the original problem is transformed into a hierarchical single-label problem. In both the training and test phases the top-down approach is applied, and at the end of the classification phase the original classes can be easily reconstructed [124]. In an experimental comparison using the FunCat taxonomy for S. cerevisiae, results showed that hierarchical top-down ensemble methods significantly outperform decision tree-based hierarchical methods, but no significant difference between the different flavors of top-down hierarchical ensembles was detected [28].
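The label-combination step of HMC-LP can be illustrated with a short sketch: at each level, every distinct set of classes assigned to a gene becomes a single new "powerset" class, and the mapping is retained so that the original classes can be reconstructed after prediction. The encoding below is an illustrative assumption, not the HMC-LP code.

```python
# Sketch of the per-level label-powerset encoding (hypothetical names).

def powerset_encode(label_sets):
    """Map each set of level-wise labels to a single combined class id."""
    combo_to_id = {}
    encoded = []
    for labels in label_sets:
        combo = frozenset(labels)
        combo_to_id.setdefault(combo, len(combo_to_id))
        encoded.append(combo_to_id[combo])
    return encoded, combo_to_id

# Level-1 annotations of four genes: multilabel -> single-label problem.
level1 = [{'01', '02'}, {'01'}, {'01', '02'}, {'02'}]
codes, mapping = powerset_encode(level1)
print(codes)  # [0, 1, 0, 2]: each distinct combination is a new class
# Inverting `mapping` recovers the original multilabel annotations.
```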

Top-down algorithms can also be conceived in the context of network-based methods (HTD-NET). For instance, in

[26] a probabilistic model that combines relational protein-protein interaction data and the hierarchical structure of GO to predict true-path-consistent function labels obeys the true path rule by setting the descendants of a node as negative whenever that node is set to negative. More precisely, the authors at first compute a local hierarchical conditional probability, in the sense that for any nonroot GO term only its parents affect its labeling. This probability is computed within a network-based framework, assuming that the labeling of a gene is independent of that of any other gene given the labels of its neighbors (a sort of Markov property with respect to the gene functional interaction network), and assuming also a binomial distribution for the number of neighbors labeled with child terms with respect to those labeled with the parent term. These assumptions are quite stringent but are necessary to make the model tractable. Then a global hierarchical conditional probability is computed by recursively applying the previously computed local hierarchical conditional probabilities along all the ancestors. More precisely, denoting by $P(y_i = 1 \mid g, N(g))$ the probability that a gene $g$ is annotated to node $i$, given the annotation status of its neighborhood $N(g)$ in the functional network, the global hierarchical conditional probability factorizes according to the GO graph as follows:

$$
P(y_i = 1 \mid g, N(g)) = \prod_{j \in \mathrm{anc}(i)} P\left(y_j = 1 \mid y_{\mathrm{par}(j)} = 1, N_{\mathrm{loc}}(g)\right)
\tag{6}
$$

where $N_{\mathrm{loc}}(g)$ represents the local hierarchical neighborhood information on the parent-child GO term pair $(\mathrm{par}(j), j)$ [26]. This approach guarantees GO term label assignments that are consistent with the hierarchy, without the need of a postprocessing step.
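The factorization (6) can be computed with a simple walk from a term up to the root, as in the sketch below; it assumes that the product also includes the term $i$ itself, that the root contributes probability 1, and that the local conditional probabilities have already been estimated from the network. All names are illustrative assumptions.

```python
# Sketch of the hierarchical factorization (6) over the ancestors of a term.

def global_probability(i, parent, p_local):
    """P(y_i = 1 | g, N(g)) as a product of local parent-child terms."""
    prob = 1.0
    node = i
    while node in parent:                      # walk up towards the root
        prob *= p_local[(parent[node], node)]  # P(y_j=1 | y_par(j)=1, Nloc(g))
        node = parent[node]
    return prob

parent = {'B': 'A', 'C': 'B'}                  # a small A -> B -> C chain
p_local = {('A', 'B'): 0.8, ('B', 'C'): 0.5}   # local conditional estimates
print(global_probability('C', parent, p_local))  # 0.8 * 0.5 = 0.4
```

By construction the probability of a term can never exceed that of its parent, which is precisely the hierarchical consistency guaranteed by this approach.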

Finally, in [130] the authors applied a hierarchical method to the classification of yeast FunCat categories. Despite its well-founded theoretical properties based on large-margin methods, this approach is conceived for one-path hierarchical classification, and hence it turns out to be unsuited to hierarchical AFP, where multiple paths in the hierarchy should usually be considered, since in most cases genes can play different functional roles in the cell.

5. Ensemble Based Bayesian Approaches for Hierarchical Classification

These methods introduce a Bayesian approach to the hierarchical classification of proteins, using the classical Bayes theorem or Bayesian networks to obtain tractable factorizations of the joint conditional probabilities from the original "full Bayesian" setting of the hierarchical AFP problem [20, 30], or to achieve "Bayes-optimal" solutions with respect to loss functions well suited to hierarchical problems [27, 92].

5.1. The Solution Based on Bayesian Networks. One of the first approaches addressing the issue of inconsistent predictions in the Gene Ontology is represented by the Bayesian approach proposed in [20] (BAYES NET-ENS). According to the general scheme of hierarchical ensemble methods, two main steps characterize the algorithm:

(1) flat prediction of each term/class (possibly inconsistent);

(2) a Bayesian hierarchical combination scheme that allows collaborative error-correction over all nodes.

After training a set of base classifiers, one for each of the considered GO terms (in their work the authors applied the method to 105 selected GO terms), we obtain a set of (possibly inconsistent) predictions $\hat{y}$. The goal consists in finding a set of consistent predictions $y$ by maximizing the following equation, derived from the Bayes theorem:

$$
P(y_1, \ldots, y_n \mid \hat{y}_1, \ldots, \hat{y}_n) = \frac{P(\hat{y}_1, \ldots, \hat{y}_n \mid y_1, \ldots, y_n)\, P(y_1, \ldots, y_n)}{Z}
\tag{7}
$$

where $n$ is the number of GO nodes/terms and $Z$ is a constant normalization factor.

Since the direct solution of (7) is too hard (it is exponential in time with respect to the number of nodes), the authors proposed a Bayesian network structure to solve this difficult problem, exploiting the relationships between the GO terms. More precisely, to reduce the complexity of the problem, the authors imposed the following constraints:

(1) $y_i$ nodes are conditioned on their children (GO structure constraints);

(2) $\hat{y}_i$ nodes are conditioned on their label $y_i$ (the Bayes rule);

(3) $\hat{y}_i$ is independent of both $y_j$, $i \ne j$, and $\hat{y}_j$, $i \ne j$, given $y_i$.

In other words, we can ensure that a label is 1 (positive) whenever any one of its children is 1, and the edges from $y_i$ to $\hat{y}_i$ assure that a classifier output $\hat{y}_i$ is a random variable independent of all other classifier outputs $\hat{y}_j$ and labels $y_j$ given its true label $y_i$ (Figure 4).

Figure 4: The Bayesian network involved in the hierarchical classification (adapted from [20]).

More precisely, from the previous constraints we can derive the following equations:

from the first constraint,

P(y_1, ..., y_n) = ∏_{i=1}^{n} P(y_i | child(y_i))    (8)

from the last two constraints,

P(ŷ_1, ..., ŷ_n | y_1, ..., y_n) = ∏_{i=1}^{n} P(ŷ_i | y_i)    (9)

Note that (8) can be inferred from the training labels simply by counting, while (9) can be inferred by validation during training, by modeling the distribution of the ŷ_i outputs over positive and negative examples under a parametric model (e.g., a Gaussian distribution; see Figure 5).
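As an illustrative sketch (with placeholder data and names, not the authors' code), (8) can be estimated by frequency counting over joint label configurations, and (9) by fitting one Gaussian to the classifier outputs of positive and one to those of negative validation examples:

from collections import Counter
import math
import statistics

def count_conditional(y_node, y_children):
    # estimate P(y_i | child(y_i)) by counting joint label configurations;
    # y_node holds the 0/1 labels of node i, y_children the tuples of the
    # corresponding labels of its children
    joint = Counter(zip(y_node, y_children))
    marg = Counter(y_children)
    return {(v, cfg): joint[(v, cfg)] / marg[cfg] for (v, cfg) in joint}

def gaussian_models(outputs, labels):
    # fit N(mu, sigma) to the outputs, separately for positives and negatives
    models = {}
    for c in (0, 1):
        vals = [o for o, l in zip(outputs, labels) if l == c]
        models[c] = (statistics.mean(vals), statistics.pstdev(vals))
    return models

def gaussian_density(y_hat, mu, sd):
    # density used as an estimate of P(yhat_i | y_i) in (9)
    return math.exp(-0.5 * ((y_hat - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

print(count_conditional([1, 1, 0, 0], [(1, 0), (1, 0), (0, 0), (0, 0)]))
print(gaussian_models([0.9, 0.8, -0.7, -1.1], [1, 1, 0, 0]))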

For the implementation of their method, the authors adopted bagged ensembles of SVMs [131], to make their predictions more robust and reliable at each node of the GO hierarchy; the median values of their outputs on out-of-bag examples have been used to estimate the means and variances for each class. Finally, the means and variances have been used as parameters of the Gaussian models used to estimate the conditional probabilities of (9).

Results with the 105 terms/nodes of the GO BP ontology (model organism S. cerevisiae) showed substantial improvements with respect to nonhierarchical "flat" predictions: the hierarchical approach improves the AUC on 93 of the 105 GO terms (Figure 6).

5.2. The Markov Blanket and Approximated Breadth-First Solution. In [30] the authors proposed an alternative approximated solution of the complex equation (7), by introducing the following two variants of the Bayesian integration:

(1) HIER-MB: hierarchical Bayesian combination involving the nodes in the Markov blanket;

(2) HIER-BFS: hierarchical Bayesian combination involving the first 30 nodes visited through a breadth-first search (BFS) in the GO graph.


Figure 5: Distribution of negative (a) and positive (b) validation examples (a Gaussian distribution is assumed). Adapted from [20].

Figure 6: Improvements induced by the hierarchical prediction of the GO terms. Darker shades of blue indicate the largest improvements and darker shades of red the largest deterioration; white means no change (adapted from [20]).

Figure 7: Markov blanket surrounding the GO term Y_1. Each GO term is represented as a blank node, while the SVM classifier output is represented as a gray node (adapted from [30]).

The method has been applied to the prediction of more than 2000 GO terms for the mouse model organism, and performed among the top methods in the MouseFunc challenge [2].

The first approach (HIER-MB) modifies the output of the base learners (SVMs in the Guan et al. paper) by taking into account the Bayesian network constructed from the Markov blanket surrounding the GO term of interest (Figure 7). In a Bayesian network, the Markov blanket of a node i is represented by its parents (par(i)), its children (child(i)), and its children's other parents. The Bayesian network involving the Markov blanket of node i is used to provide the prediction ŷ_i of the ensemble, thus leveraging the local relationships of node i and the predictions for the nodes included in its Markov blanket.

To enlarge the size of the Bayesian subnetwork involved in the prediction of the node of interest, a variant based on the Bayesian networks constructed by applying a classical breadth-first search is the basis of the HIER-BFS algorithm. To reduce the complexity, at most 30 terms are included (i.e., the first 30 nodes reached by the breadth-first algorithm; see Figure 8). In the implementation, ensembles of 25 SVMs have been trained for each node, using vector space integration techniques [132] to integrate multiple sources of data.

Figure 8: The breadth-first subnetwork stemming from Y_1. Each GO term is represented through a blank node, and the SVM outputs are represented as gray nodes (adapted from [30]).

Note that both the HIER-MB and HIER-BFS methods do not take into account the overall topology of the GO network, but only the terms related to the node for which we perform the prediction. Even if this general approach is reasonable and achieves good results, its main drawback is represented by the locality of the hierarchical integration (limited to the Markov blanket or the first 30 BFS nodes). Moreover, in previous works it has been shown that the adopted integration strategy (vector space integration) is in most cases worse than kernel fusion [11] and ensemble methods for data integration [23].

Figure 9: Integration of diverse methods and diverse sources of data in an ensemble framework for AFP prediction. The best classifier for each GO term is selected through held-out set validation (adapted from [30]).

In the same work [30], the authors also propose a sort of "test and select" method [133], by which three different classification approaches, (a) single flat SVMs, (b) Bayesian hierarchical correction, and (c) naive Bayes combination, are applied, and for each GO term the best one is selected by internal cross-validation (Figure 9).

It is worth noting that other approaches adopted Bayesian networks to resolve the hierarchical constraints underlying the GO taxonomy. For instance, in the FALCON algorithm the GO is modeled as a Bayesian network, and for any given input the algorithm returns the most probable GO term assignment in accordance with the GO structure, using an evolutionary-based optimization algorithm [134].

5.3. HBAYES: An "Optimal" Hierarchical Bayesian Ensemble Approach. The HBAYES ensemble method [92, 135] is a general technique for solving hierarchical classification problems on generic taxonomies G structured as forests of trees. The method consists in training a calibrated classifier at each node of the taxonomy. In principle, any algorithm (e.g., support vector machines or artificial neural networks) whose classifications are obtained by thresholding a real prediction p̂, for example ŷ = sgn(p̂), can be used as base learner. The real-valued outputs p̂_i(g) of the calibrated classifier for node i on the gene g are viewed as estimates of the probabilities p_i(g) = P(y_i = 1 | y_par(i) = 1, g). The distribution of the random Boolean vector Y is assumed to be

P(Y = y) = ∏_{i=1}^{m} P(Y_i = y_i | Y_par(i) = 1, g),  ∀y ∈ {0, 1}^m    (10)

where, in order to enforce that only multilabels Y that respect the hierarchy have nonzero probability, it is imposed that P(Y_i = 1 | Y_par(i) = 0, g) = 0 for all nodes i = 1, ..., m and all g. This implies that the base learner at node i is only trained on the subset of the training set including all the examples (g, y) such that y_par(i) = 1.

5.3.1. HBAYES Ensembles for Protein Function Prediction. The H-loss is a measure of discrepancy between multilabels, based on a simple intuition: if a parent class has been predicted wrongly, then errors in its descendants should not be taken into account. Given fixed cost coefficients θ_1, ..., θ_m > 0, the H-loss ℓ_H(ŷ, v) between multilabels ŷ and v is computed as follows: all paths in the taxonomy T from the root down to each leaf are examined, and whenever a node i ∈ {1, ..., m} is encountered such that ŷ_i ≠ v_i, θ_i is added to the loss, while all the other loss contributions from the subtree rooted at i are discarded. This method assumes that, given a gene g, the distribution of the labels V = (V_1, ..., V_m) is P(V = v) = ∏_{i=1}^{m} p_i(g), for all v ∈ {0, 1}^m, where p_i(g) = P(V_i = v_i | V_par(i) = 1, g). According to the true path rule, it is imposed that P(V_i = 1 | V_par(i) = 0, g) = 0 for all nodes i and all genes g.

In the evaluation phase, HBAYES predicts the Bayes-optimal multilabel ŷ ∈ {0, 1}^m for a gene g, based on the estimates p̂_i(g) for i = 1, ..., m. By definition of Bayes-optimality, the optimal multilabel for g is the one that minimizes the expected loss when the true multilabel V is drawn from the joint distribution computed from the estimated conditionals p̂_i(g). That is,

ŷ = argmin_{y ∈ {0,1}^m} E[ℓ_H(y, V) | g]    (11)

In other words, the ensemble method HBAYES provides an approximation of the optimal Bayesian classifier with respect to the H-loss [135]. More precisely, as shown in [27], the following theorem holds.


Theorem 1. For any tree T and gene g, the multilabel generated according to the HBAYES prediction rule is the Bayes-optimal classification of g for the H-loss.

In the evaluation phase, the uniform cost coefficients θ_i = 1, for i = 1, ..., m, are used. However, since with uniform coefficients the H-loss can be made small simply by predicting sparse multilabels (i.e., multilabels y such that ∑_i y_i is small), in the training phase the cost coefficients are set to θ_i = 1/|root(G)| if i ∈ root(G), and to θ_i = θ_j/|child(j)|, with j = par(i), otherwise. This normalizes the H-loss, in the sense that the maximal H-loss contribution of all the nodes in a subtree, excluding its root, equals that of its root.

Let {E} denote the indicator function of the event E. Given g and the estimates p̂_i = p̂_i(g) for i = 1, ..., m, the HBAYES prediction rule can be formulated as follows.

HBAYES Prediction Rule. Initially, set the label of each node i to

ŷ_i = argmin_{y ∈ {0,1}} ( θ_i p̂_i (1 − y) + θ_i (1 − p̂_i) y + p̂_i {y = 1} ∑_{j ∈ child(i)} H_j(ŷ) )    (12)

where

H_j(ŷ) = θ_j p̂_j (1 − ŷ_j) + θ_j (1 − p̂_j) ŷ_j + p̂_j {ŷ_j = 1} ∑_{k ∈ child(j)} H_k(ŷ)    (13)

is recursively defined over the nodes j in the subtree rooted at i, with each ŷ_j set according to (12).

Then, if ŷ_i is set to zero, set all the nodes in the subtree rooted at i to zero as well.

It is worth noting that ŷ can be computed for a given g via a simple bottom-up message-passing procedure. It can be shown that, if all the child nodes k of i have p̂_k close to one half, then the Bayes-optimal label of i tends to be 0, irrespective of the value of p̂_i. On the contrary, if the children of i all have p̂_k close to either 0 or 1, then the Bayes-optimal label of i is based on p̂_i only, ignoring the children. This behaviour can be intuitively explained in the following way: the estimate p̂_k is built only on the examples for which the parent i of k is positive; hence a "neutral" estimate p̂_k = 1/2 signals that the current instance is a negative example for the parent i. Experimental results show that this approach achieves results comparable with the TPR method (Section 7), an ensemble approach based on the "true path rule" [136].
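A compact Python sketch of this bottom-up message passing, as a reading of (12)-(13) followed by the top-down zeroing step; the tree encoding and the input values are illustrative placeholders, not the authors' implementation:

def hbayes_predict(root, children, p, theta):
    y_hat, H = {}, {}

    def up(i):
        # recursively compute H_i for the subtree rooted at i (eq. (13))
        s = sum(up(j) for j in children.get(i, []))
        cost0 = theta[i] * p[i]                   # cost of predicting 0 when the true label is 1
        cost1 = theta[i] * (1 - p[i]) + p[i] * s  # cost of predicting 1, plus the children's terms
        y_hat[i] = 1 if cost1 <= cost0 else 0     # argmin of eq. (12)
        H[i] = min(cost0, cost1)
        return H[i]

    def down(i):
        # if a node is set to zero, its whole subtree is set to zero
        for j in children.get(i, []):
            if y_hat[i] == 0:
                y_hat[j] = 0
            down(j)

    up(root)
    down(root)
    return y_hat

# toy taxonomy: node 1 is the root with children 2 and 3
children = {1: [2, 3]}
p = {1: 0.8, 2: 0.4, 3: 0.9}
theta = {1: 1.0, 2: 0.5, 3: 0.5}
print(hbayes_predict(1, children, p, theta))  # {2: 0, 3: 1, 1: 1}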

5.3.2. HBAYES-CS: The Cost-Sensitive Version. HBAYES-CS is the cost-sensitive version of HBAYES proposed in [27]. In this approach the misclassification cost coefficient θ_i for node i is split into two terms, θ_i^+ and θ_i^−, to separately account for the misclassification of positive and negative examples. By considering these two terms separately, (12) can be rewritten as

ŷ_i = argmin_{y ∈ {0,1}} ( θ_i^− p̂_i (1 − y) + θ_i^+ (1 − p̂_i) y + p̂_i {y = 1} ∑_{j ∈ child(i)} H_j(ŷ) )    (14)

where the expression of H_j(ŷ) gets changed correspondingly. By introducing a factor α ≥ 0 such that θ_i^− = α θ_i^+, while keeping θ_i^+ + θ_i^− = 2θ_i, the relative costs of false positives and false negatives can be parameterized, thus allowing us to further rewrite the hierarchical Bayesian rule (Section 5.3.1) as follows:

ŷ_i = 1  ⟺  p̂_i (2θ_i − ∑_{j ∈ child(i)} H_j) ≥ 2θ_i / (1 + α)    (15)

By setting α = 1 we obtain the original version of the hierarchical Bayesian ensemble, while by incrementing α we introduce progressively lower costs for positive predictions. In this way the recall of the ensemble tends to increase, eventually at the expense of the precision, and by tuning the α parameter we can obtain different combinations of precision/recall values.

In principle, a cost factor α_i can be set for each node i, to explicitly take into account the unbalance between the number of positive (n_i^+) and negative (n_i^−) examples, estimated from the training data:

α_i = n_i^− / n_i^+  ⟹  θ_i^+ = (2 / (n_i^−/n_i^+ + 1)) θ_i = (2 n_i^+ / (n_i^− + n_i^+)) θ_i    (16)

The decision rule (15) at each node then becomes

ŷ_i = 1  ⟺  p̂_i (2θ_i − ∑_{j ∈ child(i)} H_j) ≥ 2θ_i / (1 + α_i) = 2θ_i n_i^+ / (n_i^− + n_i^+)    (17)
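As a reading of (16)-(17), the per-node cost-sensitive decision can be sketched as follows (the inputs are illustrative; H_children_sum stands for the sum of the H_j terms of the children, computed as in Section 5.3.1):

def cs_decision(p_i, H_children_sum, theta_i, n_pos, n_neg):
    # alpha_i set from the class unbalance estimated on the training set (eq. (16))
    alpha_i = n_neg / n_pos
    threshold = 2 * theta_i / (1 + alpha_i)  # = 2*theta_i*n_pos/(n_neg + n_pos)
    return 1 if p_i * (2 * theta_i - H_children_sum) >= threshold else 0

# the threshold is lowered for strongly unbalanced (few-positive) classes
print(cs_decision(p_i=0.3, H_children_sum=0.1, theta_i=1.0, n_pos=10, n_neg=90))  # 1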

Results obtained with the yeast model organism showed that HBAYES-CS significantly outperforms HTD methods [27, 136].

6. Reconciliation Methods

Hierarchical ensemble methods are basically two-step methods, since they at first provide predictions for the single classes and then arrange these predictions to take into account the functional relationships between GO terms. Noble and colleagues name this general approach reconciliation methods [24]: they proposed methods for calibrating and combining independent predictions, to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. They applied their ensemble methods to genome-wide and ontology-wide function prediction in M. musculus, involving about 3000 GO terms.


Figure 10: The overall scheme of reconciliation methods: (1) kernels computed on the different data sources, (2) one SVM per term, (3) calibration, and (4) reconciliation over the ontology (adapted from [24]).

Their goal consists in providing consistent predictions, that is, predictions whose confidence (e.g., posterior probability) increases as we ascend from more specific to more general terms in the GO. Moreover, another important feature of these methods is the availability of confidence values associated with the predictions, which can be interpreted as probabilities that a protein has a certain function, given the information provided by the data.

The overall reconciliation approach can be summarized in the following four basic steps (Figure 10):

(1) Kernel computation: at first a set of kernels is computed from the available data. We may choose kernels specific for each source of data (e.g., diffusion kernels for protein-protein interaction data [137], linear or Gaussian kernels for expression data, and string kernels for sequence data [138]). Multiple kernels for the same type of data can also be constructed [24].

(2) SVM learning: SVMs are used as base learners with the kernels selected at the previous step; the training is performed by internal cross-validation to avoid overfitting, and a local cost-sensitive strategy is applied by tuning separately the C regularization factor for positive and negative examples. Note that the authors used SVMs as base learners in their experiments, but any meaningful classifier could be used at this step.

(3) Calibration: to produce individual probabilistic outputs from the set of SVM outputs corresponding to one GO term, a logistic regression approach is applied. In this way a calibration of the individual SVM outputs is obtained, resulting in a probabilistic prediction of the random variable Y_i for each node/term i of the hierarchy, given the outputs ŷ_i of the SVM classifiers.

(4) Reconciliation: the first three steps generate unreconciled outputs; that is, in practice a "flat" ensemble is applied, which may generate predictions inconsistent with the given taxonomy. In this step the outputs of step three are processed by a "reconciliation method". The goal of this stage is to combine the predictions for each term to produce predictions that are consistent with the ontology, meaning that all the probabilities assigned to the ancestors of a GO term are larger than the probability assigned to that term.

The first three steps are basically the same for (or very similar in) each reconciliation ensemble method. The crucial step is the fourth, that is, the reconciliation step, and different ensemble algorithms can be designed to implement it. The authors proposed 11 different ensemble methods for the reconciliation of the base classifier outputs. Schematically, they can be subdivided into the following four main classes of ensembles:

(1) heuristic methods;

(2) Bayesian network-based methods;

(3) cascaded logistic regression;

(4) projection-based methods.

6.1. Heuristic Methods. These approaches preserve the "reconciliation property",

∀i, j ∈ G: (i, j) ∈ G ⟹ p̄_i ≥ p̄_j    (18)

through simple heuristic modifications of the probabilities computed at step 3 of the overall reconciliation scheme.

(i) The MAX method simply chooses the largest logistic regression value among the node i and all its descendants desc(i):

p̄_i = max_{j ∈ desc(i)} p̂_j    (19)

(ii) The AND method implements the notion that a term/node i holds only if all its ancestral GO terms anc(i) hold, assuming that, conditional on the data, all predictions are independent:

p̄_i = ∏_{j ∈ anc(i)} p̂_j    (20)

(iii) The OR method estimates the probability that the node i is annotated for at least one of its descendant GO terms, assuming again that, conditional on the data, all predictions are independent:

1 − p̄_i = ∏_{j ∈ desc(i)} (1 − p̂_j)    (21)
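A minimal sketch of the three heuristic rules (19)-(21) on a GO-like DAG; the ontology encoding and the calibrated probabilities p_hat are toy placeholders, and the ancestor/descendant closures are taken to include the term itself:

def closure(term, relation):
    # all terms reachable from `term` via `relation` (term itself included)
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(relation.get(t, []))
    return seen

def reconcile(p_hat, parents, children, rule):
    rec = {}
    for i in p_hat:
        if rule == "max":    # eq. (19): max over i and its descendants
            rec[i] = max(p_hat[j] for j in closure(i, children))
        elif rule == "and":  # eq. (20): product over i and its ancestors
            rec[i] = 1.0
            for j in closure(i, parents):
                rec[i] *= p_hat[j]
        elif rule == "or":   # eq. (21): complement of the product of (1 - p_hat) over descendants
            q = 1.0
            for j in closure(i, children):
                q *= 1.0 - p_hat[j]
            rec[i] = 1.0 - q
    return rec

parents = {"b": ["a"], "c": ["a"]}
children = {"a": ["b", "c"]}
p_hat = {"a": 0.6, "b": 0.8, "c": 0.2}
print(reconcile(p_hat, parents, children, "max"))  # the parent "a" is raised to 0.8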

6.2. Cascaded Logistic Regression. Instead of modeling class-conditional probabilities, as required by the Bayesian approach, logistic regression can be used to directly model posterior probabilities. Considering that modeling conditional densities is in most cases difficult (even using strong independence assumptions, as shown in Section 5.1), the choice of logistic regression could be a reasonable one. In [24] the authors embedded the hierarchical dependencies between terms in the logistic regression setting. By assuming that a random variable X, whose values represent the features of the gene g of interest, is associated with a given gene g, and that P(Y = y | X = x) factorizes according to the GO graph, it follows that

P(Y = y | X = x) = ∏_i P(Y_i = y_i | ∀j ∈ par(i): Y_j = y_j, X_i = x_i)    (22)

with P(Y_i = 1 | ∀j ∈ par(i): Y_j = 0, X_i = x_i) = 0. The authors estimated P(Y_i = 1 | ∀j ∈ par(i): Y_j = 1, X_i = x_i) with logistic regression. This approach is quite similar to fitting independent logistic regressions, but note that in this case only the examples of proteins annotated with all the parent GO terms are used to fit the model, thus implicitly taking into account the hierarchical relationships between GO terms.

6.3. Bayesian Network-Based Methods. These methods are variants of the Bayesian network approach proposed in [20] (Section 5.1): the GO is viewed as a graphical model, where a joint Bayesian prior is put on the binary GO term variables y_i. The authors proposed four variants, which can be summarized as follows.

(i) BPAL is a belief propagation approach with asymmetric Laplace likelihoods. The graphical model has edges directed from more general terms to more specific terms. Differently from [20], the distribution of each SVM output is modeled as an asymmetric Laplace distribution, and a variational inference algorithm, which solves an optimization problem whose minimizer is the set of marginal probabilities of the distribution, is used to estimate the posterior probabilities of the ensemble [139].

(ii) BPALF is similar to BPAL, but with edges inverted and directed from more specific terms to more general terms.

(iii) BPLR is a heuristic variant of BPAL, where in the inference algorithm the Bayesian log posterior ratio for Y_i is replaced by the marginal log posterior ratio obtained from the logistic regression (LR).

(iv) BPLRF is equal to BPLR, but with reversed edges.

6.4. Projection-Based Methods. A different approach is represented by methods that directly use the calibrated values obtained from logistic regression (step 3 of the overall scheme of the reconciliation methods) to find the closest set of values that are consistent with the ontology. This approach leads to a constrained optimization problem. The main contribution of the Obozinski et al. work [24] is represented by the introduction of projection reconciliation techniques based on isotonic regression [140] and on the Kullback-Leibler divergence.

The isotonic regression method tries to find a set of marginal probabilities p̄_i that are close to the set of calibrated values p̂_i obtained from the logistic regression, using the Euclidean distance as a measure of closeness. Hence, considering that the "reconciliation property" requires that p̄_i ≥ p̄_j when (i, j) ∈ E, this approach yields the following quadratic program:

min_{p̄_i, i ∈ I}  ∑_{i ∈ I} (p̂_i − p̄_i)²

s.t.  p̄_j ≤ p̄_i,  (i, j) ∈ E    (23)

This is the classical isotonic regression problem, which can be solved using an interior point solver, or with approximate algorithms when the number of edges of the graph is too large [141].
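For illustration, the quadratic program (23) can be solved on a toy ontology with a general-purpose solver (for large ontologies, dedicated isotonic regression algorithms [141] would be used instead); the edge list and the calibrated probabilities below are placeholders:

import numpy as np
from scipy.optimize import minimize

p_hat = np.array([0.5, 0.8, 0.3])  # calibrated probabilities (toy values)
edges = [(0, 1), (0, 2)]           # term 0 is the parent of terms 1 and 2

objective = lambda p: np.sum((p - p_hat) ** 2)
constraints = [{"type": "ineq", "fun": lambda p, i=i, j=j: p[i] - p[j]}
               for (i, j) in edges]  # enforce p_parent >= p_child

res = minimize(objective, p_hat, bounds=[(0, 1)] * len(p_hat),
               constraints=constraints)
print(res.x)  # approximately [0.65, 0.65, 0.3]: parent and child 1 are averaged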

Considering that we deal with probabilities, a natural measure of distance between two probability density functions f(x) and g(x), defined with respect to a random variable x, is represented by the Kullback-Leibler divergence D(f ∥ g):

D(f ∥ g) = ∫_{−∞}^{+∞} f(x) log( f(x) / g(x) ) dx    (24)

In the context of reconciliation methods we need to consider a discrete version of the Kullback-Leibler divergence, yielding the following optimization problem:

min_{p̄} D(p̂ ∥ p̄) = min_{p̄_i, i ∈ I}  ∑_{i ∈ I} p̂_i log( p̂_i / p̄_i )

s.t.  p̄_j ≤ p̄_i,  (i, j) ∈ E    (25)

The algorithm finds the probabilities closest, according to the Kullback-Leibler divergence, to the probabilities p̂ obtained from the logistic regression, obeying the constraint that probabilities cannot increase while descending the hierarchy underlying the ontology.

The extensive experiments performed in [24] show that, among the reconciliation methods, isotonic regression is the most generally useful: across a range of evaluation modes, term sizes, ontologies, and recall levels, isotonic regression yields consistently high precision. On the other hand, isotonic regression is not always the "best method", and a biologist with a particular goal in mind may apply other reconciliation methods. For instance, with small terms Kullback-Leibler projections usually achieve the best results but, considering average "per term" results, heuristic methods yield precision at a given recall comparable with projection methods and better than that achieved with Bayes-net methods.

This ensemble approach achieved excellent results in the prediction of protein function in the mouse model organism, demonstrating that hierarchical multilabel methods can play a crucial role in the improvement of protein function prediction performances [24]. Nevertheless, the approach suffers from some drawbacks. Indeed, the paper focuses on the comparison of hierarchical multilabel methods, but it does not analyze the impact of the concurrent use of data integration and hierarchical multilabel methods on the overall classification performances. Moreover, potential improvements could be introduced by applying cost-sensitive variants of hierarchical multilabel predictors, able to effectively calibrate the precision/recall trade-off at the different levels of the functional ontology.

Figure 11: The asymmetric flow of information suggested by the true path rule.

7. True Path Rule Hierarchical Ensembles

These ensemble methods exploit at the same time the downward and upward relationships between classes, thus considering both the parent-to-child and the child-to-parent functional links (Figure 2(b)).

The true path rule (TPR) ensemble method [31, 142] is directly inspired by the true path rule that governs both the GO and FunCat taxonomies. Citing the curators of the Gene Ontology [143]: "An annotation for a class in the hierarchy is automatically transferred to its ancestors, while genes unannotated for a class cannot be annotated for its descendants". Considering the parents of a given node i, a classifier that respects the true path rule needs to obey the following rules:

ŷ_i = 1 ⟹ ŷ_par(i) = 1

ŷ_i = 0 ⇏ ŷ_par(i) = 0    (26)

On the other hand, considering the children of a given node i, a classifier that respects the true path rule needs to obey the following rules:

ŷ_i = 1 ⇏ ŷ_child(i) = 1

ŷ_i = 0 ⟹ ŷ_child(i) = 0    (27)

From (26) and (27) we observe an asymmetry in the rules that govern the assignment of positive and negative labels. Indeed, we have a propagation of positive predictions from the bottom to the top of the hierarchy in (26), and a propagation of negative labels from top to bottom in (27). Conversely, negative labels cannot propagate from bottom to top, and positive predictions cannot propagate from top to bottom.

The "true path rule" suggests algorithms able to propagate "positive" decisions from the bottom to the top of the hierarchy, and negative decisions from top to bottom (Figure 11).

7.1. The True Path Rule Ensemble Algorithm. The TPR algorithm puts together the predictions made at each node by local "base" classifiers, to realize an ensemble that obeys the "true path rule".

The basic ideas behind the method can be summarized as follows:

(1) training of the base learners: for each node of the hierarchy, a suitable learning algorithm (e.g., a multilayer perceptron or a support vector machine) provides a classifier for the associated functional class;

(2) in the evaluation phase the trained classifiers associated with each class/node of the graph provide a local decision about the assignment of a given example to a given node;

(3) positive decisions, that is, annotations to a specific functional class, may propagate from bottom to top across the graph: they influence the decisions of the parent nodes and of their ancestors in a recursive way, by traversing the graph towards the higher level nodes/classes; conversely, negative decisions do not affect the decisions of the parent node, that is, they do not propagate from bottom to top (26);


(4) negative predictions for a given node (taking into account the local decisions of its descendants) are propagated to the descendants, to preserve the consistency of the hierarchy according to the true path rule, while positive decisions do not influence the decisions of the child nodes (27).

The ensemble combines the local predictions of the base learners associated with each node with the positive decisions coming from the bottom of the hierarchy, and with the negative decisions springing from the higher level nodes. More precisely, the base classifiers estimate local probabilities p̂_i(g) that a given example g belongs to class θ_i, but the core of the algorithm is represented by the evaluation phase, where the ensemble provides an estimate of the "consensus" global probability p̄_i(g).

It is worth noting that, instead of a probability, p̂_i(g) may represent a score associated with the likelihood that a given gene/gene product belongs to the functional class i.

Let us consider the set φ_i(g) of the children of node i for which we have a positive prediction for a given gene g:

φ_i(g) = { j ∈ child(i) : ŷ_j = 1 }    (28)

The global consensus probability p̄_i(g) of the ensemble depends both on the local prediction p̂_i(g) and on the predictions of the nodes belonging to φ_i(g):

p̄_i(g) = (1 / (1 + |φ_i(g)|)) ( p̂_i(g) + ∑_{j ∈ φ_i(g)} p̄_j(g) )    (29)

The decision ŷ_i(g) at node/class i is set to 1 if p̄_i(g) > t, and to 0 otherwise (a natural choice for t is 0.5), and only the children nodes for which we have a positive prediction can influence their parent. In the leaf nodes the sum of (29) disappears, and (29) reduces to p̄_i(g) = p̂_i(g). In this way positive predictions propagate from bottom to top, and negative decisions are propagated to the descendants when, for a given node, ŷ_i(g) = 0.

The bottom-up per-level traversal of the tree assures that all the offspring of a given node i are taken into account for the ensemble prediction. For the same reason, we can safely set the classes belonging to the subtree rooted at i to negative when ŷ_i is set to 0. It is worth noting that we have a two-way asymmetric flow of information across the tree: positive predictions for a node influence its ancestors, while negative predictions influence its offspring.

The algorithm provides both the multilabels ŷ_i and an estimate of the probabilities p̄_i that a given example g belongs to the class i = 1, ..., m.
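A compact sketch of the TPR evaluation phase on a tree-structured (FunCat-like) taxonomy: a bottom-up sweep computes the consensus probabilities of (29), and negative decisions are then propagated downwards; the taxonomy and the local probabilities are illustrative placeholders:

def tpr_predict(root, children, p_hat, t=0.5):
    p_bar, y_hat = {}, {}

    def up(i):
        for j in children.get(i, []):
            up(j)
        # children with a positive prediction: the set phi_i(g) of (28)
        phi = [j for j in children.get(i, []) if y_hat[j] == 1]
        p_bar[i] = (p_hat[i] + sum(p_bar[j] for j in phi)) / (1 + len(phi))
        y_hat[i] = 1 if p_bar[i] > t else 0

    def down(i):
        for j in children.get(i, []):
            if y_hat[i] == 0:
                y_hat[j] = 0  # negative decisions propagate top-down
            down(j)

    up(root)
    down(root)
    return y_hat, p_bar

children = {"r": ["a"], "a": ["b", "c"]}
p_hat = {"r": 0.55, "a": 0.45, "b": 0.7, "c": 0.2}
print(tpr_predict("r", children, p_hat))
# the positive leaf "b" rescues its parent "a", whose local probability is below 0.5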

7.2. The Cost-Sensitive Variant. Note that in the TPR algorithm there is no way to explicitly balance the local prediction p̂_i(g) at node i against the positive predictions coming from its offspring (29). By balancing the local predictions with the positive predictions coming from the ensemble, we can explicitly modulate the interplay between local and descendant predictors. To this end, a weight w, 0 ≤ w ≤ 1, is introduced, such that if w = 1 the decision at node i depends only on the local predictor; otherwise the prediction is shared proportionally to w and 1 − w between, respectively, the local parent predictor and the set of its children:

p̄_i = w p̂_i + ((1 − w) / |φ_i|) ∑_{j ∈ φ_i} p̄_j    (30)

This variant of the TPR algorithm is the weighted true path rule (TPR-W) hierarchical ensemble algorithm. By tuning the w parameter we can modulate the precision/recall characteristics of the resulting ensemble. More precisely, for w → 0 the weight of the parent local predictor is small, and the ensemble decision mainly depends on the positive predictions of the offspring nodes (classifiers); conversely, w → 1 corresponds to a higher weight of the parent predictor: less weight is given to possible positive predictions of the children, and the decision depends mainly on the local/parent base classifier. In case of a negative decision the whole subtree is set to 0, causing the precision to increase. Note that for w → 1 the behaviour of TPR-W becomes similar to that of HTD (Section 4).

A specific advantage of TPR-W ensembles is hence the capability of tuning the precision and recall rates through the parameter w (30). In [31] the author shows that the w parameter highly influences the precision/recall characteristics of the ensemble: low w values yield a higher recall, while high values improve the precision of the TPR-W ensemble.

Recently, Chen and Hu proposed a method that applies the TPR-W hierarchical strategy, but using composite kernel SVMs as base classifiers, and a supervised clustering with oversampling strategy to deal with the imbalanced data set learning problem; they showed that a proper selection of the base learners and unbalance-aware learning strategies can further improve the results in terms of hierarchical precision and recall [144].

The same authors also proposed an enhanced version of the TPR-W strategy, to overcome a limitation of this bottom-up hierarchical method for AFP. Indeed, for some classes at the lower levels of the hierarchy the classifier performances are sometimes quite poor, due to both noisy data and the relatively low number of available annotations. More precisely, in the basic TPR ensemble the probabilities p̄_j computed by the children of node i (30) contribute in equal measure to the probability p̄_i computed by the ensemble at node i, independently of the accuracy of the predictions made by the children classifiers. This "unweighted" mechanism may propagate errors across the hierarchy: a poorly performing child classifier may, for instance, with high probability predict a negative example as positive, and this error may propagate to its parent node and, recursively, to its ancestor nodes. To try to alleviate this possible bottom-up error propagation, in [145] Chen and Hu proposed an improved TPR ensemble, weighted according to the classifier performance. To this end, they weighted the contribution of each child classifier on the basis of its performance, evaluated on a validation data set, by adding to (30) another weight ν_j:

p̄_i = w p̂_i + ((1 − w) / |φ_i|) ∑_{j ∈ φ_i} ν_j · p̄_j    (31)

where ν_j is computed on the basis of some accuracy metric A_j (e.g., the F-score), estimated for the child classifier associated with node j, as follows:

ν_j = A_j / ∑_{k ∈ φ_i} A_k    (32)

In this way the contribution of "poor" classifiers is reduced, while "good" classifiers weight more in the final computation of p̄_i (31). Experiments with the "Protein Fate" subtree of the FunCat taxonomy on the yeast model organism show that this approach improves the predictions with respect to the "vanilla" TPR-W hierarchical strategy [145].
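A short sketch of this performance-weighted consensus (30)-(32); each positive child contributes in proportion to an accuracy estimate A_j (e.g., its F-score on a validation set), and all the inputs are illustrative placeholders:

def tpr_w_weighted(p_local, child_probs, child_accuracy, w=0.5):
    phi = list(child_probs)  # the positive children of the node
    if not phi:
        return p_local
    tot = sum(child_accuracy[j] for j in phi)
    nu = {j: child_accuracy[j] / tot for j in phi}  # eq. (32)
    return w * p_local + (1 - w) / len(phi) * sum(
        nu[j] * child_probs[j] for j in phi)        # eq. (31)

# the accurate child "b" dominates the inaccurate child "c"
print(tpr_w_weighted(0.4, {"b": 0.9, "c": 0.6}, {"b": 0.8, "c": 0.2}))  # 0.41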

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward and common to the hierarchical top-down algorithm, the propagation of positive decisions from the bottom to the top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this point, see Appendix C.

Experimental results show that TPR-W achieves equal or better results than the TPR and the top-down hierarchical strategy, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per-level classification performances shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall by using a global strategy: large values of the w parameter improve the precision, and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchical ensemble are influenced by the positive decisions of its descendants. More precisely, the analyses performed in [31] showed the following:

(i) the weights of the descendants decrease exponentially with respect to their depth; as a consequence, the influence of descendant nodes decays quickly with their depth;

(ii) the parameter w plays a central role in balancing the weight of the local classifier associated with a given node against the weights of its positive offspring: small values of w increase the weight of the descendant nodes, and large values increase the weight of the local parent predictor associated with that node;

(iii) the overall probability predicted by the ensemble results from the choice of the w parameter, the strength of the prediction of the local learner, and that of its descendants.

These characteristics of TPR-W ensembles are well suited for the hierarchical classification of protein functions, considering that the annotations of deeper nodes are likely to have less experimental evidence than those of higher nodes. Moreover, by enforcing the strength of the descendant nodes through low w values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured as a forest of trees (Section 1.1), while no applications have been performed using the GO, structured as a directed acyclic graph (Section 1.1).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147], or investigated a modified decision tree model in which the assignment to a node is propagated toward the parent nodes [9], by extending the classical C4.5 decision tree algorithm for multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in the yeast [149].

More recent approaches, always based on modified decision trees, used distance measures derived from the hierarchy and significantly improved previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all the classes at once, even when they are built taking into account the hierarchy.

Nevertheless, the previously proposed decision tree-based methods often achieve results not comparable with those of state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for the DAG-structured prediction of protein function in the S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles, which learn ensembles of predictive models each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles, which train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151], methods related to decision trees have been proposed, in the sense that interpretable classification rules to predict all functions at all levels of the GO hierarchy are discovered, using an ant colony optimization classification algorithm.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues [1, 16, 18]. In particular, in [80] the authors showed that, even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results, if at the same time we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy between hierarchical classification, data integration methods, and cost-sensitive techniques has been shown, highlighting that hierarchical ensemble methods should be designed taking into account the different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the prediction of protein functions [16, 25, 154–156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into the following four groups:

(1) vector subspace integration;

(2) functional association networks integration;

(3) kernel fusion;

(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed by [30], but each data source is separately normalized, in order to take into account the data distribution in each individual vector space.

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network, or probabilistic evidence integration schemes [45].

Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive Bayes integration [157] and constrained linear regression [14], or merge the data taking into account the GO hierarchy [158], or apply XML-based techniques [159].

Kernel Fusion. These techniques at first construct a separate Gram matrix for each available data source, using appropriate kernels representing the similarities between genes/gene products. Then, by exploiting the closure property of kernels with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix.

Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have recently been proposed [160, 161]. Kernel fusion methods, both with and without weighting of the data sources, have been successfully applied to the classification of protein functions [162–165]. Recently, a novel method proposed an enhanced kernel integration approach, by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
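A minimal sketch of unweighted kernel fusion: Gram matrices computed on different data sources for the same set of genes are summed into a single "consensus" kernel (the closure of kernels under addition guarantees that the result is still a valid kernel); the data below are random placeholders:

import numpy as np

rng = np.random.default_rng(0)
X_expr = rng.normal(size=(50, 20))   # e.g., gene expression features
X_ppi = rng.normal(size=(50, 300))   # e.g., protein-protein interaction profiles

K_expr = X_expr @ X_expr.T           # linear kernel on each source
K_ppi = X_ppi @ X_ppi.T
K_fused = K_expr + K_ppi             # kernel fusion: sum of the Gram matrices
print(K_fused.shape)                 # (50, 50): one row/column per gene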

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different "views" of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include the "late integration" of kernels trained on different sources [22], the naive Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the outputs of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171–173].

Table 2: Comparison of the results (average per-class F-scores) achieved with single-source and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the numbers in parentheses refer to the percentage relative increment in F-score achieved with data fusion techniques with respect to the best single source of evidence (BIOGRID).

Methods                    FLAT      HTD       HTD-CS    HB        HB-CS     TPR       TPR-W
Single-source
  BIOGRID                  0.2643    0.3759    0.4160    0.3385    0.4183    0.3902    0.4367
  STRING                   0.2203    0.2677    0.3135    0.2138    0.3007    0.2801    0.3048
  PFAM BINARY              0.1756    0.2003    0.2482    0.1468    0.2407    0.2532    0.2738
  PFAM LOGE                0.2044    0.1567    0.2541    0.0997    0.2847    0.3005    0.3160
  EXPR                     0.1884    0.2506    0.2889    0.2006    0.2781    0.2723    0.3053
  SEQ. SIM.                0.1870    0.2532    0.2899    0.2017    0.2825    0.2742    0.3088
Multisource (data fusion)
  Kernel fusion            0.3220    0.5401    0.5492    0.5181    0.5505    0.5034    0.5592
                           (+22%)    (+44%)    (+32%)    (+53%)    (+32%)    (+29%)    (+28%)

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques,

in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ. SIM.). Kernel fusion integration (sum of the Gram matrices) has been applied, and the preprocessing has been performed using the HCGene R package [178].

Table 2 summarizes the results of the comparison across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per-class F-score of flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) with respect to FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical, data fusion (in particular kernel fusion), and cost-sensitive approaches (Figure 13) [123].

A per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of the nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion, the average F-score at levels 2, 3, and 4 is comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergic effects of hierarchical multilabel ensembles, cost-sensitive, and data fusion techniques significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can yield more biological insights into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both the GO and FunCat, usually only positive annotations are available, while negative annotations are much fewer. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy of choosing negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123] two strategies for choosing negative examples have been compared: the basic (B) and the parent-only (PO) strategy.

According to the B strategy, the set of negative examples is simply composed of the genes g that are not annotated for class c_i, that is,

N_B = { g : g ∉ c_i }    (33)

The PO selection strategy chooses as negatives for the class c_i only those examples that are not annotated to c_i but are annotated for a parent class. More precisely, for a given class c_i, corresponding to node i in the taxonomy, the set of negative examples is

N_PO = { g : g ∉ c_i, g ∈ par(i) }    (34)
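A sketch of the two selection strategies (33)-(34), with `annotations` mapping each class to the set of genes annotated to it (illustrative toy data):

def negatives_basic(genes, annotations, c_i):
    # eq. (33): every gene not annotated to c_i
    return {g for g in genes if g not in annotations[c_i]}

def negatives_parent_only(annotations, c_i, parent):
    # eq. (34): genes annotated to the parent class but not to c_i
    return {g for g in annotations[parent] if g not in annotations[c_i]}

genes = {"g1", "g2", "g3", "g4"}
annotations = {"par": {"g1", "g2", "g3"}, "c": {"g1"}}
print(negatives_basic(genes, annotations, "c"))        # {'g2', 'g3', 'g4'}
print(negatives_parent_only(annotations, "c", "par"))  # {'g2', 'g3'}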


Figure 12: FunCat trees comparing the F-scores achieved with data integration (KF) to those of the best single-source classifiers, trained on BIOGRID data. Black nodes depict the functional classes for which KF achieves better F-scores: (a) FLAT and (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical precision (a), recall (b), and F-score (c) among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, the PO strategy selects negative examples for training that are in a certain sense "close" to the positives. It is easy to see that N_PO ⊆ N_B; hence the B strategy selects for training a larger set of generic negative examples, possibly annotated with classes that are associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the B (abscissa) and the PO (ordinate) strategy for the selection of negative examples. In Figure 15(a) most points lie above the bisector, independently of the hierarchical cost-sensitive method being used.

Figure 14: Comparison of per-level average precision (a), recall (b), and F-score (c) across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (Single) and kernel fusion (KF). The performance of "Single" is computed by averaging across all the single data sources.

This shows that hierarchical methods gain in performance when using the PO strategy as opposed to the B strategy (P-value = 2.2 × 10⁻¹⁶ according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

These results can be explained by considering that the PO strategy takes into account the hierarchy to select negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from the other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in the context of this research area.


Figure 15: Comparison of average per-class F-score between basic and PO strategies. (a) FLAT ensembles; (b) hierarchical cost-sensitive strategies: HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles). Abscissa: per-class F-score with base learners trained according to the basic strategy; ordinate: per-class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way to assess the impact of each strategy on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works showed also that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies should be explored to improve the accuracy and the reliability of the base learners and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied, as sketched below: the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly takes away some unannotated examples [182]. Other approaches could be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185–187].
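As a purely illustrative sketch of the classical balancing strategies just mentioned (not the hierarchical SMOTE variant of [145]), the two helpers below rebalance the training set of a single functional class; all names are hypothetical.

```python
import random

def random_oversample(positives, target_size, seed=0):
    """Augment the positive set by duplicating annotated proteins at random
    (assumes target_size >= len(positives) and a nonempty positive set)."""
    rng = random.Random(seed)
    extra = [rng.choice(positives) for _ in range(target_size - len(positives))]
    return positives + extra

def random_undersample(negatives, target_size, seed=0):
    """Randomly discard unannotated proteins down to target_size."""
    rng = random.Random(seed)
    return rng.sample(negatives, target_size)

# e.g., enforce a 1:1 positive/negative ratio for a given class:
# negatives = random_undersample(negatives, len(positives))
```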

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems and often drives the "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta_i^- / \theta_i^+$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$ we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed in order to design ensemble methods with "tunable" precision/recall characteristics.
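The sketch below illustrates this kind of single-parameter tuning: it sweeps a hypothetical `fit_and_evaluate(w)` routine (standing in for training and evaluating a parametric ensemble such as TPR-W with parent weight w; not a real API) and records the resulting precision/recall trade-off.

```python
import numpy as np

def precision_recall_sweep(fit_and_evaluate, ws=np.linspace(0.1, 0.9, 9)):
    """Trace the precision/recall behaviour of a parametric hierarchical
    ensemble by varying its single global parameter w."""
    curve = []
    for w in ws:
        P, R = fit_and_evaluate(w)  # hierarchical precision and recall
        F = 2 * P * R / (P + R) if P + R > 0 else 0.0
        curve.append((w, P, R, F))
    return curve  # for TPR-W, larger w should trade recall for precision
```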

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; unfortunately, the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and that, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are partially annotated. More precisely, two new algorithms have been proposed: ProWL, protein function prediction with weak-label learning, which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein; and ProWL-IF, protein function prediction with weak-label learning and knowledge of irrelevant functions, by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works for hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy to the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Appendix C), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of the deep nodes of the hierarchy. Other enhancements could be introduced into the TPR-W algorithm (Section 7.2): indeed, we can note that the positive children of a node at level i of the hierarchy have the same weight, independently of the size of their hanging subtree. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree, since this witnesses for a positive annotation of the node at level i.

More in general, in the spirit of a recently proposed work [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194] able to exploit the relationships between functional classes during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, already during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performances.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the prediction for a particular species by using predictions or data available for other species. This is a common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach to hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary-memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity into the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work.


Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and the other related issues is the key to improving AFP methods and to driving experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process," "molecular function," and "cellular component." Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., mitochondrion or rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began as a collaboration between three model organism databases, FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD), in 1998. Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components,


Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

and molecular functions, in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function, for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; an example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles or physical places or compartments in which a specific gene product is located; an example is the endoplasmic reticulum or the ribosome.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a term of the GO}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are categorized into the following three main groups:

(i) is-a (subtype relations): if we say that the term/node A is-a B, we mean that node A is a subtype of node B. For example, mitotic cell cycle is-a cell cycle, or lyase activity is-a catalytic activity.

(ii) part-of (part-whole relations): A is part-of B means that whenever A exists, it is part of B, and the presence of A implies the presence of B. For instance, mitochondrion is part-of cytoplasm.

(iii) regulates (control relations): if we say that A regulates B, we mean that A directly affects the manifestation of B, that is, the former regulates the latter.

While is-a and part-of are transitive, regulates is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting functional similarity. For these reasons, regulates relations (which, however, are a minority of the existing relations) are usually not used in AFP.
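A small sketch of how these relations are typically used in AFP: annotations are propagated to ancestors along is-a and part-of edges only, following the true path rule. The edge-list representation and the function names are illustrative assumptions, not a real GO API.

```python
# `edges` is an illustrative list of (child, parent, relation) triples.

def ancestors(term, edges, allowed=("is_a", "part_of")):
    """Transitively collect the ancestors of `term`, following only
    the allowed relation types (regulates edges are skipped)."""
    parents = {}
    for child, par, rel in edges:
        if rel in allowed:
            parents.setdefault(child, set()).add(par)
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        for p in parents.get(t, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def propagate(annotations, edges):
    """True-path-rule closure: a protein annotated with t is implicitly
    annotated with every ancestor of t."""
    return {prot: set(terms) | {a for t in terms for a in ancestors(t, edges)}
            for prot, terms in annotations.items()}
```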

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI); to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference; to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de). At the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content was extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and, finally, animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge number of terms is very difficult for annotators to handle. Moreover, we may have a very large number of possible assignments, which may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads both to simpler procedures for annotation and to less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to changes, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more in general, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess classification performance. For this reason the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the positive examples correctly predicted as positive, FN the positive examples incorrectly predicted as negative, and FP the negatives incorrectly predicted as positive, then the precision $P$ and the recall $R$ are

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{B.1}$$

The F-score $F$ is the harmonic mean between precision and recall:

$$F = \frac{2 \cdot P \cdot R}{P + R}. \tag{B.2}$$

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to near-optimal rankings; that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be considered "partially correct," in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$ consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $l(P)$ be the set of the leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let ${\uparrow}p$ be the set of ancestors of the node $p$ that belong to $P(x)$; given a leaf $c \in C(x)$, let ${\uparrow}c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}p|},$$
$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}c|},$$
$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.3}$$

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow}p|}{|{\uparrow}p|},$$
$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|{\uparrow}c \cap P(x)|}{|{\uparrow}c|},$$
$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.4}$$

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products, while a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.
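A compact sketch of (B.4) for a tree-structured taxonomy follows; `parent` maps each class to its parent (absent for the root), P and C are the predicted and true class sets for a single gene, and both sets are assumed closed under the true path rule, so that ↑p coincides with the path from the root to p. Names and the choice of including p itself in ↑p are illustrative assumptions, not code from [201].

```python
def up(c, parent):
    """The path from class c up to the root (here taken to include c)."""
    out = set()
    while c is not None:
        out.add(c)
        c = parent.get(c)
    return out

def leaves(S, parent):
    """Classes of S that are not the parent of any other class in S."""
    inner = {parent.get(c) for c in S}
    return {c for c in S if c not in inner}

def hierarchical_prf(P, C, parent):
    """HP, HR, and HF as in (B.4); assumes nonempty leaf sets."""
    lP, lC = leaves(P, parent), leaves(C, parent)
    HP = sum(len(C & up(p, parent)) / len(up(p, parent)) for p in lP) / len(lP)
    HR = sum(len(up(c, parent) & P) / len(up(c, parent)) for c in lC) / len(lC)
    return HP, HR, 2 * HP * HR / (HP + HR)
```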

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

$$\mathrm{HP}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|P(x)|}, \qquad \mathrm{HR}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|C(x)|}. \tag{B.5}$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true ($\mathrm{HR}_K$) classes. The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \tag{B.6}$$
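For a single gene, the set-based ratios of (B.5) reduce to the sketch below; following (B.5), the per-gene ratios are then summed over all genes x. The function name is illustrative.

```python
def kiritchenko_prf(P, C):
    """Per-gene contribution to HP_K and HR_K of (B.5): ratios of shared
    classes over the predicted and the true classes, ignoring paths."""
    hp = len(P & C) / len(P)
    hr = len(P & C) / len(C)
    return hp, hr, 2 * hp * hr / (hp + hr)
```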

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $\bar{q}_k$; more precisely, $\bar{q}_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $\bar{q}^{\,j}_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \geq 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level $k$ of the hierarchy [31]:

$$\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g). \tag{C.1}$$

To simplify the notation, we introduce the following expression for the average of the probabilities computed by the positive children of a generic node at level $k$:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1}, \tag{C.2}$$

and we introduce similar notations for $a_{k+2}$ (average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (descendants at level $j$ below a generic node). By extending these definitions across levels, we can obtain the following theorem.
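Before stating the theorem, a minimal recursive sketch of the bottom-up combination in (C.1) may help; `q_hat` holds the local base-learner probabilities for one gene, `children` encodes the taxonomy tree, and the rule that a node with no positive children keeps its local prediction is an assumption of this sketch, not a statement of the exact TPR-W algorithm.

```python
def tpr_w(g, q_hat, children, w=0.5, thr=0.5):
    """Ensemble probability of node g following (C.1): mix the local
    prediction with the average over the positive children phi(g)."""
    kids = [tpr_w(c, q_hat, children, w, thr) for c in children.get(g, [])]
    positive = [q for q in kids if q > thr]   # phi(g): positive children
    if not positive:                          # leaf, or no positive child
        return q_hat[g]
    return w * q_hat[g] + (1 - w) * sum(positive) / len(positive)
```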

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \leq w \leq 1$, balancing the weight between parent and children predictors, and having a variable number (larger than or equal to 1) of positive descendants at each of the $m$ lower levels below, the following equality holds for each $m \geq 1$:

$$\bar{q}_k = w \hat{q}_k + \sum_{j=1}^{m-1} w (1 - w)^j a_{k+j} + (1 - w)^m a_{k+m}. \tag{C.3}$$

For the full proof, see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter.
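A quick numeric check of (C.3) makes the decay visible: the coefficient $w(1-w)^j$ assigned to descendants $j$ levels below shrinks geometrically, for example halving at each level when $w = 0.5$.

```python
# Coefficients w*(1-w)^j of the descendant averages a_{k+j} in (C.3)
for w in (0.1, 0.5, 0.8):
    coeffs = [w * (1 - w) ** j for j in range(1, 6)]
    print(w, [round(c, 4) for c in coeffs])
# w=0.5 -> [0.25, 0.125, 0.0625, 0.0312, 0.0156]: halved at each deeper level
```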


Figure 18: (a) Plot of $f(w) = (1 - w)^m$ while varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1 - w)^j$ while varying $j$ from 1 to 10. The integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.

To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$ (Theorem 1), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of nodes above the leaves.

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction: the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.

[2] L. Pena-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.

[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.

[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.

[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.

[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.

[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.

[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.

[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.

[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.

[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.

[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.

[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.

[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.

[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.

[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.

[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.

[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.

[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.

[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.

[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.

[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.

[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.

[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.

[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.

[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.

[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.

[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.

[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.

[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.

[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.

[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.

[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.

[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.

[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.

[38] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.

[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.

[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.

[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.

[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.

[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.

[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.

[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.

[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.

[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.

[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.

[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.

[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.

[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.

[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.

[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.

[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.

[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.

[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.

[58] A. Conesa, S. Gotz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.

[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.

[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.

[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.

[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.

[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.

[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, Z. Du, R. Payattakool, P. S. Yu, and C.-F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.

[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.

[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.

[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.

[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.

[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.

[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.

[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.

[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.

[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.

[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.

[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.

[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.

[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.

[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.

[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.

[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.

[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.

[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.

[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.

[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.

[88] G. Bakir, T. Hoffman, B. Scholkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.

[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.

[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.

[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.

[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.

[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.

[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.

[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.

[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting, and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.

[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.

[100] M. Dettling and P. Buhlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.

[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.

[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.

[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.

[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.

[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.

[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.

[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.

[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.

[110] L. Breiman, "Bias, variance, and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.

[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.

[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.

[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.

[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.

[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R Bi Y Zhou F Lu and W Wang ldquoPredicting Gene Ontologyfunctions based on support vector machines and statisticalsignificance estimationrdquo Neurocomputing vol 70 no 4ndash6 pp718ndash725 2007

[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.

[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.

[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using Top-Down label combination and Artificial Neural Networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastion, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems. First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems. Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n²) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.

[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics: From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz, et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.

[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray, et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems. Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7–9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry, et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao, et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. Von Mering, R. Krause, B. Snel, et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.

[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the Databoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multilabel classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013 (ISMB Special Interest Group Meeting), pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bahr, et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber, et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan, et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


Table 1: Characteristics of some of the main hierarchical ensemble methods for AFP.

| Methods | References | Multipath | Partial path | Class structure | Cost sens. | Sel. neg. | Base learn. | Improves on flat | Node process |
|---|---|---|---|---|---|---|---|---|---|
| HMC-LMLP | [124, 125] | √ | √ | TREE | | | any | √ | TD |
| HTD-CS | [27] | √ | √ | TREE | √ | | any | √ | TD |
| HTD-MULTI | [127] | | √ | TREE | | | any | √ | TD |
| HTD-PERLEV | [128] | | | TREE | | | spec. | √ | TD |
| HTD-NET | [26] | √ | √ | DAG | | | any | √ | TD |
| BAYES NET-ENS | [20] | √ | √ | DAG | | √ | spec. | √ | TD & BUP |
| HIER-MB and BFS | [30] | √ | √ | DAG | | | any | √ | TD & BUP |
| HBAYES | [92, 135] | √ | √ | TREE | | √ | any | √ | TD & BUP |
| HBAYES-CS | [123] | √ | √ | TREE | √ | √ | any | √ | TD & BUP |
| Reconc.-heuristic | [24] | √ | √ | DAG | | | any | √ | TD |
| Cascaded log. | [24] | √ | √ | DAG | | | any | √ | TD |
| Projection-based | [24] | √ | √ | DAG | | | any | √ | TD & BUP |
| TPR | [31, 142] | √ | √ | TREE | | √ | any | √ | TD & BUP |
| TPR-W | [31] | √ | √ | TREE | √ | √ | any | √ | TD & BUP |
| TPR-W weighted | [145] | √ | √ | TREE | √ | | any | √ | TD & BUP |
| Decision-tree-ens | [29] | √ | √ | DAG | | | spec. | √ | TD & BUP |

ensemble method (HTD) algorithm is straightforward: for each gene $g$, starting from the set of nodes at the first level of the graph $G$ (denoted by $\mathrm{root}(G)$), the classifier associated with the node $i \in G$ computes whether the gene belongs to the class $c_i$. If yes, the classification process continues recursively on the nodes $j \in \mathrm{child}(i)$; otherwise, it stops at node $i$, and the nodes belonging to the descendants rooted at $i$ are all set to 0. To introduce the method, we use probabilistic classifiers as base learners, trained to predict the class $c_i$ associated with the node $i$ of the hierarchical taxonomy. Their estimates $\hat{p}_i(g)$ of $P(V_i = 1 \mid V_{\mathrm{par}(i)} = 1, g)$ are used by the HTD ensemble to classify a gene $g$ as follows:

$$\hat{y}_i = \begin{cases} \{\hat{p}_i(g) > \frac{1}{2}\} & \text{if } i \in \mathrm{root}(G) \\ \{\hat{p}_i(g) > \frac{1}{2}\} & \text{if } i \notin \mathrm{root}(G) \text{ and } \hat{p}_{\mathrm{par}(i)}(g) > \frac{1}{2} \\ 0 & \text{if } i \notin \mathrm{root}(G) \text{ and } \hat{p}_{\mathrm{par}(i)}(g) \le \frac{1}{2} \end{cases} \tag{3}$$

where $\{x\} = 1$ if $x > 0$ and $\{x\} = 0$ otherwise, and $\hat{p}_{\mathrm{par}(i)}$ is the probability predicted for the parent of the term $i$. It is easy to see that this procedure ensures that the predicted multilabels $\hat{\mathbf{y}} = (\hat{y}_1, \ldots, \hat{y}_m)$ are consistent with the hierarchy. We can apply the same top-down procedure also with nonprobabilistic classifiers, that is, base learners generating continuous scores or discrete decisions, by slightly modifying (3).

In [123] a cost-sensitive version of the basic top-down hierarchical ensemble method HTD has been proposed. By assigning $\hat{y}_i$ before the label of any $j$ in the subtree rooted at $i$, the following rule is used:

$$\hat{y}_i = \{\hat{p}_i \ge \tfrac{1}{2}\} \times \{\hat{y}_{\mathrm{par}(i)} = 1\} \tag{4}$$

for $i = 1, \ldots, m$ (note that the guessed label $\hat{y}_0$ of the root of $G$ is always 1). The cost-sensitive variant HTD-CS then introduces a single cost-sensitive parameter $\tau > 0$, which replaces the threshold $\frac{1}{2}$. The resulting rule for HTD-CS is

$$\hat{y}_i = \{\hat{p}_i \ge \tau\} \times \{\hat{y}_{\mathrm{par}(i)} = 1\} \tag{5}$$

By tuning $\tau$, we may obtain ensembles with different precision/recall characteristics. Despite the simplicity of hierarchical top-down methods, several works showed their effectiveness for AFP problems [28, 31].
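As an illustration, the following Python sketch implements the HTD-CS rule (5) on a taxonomy stored as parent/child dictionaries; all names (`p_hat`, `children`, `roots`, `tau`) are hypothetical, and with `tau = 0.5` the rule reduces to the plain HTD decision (3).

```python
# Minimal sketch of the HTD / HTD-CS top-down rule; names are illustrative.
def htd_cs(p_hat, children, roots, tau=0.5):
    """p_hat: dict node -> estimated P(V_i = 1 | V_par(i) = 1, g) for one gene g;
    children: dict node -> list of child nodes; roots: first-level nodes.
    Returns dict node -> 0/1 decision, consistent with the hierarchy."""
    y = {}

    def visit(i, parent_positive):
        # A node can be positive only if its parent is positive (true path rule).
        y[i] = int(parent_positive and p_hat[i] >= tau)
        for j in children.get(i, []):
            visit(j, y[i] == 1)

    for r in roots:
        visit(r, True)  # the guessed label of the (virtual) root is always 1
    return y

# Toy example: root A with children B and C.
children = {"A": ["B", "C"], "B": [], "C": []}
print(htd_cs({"A": 0.9, "B": 0.3, "C": 0.7}, children, roots=["A"]))
# -> {'A': 1, 'B': 0, 'C': 1}
```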

For instance, Cerri and De Carvalho experimented with different variants of top-down hierarchical ensemble methods for AFP [28, 124, 125]. HMC-LMLP (hierarchical multilabel classification with local multilayer perceptrons) successively trains a local MLP network for each hierarchical level, using the classical backpropagation algorithm [126]. The output of the MLP for the first level is then used as input to train the MLP that learns the classes of the second level, and so on (Figure 3). A gene is annotated to a class if its corresponding output in the MLP is larger than a predefined threshold; then, in a postprocessing phase (the second step of the hierarchical classification), inconsistent predictions are removed (i.e., classes predicted without the prediction of their superclasses) [125]. In practice, instead of using a dichotomic classifier for each node, the HMC-LMLP algorithm applies a single multiclass multilayer perceptron to each level of the hierarchy.

A related approach adopts multiclass classifiers (HTD-MULTI) at each node, instead of a simple binary classifier, and tries to find the most likely path from the root to the leaves of the hierarchy, considering simple techniques such as the multiplication or the sum of the probabilities estimated at each node along the path [127]. The method has been applied to the cell-cycle branch of the FunCat hierarchy with the yeast model organism, showing improvements with respect to classical hierarchical top-down methods, even if the proposed approach can only predict classes along a single "most likely path", thus not considering that in AFP we may have annotations involving multiple and partial paths.

Figure 3: HMC-LMLP: the outputs of the MLP responsible for the predictions at the first level are used as input to another MLP for the predictions at the second level (adapted from [125]).

Another method that introduces multiclass classifiers instead of simple dichotomic classifiers has been proposed by Paes et al. [128]: local per-level multiclass classifiers (HTD-PERLEV) are trained to distinguish between the classes of a specific level of the hierarchy, and two different strategies to remove inconsistencies are introduced. The method has been applied to the hierarchical classification of enzymes using the EC taxonomy; unfortunately, this algorithm is not well suited to AFP, since leaf nodes are mandatory (that is, partial-path annotations are not allowed) and multilabel annotations along multiple paths are not allowed.

Another interesting top-down hierarchical approach proposed by the same authors is HMC-LP (hierarchical multilabel classification label-powerset), a hierarchical variation of the label-powerset nonhierarchical multilabel method [129], which has been applied to the prediction of the gene functions of the yeast model organism, using 10 different data sets and the FunCat taxonomy [124]. According to the label-powerset approach, the method is based on a first label-combination step, by which, for each example (gene), all the classes assigned to the example are combined into a new and unique class; this process is repeated for each level of the hierarchy. In this way, the original problem is transformed into a hierarchical single-label problem. In both the training and test phases the top-down approach is applied, and at the end of the classification phase the original classes can be easily reconstructed [124]. In an experimental comparison using the FunCat taxonomy for S. cerevisiae, results showed that hierarchical top-down ensemble methods significantly outperform decision-tree-based hierarchical methods, but no significant difference between the different flavors of top-down hierarchical ensembles has been detected [28].

Top-down algorithms can be conceived also in the context of network-based methods (HTD-NET). For instance, in [26] a probabilistic model combines relational protein-protein interaction data and the hierarchical structure of the GO to predict true-path-consistent function labels; the model obeys the true path rule by setting the descendants of a node as negative whenever that node is set to negative. More precisely, the authors at first compute a local hierarchical conditional probability, in the sense that, for any nonroot GO term, only its parents affect its labeling. This probability is computed within a network-based framework, assuming that the labeling of a gene is independent of that of any other gene, given that of its neighbors (a sort of Markov property with respect to the gene functional interaction network), and assuming also a binomial distribution for the number of neighbors labeled with child terms with respect to those labeled with the parent term. These assumptions are quite stringent, but they are necessary to make the model tractable. Then a global hierarchical conditional probability is computed by recursively applying the previously computed local hierarchical conditional probability, considering all the ancestors. More precisely, denoting by $P(y_i = 1 \mid g, N(g))$ the probability that a gene $g$ is annotated to node $i$, given the status of the annotations of its neighborhood $N(g)$ in the functional network, the global hierarchical conditional probability factorizes according to the GO graph as follows:

$$P(y_i = 1 \mid g, N(g)) = \prod_{j \in \mathrm{anc}(i)} P(y_j = 1 \mid y_{\mathrm{par}(j)} = 1, N_{\mathrm{loc}}(g)) \tag{6}$$

where $N_{\mathrm{loc}}(g)$ represents the local hierarchical neighborhood information on the parent-child GO term pair $(\mathrm{par}(j), j)$ [26]. This approach guarantees GO term label assignments that are consistent with the hierarchy, without the need of a postprocessing step.

Finally, in [130] the author applied a hierarchical method to the classification of yeast FunCat categories. Despite its well-founded theoretical properties, based on large-margin methods, this approach is conceived for one-path hierarchical classification, and hence it is unsuited for hierarchical AFP, where multiple paths in the hierarchy usually need to be considered, since in most cases genes can play different functional roles in the cell.

5. Ensemble-Based Bayesian Approaches for Hierarchical Classification

These methods introduce a Bayesian approach to the hierarchical classification of proteins, using the classical Bayes theorem or Bayesian networks to obtain tractable factorizations of the joint conditional probabilities of the original "full Bayesian" setting of the hierarchical AFP problem [20, 30], or to achieve "Bayes-optimal" solutions with respect to loss functions well suited to hierarchical problems [27, 92].

5.1. The Solution Based on Bayesian Networks. One of the first approaches addressing the issue of inconsistent predictions in the Gene Ontology is represented by the Bayesian approach proposed in [20] (BAYES NET-ENS). According to the general scheme of hierarchical ensemble methods, two main steps characterize the algorithm:

(1) flat prediction of each term/class (possibly inconsistent);

(2) a Bayesian hierarchical combination scheme, to allow collaborative error-correction over all nodes.

After training a set of base classifiers on each of the considered GO terms (in their work, the authors applied the method to 105 selected GO terms), we may have a set of (possibly inconsistent) predictions $\hat{y}_1, \ldots, \hat{y}_n$. The goal consists in finding a set of consistent predictions $y_1, \ldots, y_n$ by maximizing the following equation, derived from the Bayes theorem:

$$P(y_1, \ldots, y_n \mid \hat{y}_1, \ldots, \hat{y}_n) = \frac{P(\hat{y}_1, \ldots, \hat{y}_n \mid y_1, \ldots, y_n)\, P(y_1, \ldots, y_n)}{Z} \tag{7}$$

where $n$ is the number of GO nodes/terms and $Z$ is a constant normalization factor.

Since the direct solution of (7) is too hard, that is, exponential in time with respect to the number of nodes, the authors proposed a Bayesian network structure to solve this difficult problem, in order to exploit the relationships between the GO terms. More precisely, to reduce the complexity of the problem, the authors imposed the following constraints:

(1) the $y_i$ nodes are conditioned on their children (GO structure constraints);

(2) the $\hat{y}_i$ nodes are conditioned on their label $y_i$ (the Bayes rule);

(3) the $\hat{y}_i$ are independent of both $\hat{y}_j$, $i \ne j$, and $y_j$, $i \ne j$, given $y_i$.

In other words, we can ensure that a label is 1 (positive) when any one of its children is 1, and the edges from $y_i$ to $\hat{y}_i$ assure that a classifier output $\hat{y}_i$ is a random variable independent of all the other classifier outputs $\hat{y}_j$ and labels $y_j$, given its true label $y_i$ (Figure 4).

Figure 4: Bayesian network involved in the hierarchical classification (adapted from [20]).

More precisely, from the previous constraints we can derive the following equations:

from the first constraint,

$$P(y_1, \ldots, y_n) = \prod_{i=1}^{n} P(y_i \mid \mathrm{child}(y_i)) \tag{8}$$

from the last two constraints,

$$P(\hat{y}_1, \ldots, \hat{y}_n \mid y_1, \ldots, y_n) = \prod_{i=1}^{n} P(\hat{y}_i \mid y_i) \tag{9}$$

Note that (8) can be inferred from the training labels simply by counting, while (9) can be estimated on validation data during training, by modeling the distribution of the $\hat{y}_i$ outputs over positive and negative examples with a parametric model (e.g., a Gaussian distribution; see Figure 5).
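As an illustration of how (9) can be estimated, the sketch below (variable names such as `scores` and `labels` are hypothetical) fits one Gaussian to the classifier outputs of the positive validation examples and one to those of the negatives, yielding the class-conditional likelihoods $P(\hat{y}_i \mid y_i)$ used by the Bayesian correction.

```python
# Sketch: Gaussian class-conditional model of a base classifier's outputs,
# as assumed in (9); 'scores' and 'labels' are validation-set arrays.
import numpy as np
from scipy.stats import norm

def fit_output_model(scores, labels):
    """Fit N(mu+, sigma+) on positives and N(mu-, sigma-) on negatives."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    return (norm(pos.mean(), pos.std(ddof=1)),
            norm(neg.mean(), neg.std(ddof=1)))

def likelihoods(score, pos_model, neg_model):
    """Return P(score | y=1) and P(score | y=0) for one new output."""
    return pos_model.pdf(score), neg_model.pdf(score)

# Toy usage with simulated SVM margins (few positives, many negatives):
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(1, 0.5, 50), rng.normal(-1, 0.8, 500)])
labels = np.concatenate([np.ones(50), np.zeros(500)])
pos_m, neg_m = fit_output_model(scores, labels)
print(likelihoods(0.2, pos_m, neg_m))
```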

For the implementation of their method, the authors adopted bagged ensembles of SVMs [131], to make their predictions more robust and reliable at each node of the GO hierarchy; the median values of the outputs on out-of-bag examples have been used to estimate the means and variances for each class. Finally, these means and variances have been used as the parameters of the Gaussian models that estimate the conditional probabilities of (9).

Results with the 105 terms/nodes of the GO BP (model organism S. cerevisiae) showed substantial improvements with respect to nonhierarchical "flat" predictions: the hierarchical approach improves the AUC results on 93 of the 105 GO terms (Figure 6).

5.2. The Markov Blanket and Approximated Breadth-First Solution. In [30] the authors proposed an alternative approximated solution of the complex equation (7), by introducing the following two variants of the Bayesian integration:

(1) HIER-MB: hierarchical Bayesian combination involving the nodes in the Markov blanket;

(2) HIER-BFS: hierarchical Bayesian combination involving the first 30 nodes visited through a breadth-first search (BFS) in the GO graph.

Figure 5: Distribution of the negative (a) and positive (b) validation examples; a Gaussian distribution is assumed (adapted from [20]).

Figure 6: Improvements induced by the hierarchical prediction of the GO terms. Darker shades of blue indicate the largest improvements, and darker shades of red indicate the largest deterioration; white means no change (adapted from [20]).

Figure 7: Markov blanket surrounding the GO term $Y_1$. Each GO term is represented as a blank node, while the SVM classifier output is represented as a gray node (adapted from [30]).

The method has been applied to the prediction of more than 2000 GO terms for the mouse model organism, and it performed among the top methods in the MouseFunc challenge [2].

The first approach (HIER-MB) modifies the output of the base learners (SVMs in the Guan et al. paper), taking into account the Bayesian network constructed using the Markov blanket surrounding the GO term of interest (Figure 7). In a Bayesian network, the Markov blanket of a node $i$ is represented by its parents ($\mathrm{par}(i)$), its children ($\mathrm{child}(i)$), and its children's other parents. The Bayesian network involving the Markov blanket of node $i$ is used to provide the prediction $\hat{y}_i$ of the ensemble, thus leveraging the local relationships of node $i$ and the predictions for the nodes included in its Markov blanket.

To enlarge the size of the Bayesian subnetwork involved in the prediction of the node of interest, a variant based on the Bayesian networks constructed by applying a classical breadth-first search is the basis of the HIER-BFS algorithm. To reduce the complexity, at most 30 terms are included (i.e., the first 30 nodes reached by the breadth-first algorithm; see Figure 8). In the implementation, ensembles of 25 SVMs have been trained for each node, using vector-space integration techniques [132] to integrate multiple sources of data.

Figure 8: The breadth-first subnetwork stemming from $Y_1$. Each GO term is represented through a blank node, and the SVM outputs are represented as gray nodes (adapted from [30]).
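A minimal sketch of the subnetwork selection step of HIER-BFS follows (the graph representation and the function name are hypothetical; the 30-node cap matches the description above):

```python
from collections import deque

def bfs_subnetwork(start, neighbors, max_nodes=30):
    """Collect up to max_nodes GO terms by breadth-first search from 'start'.
    neighbors: dict term -> adjacent terms in the GO graph (parents and children)."""
    visited, queue = [start], deque([start])
    while queue and len(visited) < max_nodes:
        for t in neighbors.get(queue.popleft(), []):
            if t not in visited and len(visited) < max_nodes:
                visited.append(t)
                queue.append(t)
    return visited  # terms forming the Bayesian subnetwork that corrects 'start'
```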

Note that with both the HIER-MB and HIER-BFS methods we do not take into account the overall topology of the GO network, but only the terms related to the node for which we perform the prediction. Even if this general approach is reasonable and achieves good results, its main drawback is represented by the locality of the hierarchical integration (limited to the Markov blanket and to the first 30 BFS nodes). Moreover, previous works have shown that the adopted integration strategy (vector-space integration) is in most cases worse than kernel fusion [11] and ensemble methods for data integration [23].

Figure 9: Integration of diverse methods and diverse sources of data in an ensemble framework for AFP prediction. The best classifier for each GO term is selected through held-out set validation (adapted from [30]).

In the same work [30], the authors propose also a sort of "test and select" method [133], by which three different classification approaches, (a) single flat SVMs, (b) Bayesian hierarchical correction, and (c) Naive Bayes combination, are applied, and for each GO term the best one is selected by internal cross-validation (Figure 9).

It is worth noting that other approaches have adopted Bayesian networks to resolve the hierarchical constraints underlying the GO taxonomy. For instance, in the FALCON algorithm the GO is modeled as a Bayesian network, and, for any given input, the algorithm returns the most probable GO term assignment in accordance with the GO structure, by using an evolutionary optimization algorithm [134].

5.3. HBAYES: An "Optimal" Hierarchical Bayesian Ensemble Approach. The HBAYES ensemble method [92, 135] is a general technique for solving hierarchical classification problems on generic taxonomies $G$, structured as forests of trees. The method consists in training a calibrated classifier at each node of the taxonomy. In principle, any algorithm (e.g., support vector machines or artificial neural networks) whose classifications are obtained by thresholding a real-valued prediction $\hat{p}$, for example, $\hat{y} = \mathrm{SGN}(\hat{p})$, can be used as base learner. The real-valued outputs $\hat{p}_i(g)$ of the calibrated classifier for node $i$ on the gene $g$ are viewed as estimates of the probabilities $p_i(g) = P(Y_i = 1 \mid Y_{\mathrm{par}(i)} = 1, g)$. The distribution of the random Boolean vector $\mathbf{Y}$ is assumed to be

$$P(\mathbf{Y} = \mathbf{y}) = \prod_{i=1}^{m} P(Y_i = y_i \mid Y_{\mathrm{par}(i)} = 1, g), \quad \forall \mathbf{y} \in \{0, 1\}^m \tag{10}$$

where, in order to enforce that only multilabels that respect the hierarchy have nonzero probability, it is imposed that $P(Y_i = 1 \mid Y_{\mathrm{par}(i)} = 0, g) = 0$ for all nodes $i = 1, \ldots, m$ and all $g$. This implies that the base learner at node $i$ is trained only on the subset of the training set including all the examples $(g, \mathbf{y})$ such that $y_{\mathrm{par}(i)} = 1$.

5.3.1. HBAYES Ensembles for Protein Function Prediction. The H-loss is a measure of discrepancy between multilabels, based on a simple intuition: if a parent class has been predicted wrongly, then errors in its descendants should not be taken into account. Given fixed cost coefficients $\theta_1, \ldots, \theta_m > 0$, the H-loss $\ell_H(\hat{\mathbf{y}}, \mathbf{v})$ between the multilabels $\hat{\mathbf{y}}$ and $\mathbf{v}$ is computed as follows: all paths in the taxonomy $T$ from the root down to each leaf are examined and, whenever a node $i \in \{1, \ldots, m\}$ is encountered such that $\hat{y}_i \ne v_i$, $\theta_i$ is added to the loss, while all the other loss contributions from the subtree rooted at $i$ are discarded. This method assumes that, given a gene $g$, the distribution of the labels $\mathbf{V} = (V_1, \ldots, V_m)$ is $P(\mathbf{V} = \mathbf{v}) = \prod_{i=1}^{m} p_i(g)$ for all $\mathbf{v} \in \{0, 1\}^m$, where $p_i(g) = P(V_i = v_i \mid V_{\mathrm{par}(i)} = 1, g)$. According to the true path rule, it is imposed that $P(V_i = 1 \mid V_{\mathrm{par}(i)} = 0, g) = 0$ for all nodes $i$ and all genes $g$.
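The H-loss translates directly into a short traversal; the following sketch (hypothetical names; a tree stored as a children dictionary) makes the pruning behaviour explicit: a mismatch at a node contributes its cost and suppresses the whole subtree below it.

```python
# Sketch of the H-loss: mismatched nodes add their cost theta[i] and
# discard all further contributions from their subtree.
def h_loss(y_hat, v, children, roots, theta):
    loss = 0.0
    stack = list(roots)
    while stack:
        i = stack.pop()
        if y_hat[i] != v[i]:
            loss += theta[i]                       # count the error, prune below
        else:
            stack.extend(children.get(i, []))      # descend only on agreement
    return loss

# Toy example: root A with children B, C; uniform costs.
children = {"A": ["B", "C"]}
theta = {"A": 1.0, "B": 1.0, "C": 1.0}
print(h_loss({"A": 1, "B": 0, "C": 1}, {"A": 1, "B": 1, "C": 1},
             children, roots=["A"], theta=theta))  # -> 1.0 (only B counts)
```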

In the evaluation phase, HBAYES predicts the Bayes-optimal multilabel $\hat{\mathbf{y}} \in \{0, 1\}^m$ for a gene $g$, based on the estimates $\hat{p}_i(g)$ for $i = 1, \ldots, m$. By definition of Bayes-optimality, the optimal multilabel for $g$ is the one that minimizes the expected loss when the true multilabel $\mathbf{V}$ is drawn from the joint distribution computed from the estimated conditionals $\hat{p}_i(g)$; that is,

$$\hat{\mathbf{y}} = \operatorname*{argmin}_{\mathbf{y} \in \{0,1\}^m} \mathbb{E}\left[\ell_H(\mathbf{y}, \mathbf{V}) \mid g\right] \tag{11}$$

In other words, the ensemble method HBAYES provides an approximation of the optimal Bayesian classifier with respect to the H-loss [135]. More precisely, as shown in [27], the following theorem holds.


Theorem 1. For any tree $T$ and gene $g$, the multilabel generated according to the HBAYES prediction rule is the Bayes-optimal classification of $g$ for the H-loss.

In the evaluation phase, the uniform cost coefficients $\theta_i = 1$, for $i = 1, \ldots, m$, are used. However, since with uniform coefficients the H-loss can be made small simply by predicting sparse multilabels (i.e., multilabels $\hat{\mathbf{y}}$ such that $\sum_i \hat{y}_i$ is small), in the training phase the cost coefficients are set to $\theta_i = 1/|\mathrm{root}(G)|$ if $i \in \mathrm{root}(G)$, and to $\theta_i = \theta_j/|\mathrm{child}(j)|$, with $j = \mathrm{par}(i)$, otherwise. This normalizes the H-loss, in the sense that the maximal H-loss contribution of all the nodes in a subtree, excluding its root, equals that of its root.

Let $\{E\}$ denote the indicator function of the event $E$. Given $g$ and the estimates $\hat{p}_i = \hat{p}_i(g)$ for $i = 1, \ldots, m$, the HBAYES prediction rule can be formulated as follows.

HBAYES Prediction Rule. Initially, set the label of each node $i$ to

$$\hat{y}_i = \operatorname*{argmin}_{y \in \{0,1\}} \Big( \theta_i \hat{p}_i (1 - y) + \theta_i (1 - \hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{\mathbf{y}}) \Big) \tag{12}$$

where

$$H_j(\hat{\mathbf{y}}) = \theta_j \hat{p}_j (1 - \hat{y}_j) + \theta_j (1 - \hat{p}_j)\, \hat{y}_j + \hat{p}_j \{\hat{y}_j = 1\} \sum_{k \in \mathrm{child}(j)} H_k(\hat{\mathbf{y}}) \tag{13}$$

is recursively defined over the nodes $j$ in the subtree rooted at $i$, with each $\hat{y}_j$ set according to (12).

Then, if $\hat{y}_i$ is set to zero, set all the nodes in the subtree rooted at $i$ to zero as well.

It is worth noting that $\hat{\mathbf{y}}$ can be computed for a given $g$ via a simple bottom-up message-passing procedure. It can be shown that, if all the child nodes $k$ of $i$ have $\hat{p}_k$ close to a half, then the Bayes-optimal label of $i$ tends to be 0, irrespective of the value of $\hat{p}_i$. On the contrary, if $i$'s children all have $\hat{p}_k$ close to either 0 or 1, then the Bayes-optimal label of $i$ is based on $\hat{p}_i$ only, ignoring the children. This behaviour can be intuitively explained in the following way: the estimate $\hat{p}_k$ is built based only on the examples on which the parent $i$ of $k$ is positive; hence a "neutral" estimate $\hat{p}_k = 1/2$ signals that the current instance is a negative example for the parent $i$. Experimental results show that this approach achieves results comparable with those of the TPR method (Section 7), an ensemble approach based on the "true path rule" [136].
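The bottom-up computation of (12)-(13) can be sketched as follows (a postorder pass computing each $H_j$ and choosing each label by the argmin, followed by the top-down zeroing step; all names are illustrative):

```python
# Sketch of the HBAYES rule (12)-(13): bottom-up costs, then top-down zeroing.
def hbayes(p_hat, children, roots, theta):
    y = {}

    def H(j):
        """Return H_j, choosing y_j by the argmin in (12)."""
        child_sum = sum(H(k) for k in children.get(j, []))
        cost0 = theta[j] * p_hat[j]                               # y_j = 0
        cost1 = theta[j] * (1 - p_hat[j]) + p_hat[j] * child_sum  # y_j = 1
        y[j] = 1 if cost1 < cost0 else 0
        return min(cost0, cost1)

    def zero_subtree(i):
        for j in children.get(i, []):
            y[j] = 0
            zero_subtree(j)

    for r in roots:
        H(r)
    for i in list(y):            # enforce consistency: zeros propagate downwards
        if y[i] == 0:
            zero_subtree(i)
    return y
```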

5.3.2. HBAYES-CS: The Cost-Sensitive Version. HBAYES-CS is the cost-sensitive version of HBAYES, proposed in [27]. In this approach, the misclassification cost coefficient $\theta_i$ for node $i$ is split into two terms, $\theta_i^+$ and $\theta_i^-$, to take into account the misclassification of positive and negative examples, respectively. By considering these two terms separately, (12) can be rewritten as

$$\hat{y}_i = \operatorname*{argmin}_{y \in \{0,1\}} \Big( \theta_i^- \hat{p}_i (1 - y) + \theta_i^+ (1 - \hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{\mathbf{y}}) \Big) \tag{14}$$

where the expression of $H_j(\hat{\mathbf{y}})$ changes correspondingly. By introducing a factor $\alpha \ge 0$ such that $\theta_i^- = \alpha \theta_i^+$, while keeping $\theta_i^+ + \theta_i^- = 2\theta_i$, the relative costs of false positives and false negatives can be parameterized, thus allowing us to further rewrite the hierarchical Bayesian rule (Section 5.3.1) as follows:

$$\hat{y}_i = 1 \iff \hat{p}_i \Big( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \Big) \ge \frac{2\theta_i}{1 + \alpha} \tag{15}$$

By setting $\alpha = 1$, we obtain the original version of the hierarchical Bayesian ensemble, while by incrementing $\alpha$ we introduce progressively lower costs for positive predictions. In this way the recall of the ensemble tends to increase, eventually at the expense of the precision, and, by tuning the $\alpha$ parameter, we can obtain different combinations of precision/recall values.

In principle, a cost factor $\alpha_i$ can be set for each node $i$, to explicitly take into account the imbalance between the number of positive ($n_i^+$) and negative ($n_i^-$) examples estimated from the training data:

$$\alpha_i = \frac{n_i^-}{n_i^+} \implies \theta_i^+ = \frac{2}{n_i^-/n_i^+ + 1}\, \theta_i = \frac{2 n_i^+}{n_i^- + n_i^+}\, \theta_i \tag{16}$$

The decision rule (15) at each node then becomes

$$\hat{y}_i = 1 \iff \hat{p}_i \Big( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \Big) \ge \frac{2\theta_i}{1 + \alpha_i} = \frac{2\theta_i\, n_i^+}{n_i^- + n_i^+} \tag{17}$$
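The per-node cost-sensitive threshold of (16)-(17) is a one-liner on top of the `hbayes` sketch above; `n_pos` and `n_neg` are hypothetical per-node training counts.

```python
# Sketch: HBAYES-CS decision (17) at one node, with alpha_i = n- / n+.
def hbayes_cs_decision(p_i, theta_i, child_H_sum, n_pos, n_neg):
    alpha_i = n_neg / n_pos                   # per-node imbalance factor (16)
    threshold = 2 * theta_i / (1 + alpha_i)   # equals 2*theta_i*n+ / (n- + n+)
    return int(p_i * (2 * theta_i - child_H_sum) >= threshold)

print(hbayes_cs_decision(p_i=0.4, theta_i=1.0, child_H_sum=0.3,
                         n_pos=20, n_neg=180))  # -> 1: the low threshold favours recall
```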

Results obtained with the yeast model organism showed that HBAYES-CS significantly outperforms HTD methods [27, 136].

6. Reconciliation Methods

Hierarchical ensemble methods are basically two-step methods, since they at first provide predictions for the single classes and then arrange these predictions to take into account the functional relationships between the GO terms. Noble and colleagues name this general approach reconciliation methods [24]: they proposed methods for calibrating and combining independent predictions, to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. They applied their ensemble methods to genome-wide and ontology-wide function prediction in M. musculus, involving about 3000 GO terms.

Figure 10: The overall scheme of the reconciliation methods: (1) kernel computation from multiple data sources (e.g., MGI, PPI, Pfam); (2) SVM learning for each term; (3) calibration; (4) reconciliation (adapted from [24]).

Their goal consists in providing consistent predictions, that is, predictions whose confidence (e.g., the posterior probability) increases as we ascend from more specific to more general terms in the GO. Moreover, another important feature of these methods is the availability of confidence values associated with the predictions, which can be interpreted as the probabilities that a protein has a certain function, given the information provided by the data.

The overall reconciliation approach can be summarized in the following four basic steps (Figure 10).

(1) Kernel computation: at first, a set of kernels is computed from the available data. We may choose kernels specific for each source of data (e.g., diffusion kernels for protein-protein interaction data [137], linear or Gaussian kernels for expression data, and string kernels for sequence data [138]). Multiple kernels for the same type of data can also be constructed [24].

(2) SVM learning: SVMs are used as base learners with the kernels selected at the previous step; the training is performed by internal cross-validation, to avoid overfitting, and a local cost-sensitive strategy is applied, by tuning separately the $C$ regularization factor for positive and negative examples. Note that the authors used SVMs as base learners in their experiments, but any meaningful classifier could be used at this step.

(3) Calibration: to produce individual probabilistic outputs from the set of SVM outputs corresponding to one GO term, a logistic regression approach is applied. In this way a calibration of the individual SVM outputs is obtained, resulting in a probabilistic prediction of the random variable $Y_i$ for each node/term $i$ of the hierarchy, given the outputs $\hat{y}_i$ of the SVM classifiers.

(4) Reconciliation: the first three steps generate unreconciled outputs; that is, in practice a "flat" ensemble is applied, which may generate predictions that are inconsistent with the given taxonomy. In this step the outputs of step (3) are processed by a "reconciliation method". The goal of this stage is to combine the predictions for each term, to produce predictions that are consistent with the ontology, meaning that all the probabilities assigned to the ancestors of a GO term are larger than the probability assigned to that term.

The first three steps are basically the same (or very similar) for each reconciliation ensemble method. The crucial step is the fourth, that is, the reconciliation step, and different ensemble algorithms can be designed to implement it. The authors proposed 11 different ensemble methods for the reconciliation of the base classifier outputs. Schematically, they can be subdivided into the following four main classes of ensembles:

(1) heuristic methods;
(2) Bayesian network-based methods;
(3) cascaded logistic regression;
(4) projection-based methods.

6.1. Heuristic Methods. These approaches preserve the "reconciliation property"

$$\forall i, j \in G:\ (i, j) \in G \implies \bar{p}_i \ge \bar{p}_j \tag{18}$$

through simple heuristic modifications of the probabilities computed at step (3) of the overall reconciliation scheme.

(i) The MAX method simply chooses the largest logistic regression value among the node $i$ and all its descendants $\mathrm{desc}(i)$:

$$\bar{p}_i = \max_{j \in \mathrm{desc}(i)} \hat{p}_j \tag{19}$$

(ii) The AND method implements the notion that the probability that all the ancestral GO terms $\mathrm{anc}(i)$ of a given term/node $i$ are positive is large, assuming that, conditionally on the data, all predictions are independent:

$$\bar{p}_i = \prod_{j \in \mathrm{anc}(i)} \hat{p}_j \tag{20}$$

(iii) The OR method estimates the probability that the node $i$ is annotated with at least one of the descendant GO terms, assuming again that, conditionally on the data, all predictions are independent:

$$1 - \bar{p}_i = \prod_{j \in \mathrm{desc}(i)} (1 - \hat{p}_j) \tag{21}$$
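The three heuristics translate directly into code. The sketch below uses hypothetical names, with `desc` and `anc` precomputed from the ontology graph and taken here to include the term itself.

```python
import math

# Sketches of the MAX, AND, and OR reconciliation heuristics (19)-(21).
# p: dict term -> calibrated probability for one protein; desc[i] and anc[i]
# are precomputed sets of descendants/ancestors, assumed to include i itself.
def reconcile_max(p, desc, i):
    return max(p[j] for j in desc[i])                     # (19)

def reconcile_and(p, anc, i):
    return math.prod(p[j] for j in anc[i])                # (20)

def reconcile_or(p, desc, i):
    return 1.0 - math.prod(1.0 - p[j] for j in desc[i])   # (21)
```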

6.2. Cascaded Logistic Regression. Instead of modeling class-conditional probabilities, as required by the Bayesian approach, logistic regression can be used to directly model the posterior probabilities. Considering that modeling conditional densities is in most cases difficult (even using strong independence assumptions, as shown in Section 5.1), the choice of logistic regression could be a reasonable one. In [24] the authors embedded the hierarchical dependencies between terms in the logistic regression setting. By assuming that a random variable $\mathbf{X}$, whose values represent the features of the gene $g$ of interest, is associated with a given gene $g$, and assuming that $P(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x})$ factorizes according to the GO graph, it follows that

$$P(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x}) = \prod_i P(Y_i = y_i \mid \forall j \in \mathrm{par}(i),\ Y_j = y_j,\ X_i = x_i) \tag{22}$$

with $P(Y_i = 1 \mid \forall j \in \mathrm{par}(i),\ Y_j = 0,\ X_i = x_i) = 0$. The authors estimated $P(Y_i = 1 \mid \forall j \in \mathrm{par}(i),\ Y_j = 1,\ X_i = x_i)$ with logistic regression. This approach is quite similar to fitting independent logistic regressions; note, however, that in this case only the examples of proteins having all the parent GO terms are used to fit the model, thus implicitly taking into account the hierarchical relationships between GO terms.
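A sketch of this training scheme with scikit-learn follows; the data structures `X`, `Y`, and `parents` are hypothetical. Each node's logistic regression is fitted only on examples whose parent terms are all positive, which is exactly how (22) restricts the training set. (In practice one must also guard against nodes whose filtered sample contains a single class.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch of cascaded logistic regression (22): one model per GO term,
# trained only on examples for which all parent terms are annotated.
def fit_cascaded_lr(X, Y, parents):
    """X: (n_genes, n_features) array; Y: dict term -> 0/1 array of annotations;
    parents: dict term -> list of parent terms (empty for roots)."""
    models = {}
    for term, y in Y.items():
        mask = np.ones(len(X), dtype=bool)
        for p in parents.get(term, []):
            mask &= Y[p] == 1            # keep genes positive for all parents
        models[term] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])
    return models
```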

6.3. Bayesian Network-Based Methods. These methods are variants of the Bayesian network approach proposed in [20] (Section 5.1): the GO is viewed as a graphical model, where a joint Bayesian prior is put on the binary GO term variables $y_i$. The authors proposed four variants, which can be summarized as follows:

(i) BPAL is a belief propagation approach with asymmetric Laplace likelihoods. The graphical model has edges directed from more general terms to more specific terms. Differently from [20], the distribution of each SVM output is modeled as an asymmetric Laplace distribution, and a variational inference algorithm, which solves an optimization problem whose minimizer is the set of marginal probabilities of the distribution, is used to estimate the posterior probabilities of the ensemble [139];

(ii) BPALF is similar to BPAL, but with edges inverted and directed from more specific terms to more general terms;

(iii) BPLR is a heuristic variant of BPAL, where, in the inference algorithm, the Bayesian log-posterior ratio for $Y_i$ is replaced by the marginal log-posterior ratio obtained from the logistic regression (LR);

(iv) BPLRF is equal to BPLR, but with reversed edges.

6.4. Projection-Based Methods. A different approach is represented by methods that directly use the calibrated values obtained from the logistic regression (step (3) of the overall scheme of the reconciliation methods) to find the closest set of values that are consistent with the ontology. This approach leads to a constrained optimization problem. The main contribution of the Obozinski et al. work [24] is represented by the introduction of projection reconciliation techniques, based on isotonic regression [140] and on the Kullback-Leibler divergence.

The isotonic regression method tries to find a set of marginal probabilities $\bar{p}_i$ that are close to the set of calibrated values $\hat{p}_i$ obtained from the logistic regression, using the Euclidean distance as the measure of closeness. Hence, considering that the "reconciliation property" requires that $\bar{p}_i \ge \bar{p}_j$ when $(i, j) \in E$, this approach yields the following quadratic program:

$$\min_{\{\bar{p}_i\}_{i \in I}} \sum_{i \in I} (\bar{p}_i - \hat{p}_i)^2, \quad \text{s.t. } \bar{p}_j \le \bar{p}_i,\ (i, j) \in E \tag{23}$$

Considering that we deal with probabilities a naturalmeasure of distance between probability density functions119891(x) and 119892(x) defined with respect to a random variable xis represented by the Kullback-Leibler divergence 119863119891x119892x asfollows

119863119891x119892x= int

infin

minusinfin

119891 (x) log(119891 (x)119892 (x)

) 119889x (24)

In the context of reconciliationmethods we need to considera discrete version of theKullback-Leibler divergence yieldingthe following optimization problem

minp

119863pp = min119901119894119894isin119868

sum

119894isin119868

119901119894 log(119901119894

119901119894

)

st 119901119895 le 119901119894 (119894 119895) isin 119864

(25)

The algorithm finds the probabilities closest to the prob-abilities p obtained from logistic regression according tothe Kullback-Leibler divergence and obeying the constraintsthat probabilities cannot increase while descending on thehierarchy underlying the ontology

The extensive experiments reported in [24] show that, among the reconciliation methods, isotonic regression is the most generally useful: across a range of evaluation modes, term sizes, ontologies, and recall levels, isotonic regression yields consistently high precision. On the other hand, isotonic regression is not always the "best method", and a biologist with a particular goal in mind may apply other reconciliation methods. For instance, with small terms the Kullback-Leibler projections usually achieve the best results; but, considering average "per term" results, heuristic methods yield a precision at a given recall comparable with the projection methods, and better than that achieved with the Bayes-net methods.

This ensemble approach achieved excellent results in the prediction of protein function in the mouse model organism, demonstrating that hierarchical multilabel methods can play a crucial role in the improvement of protein function prediction performances [24]. Nevertheless, the approach suffers from some drawbacks. Indeed, the paper focuses on the comparison of hierarchical multilabel methods, but it does not analyze the impact of the concurrent use of data integration and hierarchical multilabel methods on the overall classification performances. Moreover, potential improvements could be introduced by applying cost-sensitive variants of hierarchical multilabel predictors, able to effectively calibrate the precision/recall trade-off at the different levels of the functional ontology.

Figure 11: The asymmetric flow of information suggested by the true path rule.

7. True Path Rule Hierarchical Ensembles

These ensemble methods exploit at the same time the downward and upward relationships between classes, thus considering both the parent-to-child and the child-to-parent functional links (Figure 2(b)).

The true path rule (TPR) ensemble method [31, 142] is directly inspired by the true path rule that governs both the GO and FunCat taxonomies. Citing the curators of the Gene Ontology [143]: "An annotation for a class in the hierarchy is automatically transferred to its ancestors, while genes unannotated for a class cannot be annotated for its descendants." Considering the parents of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

\[
y_i = 1 \;\Longrightarrow\; y_{\mathrm{par}(i)} = 1,
\qquad
y_i = 0 \;\nRightarrow\; y_{\mathrm{par}(i)} = 0.
\tag{26}
\]

On the other hand, considering the children of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

\[
y_i = 1 \;\nRightarrow\; y_{\mathrm{child}(i)} = 1,
\qquad
y_i = 0 \;\Longrightarrow\; y_{\mathrm{child}(i)} = 0.
\tag{27}
\]

From (26) and (27) we observe an asymmetry in the rules that govern the assignment of positive and negative labels. Indeed, we have a propagation of positive predictions from bottom to top of the hierarchy in (26), and a propagation of negative labels from top to bottom in (27). Conversely, negative labels cannot propagate from bottom to top, and positive predictions cannot propagate from top to bottom.

The "true path rule" suggests algorithms able to propagate "positive" decisions from bottom to top of the hierarchy, and negative decisions from top to bottom (Figure 11).
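The constraints (26)-(27) are equivalent to requiring that every positively labeled node has a positively labeled parent. A small sketch of such a consistency check follows, with a hypothetical dictionary-based tree encoding:

```python
# Sketch of a true path rule consistency check on a tree: a 0/1 labeling is
# consistent iff every positive node has a positive parent (equivalently, no
# negative node has a positive descendant).
def is_true_path_consistent(labels, parent):
    """labels: dict node -> 0/1; parent: dict node -> parent (root omitted)."""
    return all(labels[parent[n]] == 1
               for n, y in labels.items() if y == 1 and n in parent)

parent = {"0101": "01", "010103": "0101"}   # toy FunCat-like tree
print(is_true_path_consistent({"01": 1, "0101": 1, "010103": 0}, parent))  # True
print(is_true_path_consistent({"01": 0, "0101": 1, "010103": 1}, parent))  # False
```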

7.1. The True Path Rule Ensemble Algorithm. The TPR algorithm puts together the predictions made at each node by local "base" classifiers, to realize an ensemble that obeys the "true path rule".

The basic ideas behind the method can be summarized as follows.

(1) Training of the base learners: for each node of the hierarchy, a suitable learning algorithm (e.g., a multilayer perceptron or a support vector machine) provides a classifier for the associated functional class.

(2) In the evaluation phase, the trained classifiers associated with each class/node of the graph provide a local decision about the assignment of a given example to a given node.

(3) Positive decisions, that is, annotations to a specific functional class, may propagate from bottom to top across the graph: they influence the decisions of the parent nodes and of their ancestors in a recursive way, by traversing the graph towards the higher level nodes/classes. Conversely, negative decisions do not affect the decisions of the parent node, that is, they do not propagate from bottom to top (26).

(4) Negative predictions for a given node (taking into account the local decisions of its descendants) are propagated to the descendants, to preserve the consistency of the hierarchy according to the true path rule, while positive decisions do not influence the decisions of the child nodes (27).

The ensemble combines the local predictions of the base learners associated with each node with the positive decisions that come from the bottom of the hierarchy, and with the negative decisions that spring from the higher level nodes. More precisely, base classifiers estimate local probabilities $\hat{p}_i(g)$ that a given example $g$ belongs to class $\theta_i$, but the core of the algorithm is represented by the evaluation phase, where the ensemble provides an estimate of the "consensus" global probability $\bar{p}_i(g)$.

It is worth noting that, instead of a probability, $\hat{p}_i(g)$ may represent a score associated with the likelihood that a given gene/gene product belongs to the functional class $i$.

Let us consider the set $\phi_i(g)$ of the children of node $i$ for which we have a positive prediction for a given gene $g$:

\[
\phi_i(g) = \left\{ j \mid j \in \mathrm{child}(i), \; y_j = 1 \right\}.
\tag{28}
\]

The global consensus probability $\bar{p}_i(g)$ of the ensemble depends both on the local prediction $\hat{p}_i(g)$ and on the predictions of the nodes belonging to $\phi_i(g)$:

\[
\bar{p}_i(g) = \frac{1}{1 + |\phi_i(g)|} \left( \hat{p}_i(g) + \sum_{j \in \phi_i(g)} \bar{p}_j(g) \right).
\tag{29}
\]

The decision $y_i(g)$ at node/class $i$ is set to 1 if $\bar{p}_i(g) > t$, and to 0 otherwise (a natural choice for $t$ is 0.5); only children nodes for which we have a positive prediction can influence their parent. In the leaf nodes the sum of (29) disappears, and (29) becomes $\bar{p}_i(g) = \hat{p}_i(g)$. In this way positive predictions propagate from bottom to top, and negative decisions are propagated to the descendants when, for a given node, $y_i(g) = 0$.

The bottom-up, per-level traversal of the tree assures that all the offspring of a given node $i$ are taken into account for the ensemble prediction. For the same reason, we can safely set the classes belonging to the subtree rooted at $i$ to negative when $y_i$ is set to 0. It is worth noting that we have a two-way asymmetric flow of information across the tree: positive predictions for a node influence its ancestors, while negative predictions influence its offspring.

The algorithm provides both the multilabels $y_i$ and an estimate of the probabilities $\bar{p}_i$ that a given example $g$ belongs to the class $i = 1, \ldots, m$.
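The following sketch puts the evaluation phase together for a tree-structured taxonomy: a bottom-up pass applies (29) using the already-computed decisions of the children, and a top-down pass propagates the negative decisions. The dictionary encoding and the toy values are illustrative assumptions.

```python
# Sketch of the TPR evaluation phase on a tree: bottom-up combination (29) of the
# local estimates p_hat with the positive children, then top-down propagation of
# negative decisions to preserve the true path rule.
def tpr(children, p_hat, root, t=0.5):
    p_bar, y = {}, {}

    def bottom_up(i):
        for c in children.get(i, []):
            bottom_up(c)
        pos = [c for c in children.get(i, []) if y[c] == 1]     # phi_i of (28)
        p_bar[i] = (p_hat[i] + sum(p_bar[c] for c in pos)) / (1 + len(pos))
        y[i] = 1 if p_bar[i] > t else 0

    def top_down(i):
        for c in children.get(i, []):
            if y[i] == 0:            # negative decisions flow downward (27)
                y[c] = 0
            top_down(c)

    bottom_up(root)
    top_down(root)
    return p_bar, y

children = {"root": ["a", "b"], "a": ["a1", "a2"]}              # toy hierarchy
p_hat = {"root": 0.7, "a": 0.4, "b": 0.2, "a1": 0.9, "a2": 0.3}
print(tpr(children, p_hat, "root"))
```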

7.2. The Cost-Sensitive Variant. Note that in the TPR algorithm there is no way to explicitly balance the local prediction $\hat{p}_i(g)$ at node $i$ with the positive predictions coming from its offspring (29). By balancing the local predictions with the positive predictions coming from the ensemble, we can explicitly modulate the interplay between local and descendant predictors. To this end a weight $w$, $0 \le w \le 1$, is introduced, such that if $w = 1$ the decision at node $i$ depends only on the local predictor; otherwise, the prediction is shared proportionally to $w$ and $1 - w$ between, respectively, the local parent predictor and the set of its children:

\[
\bar{p}_i = w \, \hat{p}_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \bar{p}_j.
\tag{30}
\]

This variant of the TPR algorithm is the weighted true path rule (TPR-W) hierarchical ensemble algorithm. By tuning the $w$ parameter we can modulate the precision/recall characteristics of the resulting ensemble. More precisely, for $w \to 0$ the weight of the parent local predictor is small, and the ensemble decision depends mainly on the positive predictions of the offspring nodes (classifiers); conversely, $w \to 1$ corresponds to a higher weight of the parent predictor: less weight is given to possible positive predictions of the children, and the decision depends mainly on the local/parent base classifier. In case of a negative decision the whole subtree is set to 0, causing the precision to increase. Note that for $w \to 1$ the behaviour of TPR-W becomes similar to that of HTD (Section 4).

A specific advantage of TPR-W ensembles is thus the capability of tuning the precision and recall rates through the single global parameter $w$ (30): in [31] the author shows that low $w$ values yield a higher recall, while high values improve the precision of the TPR-W ensemble.
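In code, TPR-W only changes the combination step of the previous sketch; a hedged drop-in for (30) follows, where the leaf case (no positive children) falls back to the local estimate:

```python
# TPR-W combination rule (30), as a drop-in for the bottom-up step of the
# previous sketch: w -> 1 approaches a top-down behaviour, w -> 0 trusts the
# positive children.
def combine_tpr_w(p_hat_i, positive_children_probs, w=0.5):
    if not positive_children_probs:          # leaf case: p_bar_i = p_hat_i
        return p_hat_i
    avg = sum(positive_children_probs) / len(positive_children_probs)
    return w * p_hat_i + (1 - w) * avg

print(combine_tpr_w(0.4, [0.9, 0.8], w=0.7))   # 0.7*0.4 + 0.3*0.85 = 0.535
```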

Recently, Chen and Hu proposed a method that applies the TPR-W hierarchical strategy, but using composite-kernel SVMs as base classifiers, and a supervised clustering with oversampling strategy to solve the imbalanced data set learning problem, showing that the proper selection of base learners and unbalance-aware learning strategies can further improve the results in terms of hierarchical precision and recall [144].

The same authors also proposed an enhanced version of the TPR-W strategy, to overcome a limitation of this bottom-up hierarchical method for AFP. Indeed, for some classes at the lower levels of the hierarchy the classifier performances are sometimes quite poor, due to both noisy data and the relatively low number of available annotations. More precisely, in the basic TPR ensemble the probabilities $\bar{p}_j$ computed by the children of node $i$ (30) contribute in equal measure to the probability $\bar{p}_i$ computed by the ensemble at node $i$, independently of the accuracy of the predictions made by its children classifiers. This "unweighted" mechanism may propagate errors across the hierarchy: a poorly performing child classifier may, for instance, predict with high probability a negative example as positive, and this error may propagate to its parent node and, recursively, to its ancestor nodes. To try to alleviate this possible bottom-up error propagation, in [145] Chen and Hu proposed an improved TPR ensemble (TPR-W weighted), based on classifier performance: the contribution of each child classifier is weighted on the basis of its performance, evaluated on a validation data set, by adding to (30) another weight $\nu_j$:

\[
\bar{p}_i = w \, \hat{p}_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \nu_j \cdot \bar{p}_j,
\tag{31}
\]

where $\nu_j$ is computed on the basis of some accuracy metric $A_j$ (e.g., the F-score) estimated for the child classifier associated with node $j$, as follows:

\[
\nu_j = \frac{A_j}{\sum_{k \in \phi_i} A_k}.
\tag{32}
\]

In this way the contribution of "poor" classifiers is reduced, while "good" classifiers weigh more in the final computation of $\bar{p}_i$ (31). Experiments with the "Protein Fate" subtree of the FunCat taxonomy, with the yeast model organism, show that this approach improves predictions with respect to the "vanilla" TPR-W hierarchical strategy [145].
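A sketch of the performance-weighted combination (31)-(32) follows; the validation metrics $A_j$ passed in are assumed to be, for example, per-child F-scores:

```python
# Sketch of the performance-weighted TPR-W combination (31)-(32): each positive
# child contributes in proportion to a validation accuracy metric A_j.
def combine_tpr_w_weighted(p_hat_i, child_probs, child_scores, w=0.5):
    if not child_probs:
        return p_hat_i
    nu = [a / sum(child_scores) for a in child_scores]               # (32)
    weighted_sum = sum(v * p for v, p in zip(nu, child_probs))
    return w * p_hat_i + (1 - w) / len(child_probs) * weighted_sum   # (31)

# A low-scoring child (A = 0.2) now contributes much less than a reliable one.
print(combine_tpr_w_weighted(0.4, [0.9, 0.8], [0.7, 0.2], w=0.5))
```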

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward, and common to the hierarchical top-down algorithm, the propagation of positive decisions from bottom to top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this item see Appendix C.

Experimental results show that TPR-W achieves equal or better results than the TPR and the top-down hierarchical strategies, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per-level classification performances shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall by using a global strategy: large values of the $w$ parameter improve the precision, and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchical ensemble are influenced by the positive decisions of its descendants. More precisely, the analyses performed in [31] showed the following.

(i) Weights of descendants decrease exponentially with respect to their depth; as a consequence, the influence of descendant nodes decays quickly with their depth.

(ii) The parameter $w$ plays a central role in balancing the weight of the parent classifier associated with a given node with the weights of its positive offspring: small values of $w$ increase the weight of descendant nodes, and large values increase the weight of the local parent predictor associated with that node.

(iii) The effect on the overall probability predicted by the ensemble is the result of the choice of the $w$ parameter, of the strength of the predictions of the local learner, and of those of its descendants.

These characteristics of TPR-W ensembles are well suited for the hierarchical classification of protein functions, considering that annotations of deeper nodes are likely to have less experimental evidence than those of higher nodes. Moreover, by enforcing the strength of the descendant nodes through low $w$ values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured as a forest of trees (Appendix A), while no applications have been performed using the GO, structured as a directed acyclic graph (Appendix A).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147], or investigated a modified decision tree model in which the assignment to a node is propagated toward the parent nodes [9], by extending the classical C4.5 decision tree algorithm for multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in yeast [149].

More recent approaches, also based on modified decision trees, used distance measures derived from the hierarchy and significantly improved previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all classes at once, even when they are built taking into account the hierarchy.

Nevertheless, the previously proposed decision tree-based methods often achieve results that are not comparable with those of state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for DAG-structured prediction of protein function in the S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles, which learn ensembles of predictive models each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles, which train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151], methods related to decision trees, in the sense that interpretable classification rules are used to predict all functions at all levels of the GO hierarchy, have been proposed, using an ant colony optimization algorithm to discover the classification rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues [1, 16–18]. In particular, in [80] the authors showed that, even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results, if at the same time we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy between hierarchical classification, data integration methods, and cost-sensitive techniques has been shown, highlighting that hierarchical ensemble methods should be designed taking into account the different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the prediction of protein functions [16, 25, 154–156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into the following four groups:

(1) vector subspace integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed by [30], but each data source is separately normalized, in order to take into account the data distribution in each individual vector space.

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network, or probabilistic evidence integration schemes [45].

Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive-Bayes integration [157] and constrained linear regression [14], or by merging data taking into account the GO hierarchy [158], or by applying XML-based techniques [159].

Kernel Fusion. These techniques at first construct a separate Gram matrix for each available data source, using appropriate kernels representing similarities between genes/gene products. Then, by exploiting the closure property of kernels with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix.

Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have been recently proposed [160, 161]. Kernel fusion methods, both with and without weighting of the data sources, have been successfully applied to the classification of protein functions [162–165]. Recently, a novel method proposed an enhanced kernel integration approach, by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
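In its unweighted form, kernel fusion reduces to a matrix sum, as in the following sketch (the two random "views" and the kernel choices are illustrative; rbf_kernel and linear_kernel are from scikit-learn):

```python
# Sketch of unweighted kernel fusion: per-source Gram matrices are summed into
# a single "consensus" kernel, exploiting the closure of kernels under addition.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

rng = np.random.default_rng(0)
X_expr = rng.normal(size=(50, 20))    # e.g., a gene expression view
X_ppi = rng.normal(size=(50, 100))    # e.g., a protein-protein interaction view

K_fused = rbf_kernel(X_expr) + linear_kernel(X_ppi)   # sum of the Gram matrices
# K_fused can be fed to any kernel method, e.g., SVC(kernel="precomputed").
print(K_fused.shape)
```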

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different "views" of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include the "late integration" of kernels trained on different sources [22], the naive-Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the outputs of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to those of state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171–173].

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques, in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ. SIM.). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].

Table 2: Comparison of the results (average per class F-scores) achieved with single sources and with multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the numbers in parentheses refer to the percentage relative increment in F-score achieved with data fusion with respect to the best single source of evidence (BIOGRID).

Methods          FLAT          HTD           HTD-CS        HB            HB-CS         TPR           TPR-W
Single-source
  BIOGRID        0.2643        0.3759        0.4160        0.3385        0.4183        0.3902        0.4367
  STRING         0.2203        0.2677        0.3135        0.2138        0.3007        0.2801        0.3048
  PFAM BINARY    0.1756        0.2003        0.2482        0.1468        0.2407        0.2532        0.2738
  PFAM LOGE      0.2044        0.1567        0.2541        0.0997        0.2847        0.3005        0.3160
  EXPR           0.1884        0.2506        0.2889        0.2006        0.2781        0.2723        0.3053
  SEQ. SIM.      0.1870        0.2532        0.2899        0.2017        0.2825        0.2742        0.3088
Multisource (data fusion)
  Kernel fusion  0.3220 (22%)  0.5401 (44%)  0.5492 (32%)  0.5181 (53%)  0.5505 (32%)  0.5034 (29%)  0.5592 (28%)

Table 2 summarizes the results of the comparison across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per class F-score in flat ensembles (first column of Table 2), and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) than in FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

Figure 12: FunCat trees comparing the F-scores achieved with data integration (KF) to those of the best single-source classifiers, trained on BIOGRID data. Black nodes depict functional classes for which KF achieves better F-scores: (a) FLAT and (b) TPR-W ensembles.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical methods, data fusion (in particular kernel fusion), and cost-sensitive approaches (Figure 13) [123].

Figure 13: Comparison of hierarchical precision, recall, and F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion the average F-scores at levels 2, 3, and 4 are comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

Figure 14: Comparison of per-level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF). The performance of "single" is computed by averaging across all the single data sources.

In conclusion, the synergic effects of hierarchical multilabel ensembles, cost-sensitive techniques, and data fusion significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insight into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both GO and FunCat, usually only positive annotations are available, while negative annotations are much fewer. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy for choosing negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123], two strategies for choosing negative examples have been compared: the basic (B) strategy and the parent only (PO) strategy.

According to the B strategy, the set of negative examples is simply the set of genes $g$ that are not annotated for class $c_i$, that is:

\[
N_B = \left\{ g \mid g \notin c_i \right\}.
\tag{33}
\]

The PO selection strategy chooses as negatives for class $c_i$ only those examples that are not annotated for $c_i$ but are annotated for a parent class. More precisely, for a given class $c_i$, corresponding to node $i$ in the taxonomy, the set of negative examples is:

\[
N_{\mathrm{PO}} = \left\{ g \mid g \notin c_i, \; g \in \mathrm{par}(i) \right\}.
\tag{34}
\]
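A sketch of the two selection rules (33)-(34), assuming a hypothetical annotation table mapping each gene to the set of classes it is annotated with:

```python
# Sketch of the negative selection strategies (33)-(34) for a class c_i: B takes
# every gene not annotated with c_i, PO keeps only those annotated with a parent.
def negatives_basic(genes, ann, c_i):
    return {g for g in genes if c_i not in ann[g]}                    # (33)

def negatives_parent_only(genes, ann, c_i, parent_class):
    return {g for g in genes
            if c_i not in ann[g] and parent_class in ann[g]}          # (34)

ann = {"g1": {"01", "0101"}, "g2": {"01"}, "g3": {"02"}}   # toy annotations
genes = set(ann)
print(negatives_basic(genes, ann, "0101"))                 # {'g2', 'g3'}
print(negatives_parent_only(genes, ann, "0101", "01"))     # {'g2'}
```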



Hence, the PO strategy selects negative examples for training that are, in a certain sense, "close" to positives. It is easy to see that $N_{\mathrm{PO}} \subseteq N_B$; hence the B strategy selects for training a larger set of generic negative examples, possibly annotated with classes that are associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class, achieved by a specific method with the B (abscissa) and PO (ordinate) strategy for the selection of negative examples. In Figure 15(a) most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods gain in performance when using the PO strategy as opposed to the B strategy ($P$-value $= 2.2 \times 10^{-16}$, according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

Figure 15: Comparison of the average per class F-score between the basic and PO strategies: (a) FLAT ensembles; (b) hierarchical cost-sensitive strategies, HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles). Abscissa: per class F-score with base learners trained according to the basic strategy; ordinate: per class F-score with base learners trained according to the PO strategy.

These results can be explained by considering that the PO strategy takes into account the hierarchy to select negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of the classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from the other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in the context of this research area.


For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of the different strategies should be performed in a systematic way, to assess their impact on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works also showed that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies could be explored to improve the accuracy and the reliability of the base learners, and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied: the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly takes away some unannotated examples [182]. Other approaches could also be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185–187].

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems, and often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta_i^- / \theta_i^+$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$ we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed, in order to design ensemble methods with "tunable" precision/recall characteristics.

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations, and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; but unfortunately the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard, and that, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are only partially annotated. More precisely, two new algorithms have been proposed: ProWL, protein function prediction with weak-label learning, which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein; and ProWL-IF, protein function prediction with weak-label learning and knowledge of irrelevant functions, by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works for the hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually, only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted, without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation between protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength", that is, the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy to the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Appendix C), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of the deep nodes of the hierarchy. Other enhancements could be introduced in the TPR-W algorithm (Section 7.2): indeed, we can note that the positive children of a node at level $i$ of the hierarchy have the same weight, independently of the size of their hanging subtrees. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree: indeed, this witnesses for a positive annotation of the node at level $i$.

More in general, in the spirit of a recently proposed work [123], the analysis of the synergy between the issues introduced above could be of great interest, to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194], able to exploit the relationships between functional classes during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, just during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performances.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the predictions for a particular species by using the predictions or the data available for other species. This is a common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195], or by the adoption of secondary-memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited, and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods, to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples, to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work. More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and the other related issues is the key to improving AFP methods and to driving experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context, and the design of multispecies hierarchical algorithms able to scale with millions of proteins, represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes, and it is set out in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., the mitochondrion or the rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces genome database (SGD), and the mouse genome database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions, in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function, for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; an example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles, or physical places or compartments in which a specific gene product is located; an example is the endoplasmic reticulum or the ribosome.

Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{ t \mid t \text{ is a term of the GO} \}$ and $E = \{ (t, u) \mid t, u \in V \}$. Relations between GO terms are categorized in the following three main groups.

(i) is-a (subtype relations): if we say that the term/node A is a B, we mean that node A is a subtype of node B. For example, mitotic cell cycle is a cell cycle, or lyase activity is a catalytic activity.

(ii) part-of (part-whole relations): A is part of B means that, whenever A exists, it is part of B, and the presence of A implies the presence of B. For instance, mitochondrion is part of cytoplasm.

(iii) regulates (control relations): if we say that A regulates B, we mean that A directly affects the manifestation of B, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting function similarity. For these reasons, in AFP the regulates relations (which, however, are a minority of the existing relations) are usually not used.

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided in several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI); to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference; to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de): at the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content has been extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and, finally, animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses, with up to six levels of increasing specificity, according to a tree-like structure that accounts for the different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones, and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical, tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces genome database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge amount of terms is very difficult to handle for annotators. Moreover, we may have a very large number of possible assignments, which may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads both to simpler procedures for annotation and to less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, but without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subjected to changes, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more in general, we should keep in mind that the available annotations are always partial and incomplete, and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess the classification performance. For this reason, the classical F-score is used instead, to take into account the unbalance of the functional classes. If TP represents the number of positive examples correctly predicted as positive, FN the number of positive examples incorrectly predicted as negative, and FP the number of negatives incorrectly predicted as positive, then the precision $P$ and the recall $R$ are:

\[
P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}},
\qquad
R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}.
\tag{B.1}
\]

The F-score $F$ is the harmonic mean between precision and recall:

\[
F = \frac{2 \cdot P \cdot R}{P + R}.
\tag{B.2}
\]

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC $\simeq 0.5$, while values close to 1 correspond to near-optimal rankings; that is, AUC $= 1$ if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation, while failing to predict the most specific available annotation, should be considered "partially correct", in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$ consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $l(P)$ be the set of the leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let ${\uparrow}p$ be the set of ancestors of the node $p$ that belong to $P(x)$, and, given a leaf $c \in C(x)$, let ${\uparrow}c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

\[
\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}p|},
\]
\[
\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}c|},
\]
\[
\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}.
\tag{B.3}
\]

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

\[
\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow}p|}{|{\uparrow}p|},
\]
\[
\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|{\uparrow}c \cap P(x)|}{|{\uparrow}c|},
\]
\[
\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}.
\tag{B.4}
\]

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.
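A sketch of the tree-structured measures (B.4) follows, assuming that ${\uparrow}p$ denotes the path from $p$ up to the root ($p$ included) and that $P(x)$ is the union of the paths of the predicted leaves:

```python
# Sketch of the hierarchical precision/recall/F-score of (B.4) on a tree,
# assuming up(p) is the root path of p, p included.
def up(n, parent):
    path = {n}
    while n in parent:
        n = parent[n]
        path.add(n)
    return path

def hierarchical_f(pred_leaves, true_classes, true_leaves, parent):
    up_preds = [up(p, parent) for p in pred_leaves]
    pred_classes = set().union(*up_preds)          # P(x) as union of root paths
    hp = sum(len(true_classes & u) / len(u) for u in up_preds) / len(up_preds)
    up_trues = [up(c, parent) for c in true_leaves]
    hr = sum(len(u & pred_classes) / len(u) for u in up_trues) / len(up_trues)
    return 2 * hp * hr / (hp + hr) if hp + hr else 0.0

parent = {"0101": "01", "010103": "0101", "0102": "01"}   # toy FunCat-like tree
print(hierarchical_f({"0101"}, {"01", "0101", "010103"}, {"010103"}, parent))
# HP = 1, HR = 2/3, HF = 0.8: the prediction is "partially correct".
```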

Another variant of hierarchical classification measure is the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

$$\mathrm{HP}_K = \frac{\sum_{x} |P(x) \cap C(x)|}{\sum_{x} |P(x)|}, \qquad \mathrm{HR}_K = \frac{\sum_{x} |P(x) \cap C(x)|}{\sum_{x} |C(x)|}. \qquad (B5)$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true classes ($\mathrm{HR}_K$). The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \qquad (B6)$$
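A compact sketch, reading (B5) as a micro-average over genes (the function and argument names are hypothetical):

```python
def kiritchenko_hf(pred, true):
    """Hierarchical precision, recall, and F-measure of (B5)-(B6).

    pred and true map each gene x to its set of predicted / true classes,
    both already extended with all the ancestors of each class.
    """
    inter = sum(len(pred[x] & true[x]) for x in pred)
    hp = inter / sum(len(pred[x]) for x in pred)
    hr = inter / sum(len(true[x]) for x in true)
    return hp, hr, (2 * hp * hr / (hp + hr) if hp + hr else 0.0)
```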

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $\bar{q}_k$, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $\bar{q}^{\,j}_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \ge 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level $k$ of the hierarchy [31]:

$$\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1-w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g), \qquad (C1)$$

where $\phi_k(g)$ denotes the set of children of the node at level $k$ for which a positive decision has been made for gene $g$.
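As an illustrative sketch (the function and dictionary names are ours, and the fallback to the local output for nodes without positive children is our reading; see [31] for the exact algorithm), (C1) can be evaluated bottom-up by a simple recursion:

```python
def tpr_w(node, w, q_hat, phi):
    """TPR-W combination of (C1), evaluated recursively.

    q_hat: {node: probability output by the local base learner}
    phi:   {node: children for which a positive decision has been made}
    """
    children = phi.get(node, [])
    if not children:                      # no positive children below
        return q_hat[node]
    avg = sum(tpr_w(c, w, q_hat, phi) for c in children) / len(children)
    return w * q_hat[node] + (1 - w) * avg

# Toy chain root -> a -> b, all positive, w = 0.8:
q_hat = {"root": 0.6, "a": 0.7, "b": 0.9}
phi = {"root": ["a"], "a": ["b"]}
print(tpr_w("root", 0.8, q_hat, phi))  # ~0.628; unrolling reproduces (C3)
```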

To simplify the notation, we introduce the following symbol for the average of the probabilities computed by the positive children of a generic node at level $k$:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1}, \qquad (C2)$$

and we can introduce similar notations for $a_{k+2}$ (the average over the grandchildren) and, more in general, for $a_{k+j}$ (the average over the descendants $j$ levels below a generic node). By extending these definitions across the levels, we obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \le w \le 1$, balancing the weight between the parent and children predictors, and having a variable number, larger than or equal to 1, of positive descendants for each of the $m$ lower levels below, the following equality holds for each $m \ge 1$:

$$\bar{q}_k = w\,\hat{q}_k + \sum_{j=1}^{m-1} w(1-w)^j a_{k+j} + (1-w)^m a_{k+m}. \qquad (C3)$$

For the full proof see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter. To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$, Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of the nodes above the leaves.

[Figure 18: (a) plot of $f(w) = (1-w)^m$ while varying $m$ from 1 to 10; (b) plot of $g(w) = w(1-w)^j$ while varying $j$ from 1 to 10, for $w = 0.1, 0.5, 0.8$. The integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.]
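As a quick numeric check (a toy computation under the assumptions of Theorem 2), the coefficients multiplying $\hat{q}_k$ and the averages $a_{k+j}$ in (C3) can be tabulated directly; they sum to one, so the ensemble output is a convex combination whose descendant terms vanish geometrically with depth:

```python
w, m = 0.8, 5
coeffs = [w] + [w * (1 - w) ** j for j in range(1, m)] + [(1 - w) ** m]
print(coeffs)       # ~[0.8, 0.16, 0.032, 0.0064, 0.00128, 0.00032]
print(sum(coeffs))  # ~1.0: with large w the local prediction dominates
```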

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction—the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.
[2] L. Peña-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.
[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.
[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, supplement 2, article S4, 2008.
[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.
[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene Ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.
[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.
[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, supplement 2, pp. II42–II49, 2003.
[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.
[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.
[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, supplement 1, article S7, 2008.
[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, supplement 1, article S5, 2008.
[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, supplement 1, article S4, 2008.
[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.
[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.
[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.
[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.
[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.
[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.
[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.
[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.
[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, supplement 1, article S6, 2008.
[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.
[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.
[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.
[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.
[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.
[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, supplement 1, article S3, 2008.
[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.
[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.
[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.
[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.
[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.
[36] J. J. Burred and A. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.
[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.
[38] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.
[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.
[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.
[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.
[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.
[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.
[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.
[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.
[46] K. Tsuda, H. Shin, and B. Schölkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.
[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.
[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.
[50] K. Astikainen, L. Holm, E. Pitkänen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.
[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.
[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.
[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.
[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.
[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.
[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
[58] A. Conesa, S. Götz, J. M. García-Gómez, J. Terol, M. Talón, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.
[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.
[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.
[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.
[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.
[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.
[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.
[65] J. Z. Wang, Z. Du, R. Payattakool, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.
[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.
[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.
[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.
[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.
[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.
[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, supplement 1, pp. i302–i310, 2005.
[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.
[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.
[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.
[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.
[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.
[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.
[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.
[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.
[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.
[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Schölkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.
[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.
[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.
[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.
[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.
[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.
[88] G. Bakir, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.
[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.
[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.
[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.
[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.
[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.
[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.
[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.
[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting, and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.
[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.
[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.
[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.
[100] M. Dettling and P. Bühlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.
[101] V. Robles, P. Larrañaga, J. M. Peña et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.
[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.
[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.
[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.
[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.
[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.
[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.
[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.
[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.
[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.
[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.
[112] K. Dembczyński, W. Waegeman, W. Cheng, and E. Hüllermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.
[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.
[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.
[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.
[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.
[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.
[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.
[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.
[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.
[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.
[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.
[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using top-down label combination and artificial neural networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.
[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.
[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.
[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.
[128] B. Paes, A. Plastino, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.
[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.
[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.
[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.
[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.
[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.
[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.
[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems: Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.
[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Schölkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.
[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.
[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.
[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.
[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An $O(n^2)$ algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.
[142] G. Valentini and M. Re, "Weighted True Path Rule: a multi-label hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.
[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.
[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.
[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.
[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.
[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.
[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.
[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.
[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.
[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics—From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.
[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.
[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.
[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.
[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.
[159] M. Mesiti, E. Jiménez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.
[160] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.
[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.
[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.
[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.
[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.
[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.
[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.
[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.
[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.
[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.
[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.
[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.
[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.
[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7-9, pp. 1533–1537, 2010.
[174] R. D. Finn, J. Tate, J. Mistry et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.
[175] A. P. Gasch, P. T. Spellman, C. M. Kao et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.
[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.
[177] C. Von Mering, R. Krause, B. Snel et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.
[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.
[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.
[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.
[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.
[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.
[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.
[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.
[187] M. Galar, A. Fernández, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.
[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.
[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.
[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.
[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.
[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.
[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.
[194] C. Widmer and G. Rätsch, "Multitask learning in computational biology," Journal of Machine Learning Research, W&P, vol. 27, pp. 207–216, 2012.
[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.
[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013—ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.
[197] H. W. Mewes, K. Albermann, M. Bähr et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.
[198] H. W. Mewes, D. Frishman, C. Gruber et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.
[199] K. R. Christie, S. Weng, R. Balakrishnan et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.
[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.
[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.
[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


Page 8: ReviewArticle Hierarchical Ensemble Methods for Protein ... · ReviewArticle Hierarchical Ensemble Methods for Protein Function Prediction GiorgioValentini AnacletoLab-DipartimentodiInformatica,Universit

8 ISRN Bioinformatics

Input instance xLayers corresponding to

Layers corresponding tothe first hierarchical level

the second hierarchical level

Hidden neuronsHidden neurons

Outputs of the first level (classes)are provided as inputs to thenetwork of the second level

Outputs of thesecond level (classes)

x1

x2

x3

x4

xN

Figure 3HMC-LMLP outputs of theMLP responsible for the predictions in the first level are used as input to anotherMLP for the predictionsin the second level (adapted from [125])

ldquomost likely pathrdquo thus not considering that in AFP we mayhave annotations involving multiple and partial paths

Another method that introduces multiclass classifiersinstead of simple dichotomic classifiers has been proposed byPaes et al [128] local per level multiclass classifiers (HTD-PERLEV) are trained to distinguish between the classes ofa specific level of the hierarchy and two different strategiesto remove inconsistencies are introduced The method hasbeen applied to the hierarchical classification of enzymesusing the EC taxonomy for the hierarchical classificationof enzymes but unfortunately this algorithm is not wellsuited to AFP since leaf nodes are mandatory (that is partialpath annotations are not allowed) andmultilabel annotationsalong multiple paths are not allowed

Another interesting top-down hierarchical approach pro-posed by the same authors is HMC-LP (hierarchical multil-abel classification label-powerset) a hierarchical variation ofthe label-powerset nonhierarchical multilabel method [129]that has been applied to the prediction of gene function ofthe yeast model organism using 10 different data sets andthe FunCat taxonomy [124] According to the label-powersetapproach the method is based on a first label-combinationstep by which for each example (gene) all classes assignedto the example are combined into a new and unique classand this process is repeated for each level of the hierarchyIn this way the original problem is transformed into ahierarchical single-label problem In both the training andtest phases the top-down approach is applied and at theend of the classification phase the original classes can beeasily reconstructed [124] In an experimental comparisonusing the FunCat taxonomy for S cerevisiae results showedthat hierarchical top-down ensemble methods significantlyoutperform decision trees-based hierarchical methods butno significant difference between different flavors of top-down hierarchical ensembles has been detected [28]

Top-down algorithms can be conceived also in the con-text of network-basedmethods (HTD-NET) For instance in

[26] a probabilistic model that combines relational protein-protein interaction data and the hierarchical structure ofGO to predict true-path consistent function labels obeysthe true path rule by setting the descendants of a nodeas negative whenever that node is set to negative Moreprecisely the authors at first compute a local hierarchicalconditional probability in the sense that for any nonrootGO term only the parents affect its labeling This probabilityis computed within a network-based framework assumingthat the labeling of a gene is independent of any other genesgiven that of its neighbors (a sort of the Markov propertywith respect to gene functional interaction networks) andassuming also a binomial distribution for the number ofneighbors labeled with child terms with respect to thoselabeled with the parent term These assumptions are quitestringent but are necessary tomake themodel tractableThena global hierarchical conditional probability is computed byrecursively applying the previously computed local hierarchi-cal conditional probability by considering all the ancestorsMore precisely by assuming that P(119910119894 = 1 | 119892119873(119892)) that isthe probability that a gene 119892 is annotated for a a node 119894 giventhe status of the annotations of its neighborhood 119873(119892) inthe functional networks the global hierarchical conditionalprobability factorizes according to the GO graph as follows

P (119910119894 = 1 | 119892119873 (119892))

= prod

119895isinanc(119894)P (119910119895 = 1 | 119910par(119895) = 1119873loc (119892))

(6)

where119873loc(119892) represents the local hierarchical neighborhoodinformation on the parent-child GO term pair par(119895) and119895 [26] This approach guarantees to produce GO term labelassignments consistent with the hierarchy without the needof a postprocessing step

Finally in [130] the author applied a hierarchical methodto the classification of yeast FunCat categories Despite itswell-founded theoretical properties based on large margin

ISRN Bioinformatics 9

methods this approach is conceived for one path hierarchicalclassification and hence it results to be unsuited for hierarchi-cal AFP where usually multiple paths in the hierarchy shouldbe considered since in most cases genes can play differentfunctional roles in the cell

5 Ensemble Based Bayesian Approaches forHierarchical Classification

These methods introduce a Bayesian approach to the hierar-chical classification of proteins by using the classical Bayestheorem or Bayesian networks to obtain tractable factoriza-tions of the joint conditional probabilities from the originalldquofull Bayesianrdquo setting of the hierarchical AFP problem [2030] or to achieve ldquoBayes-optimalrdquo solutions with respect toloss functions well suited to hierarchical problems [27 92]

51The Solution Based on Bayesian Networks One of the firstapproaches addressing the issue of inconsistent predictions inthe Gene Ontology is represented by the Bayesian approachproposed in [20] (BAYES NET-ENS) According to thegeneral scheme of hierarchical ensemble methods two mainsteps characterize the algorithm

(1) flat prediction of each termclass (possibly inconsis-tent)

(2) Bayesian hierarchical combination scheme to allowcollaborative error-correction over all nodes

After training a set of base classifiers on each of the consid-eredGO terms (in their work the authors applied themethodto 105 selected GO terms) we may have a set of (possiblyinconsistent) y predictions The goal consists in finding aset of consistent y predictions by maximizing the followingequation derived from the Bayes theorem

P (1199101 119910119899 | 1199101 119910119899)

=P (1199101 119910119899 | 1199101 119910119899)P (1199101 119910119899)

119885

(7)

where 119899 is the number of GO nodesterms and119885 is a constantnormalization factor

Since the direct solution of (7) is too hard that isexponential in time with respect to the number of nodesthe authors proposed a Bayesian network structure to solvethis difficult problem in order to exploit the relationshipsbetween the GO terms More precisely to reduce the com-plexity of the problem the authors imposed the followingconstraints

(1) 119910119894 nodes conditioned to their children (GO structureconstraints)

(2) 119910119894 nodes conditioned on their label119910119894 (the Bayes rule)(3) 119910119894 are independent from both 119910119895 119894 = 119895 and 119910119895 119894 = 119895

given 119910119894

In other words we can ensure that a label is 1 (positive) whenany one of its children is 1 and the edges from 119910119894 to 119910119894 assure

y1

y2

y3 y4

y5

y1

y2

y3 y4

y5

Figure 4 Bayesian network involved in the hierarchical classifica-tion (adapted from [20])

that a classifier output 119910119894 is a random variable independent ofall other classifier outputs 119910119895 and labels 119910119895 given its true label119910119894 (Figure 4)

More precisely from the previous constraints we canderive the following equations

from the first constraint

P (1199101 119910119899) =

119899

prod

119894=1

P (119910119894 | child (119910119894))(8)

from the last two constraints

P (1199101 119910119899 | 1199101 119910119899) =

119899

prod

119894=1

P (119910119894 | 119910119894)

(9)

Note that (8) can be inferred from training labels simplyby counting while (9) can be inferred by validation duringtraining by modeling the distribution of 119910119894 outputs overpositive and negative examples by assuming a parametricmodel (eg Gaussian distribution see Figure 5)

For the implementation of their method the authorsadopted bagged ensemble of SVMs [131] to make theirpredictions more robust and reliable at each node of the GOhierarchy and median values of their outputs on out-of-bagexamples have been used to estimate means and variancesfor each class Finally means and variances have been usedas parameters of the Gaussian models used to estimate theconditional probabilities of (9)

Results with the 105 termsnodes of the GO BP (modelorganism S cerevisiae) showed substantial improvementswith respect to nonhierarchical ldquoflatrdquo predictions the hierar-chical approach improves AUC results on 93 of the 105 GOterms (Figure 6)

52TheMarkov Blanket andApproximated Breadth First Solu-tion In [30] the authors proposed an alternative approxi-mated solution to the complex equation (7) by introducingthe following two variants of the Bayesian integration

(1) HIER-MB hierarchical Bayesian combination involv-ing nodes in the Markov blanket

(2) HIER-BFS hierarchical Bayesian combination invo-lving the 30 first nodes visited through a breadth-first-search (BFS) in the GO graph


Figure 5: Distribution of positive and negative validation examples (a Gaussian distribution is assumed). Adapted from [20].

Figure 6: Improvements induced by the hierarchical prediction of the GO terms. Darker shades of blue indicate the largest improvements and darker shades of red the largest deteriorations; white means no change (adapted from [20]).

Figure 7: Markov blanket surrounding the GO term $Y_1$. Each GO term is represented as a blank node, while the SVM classifier output is represented as a gray node (adapted from [30]).

The method has been applied to the prediction of more than 2000 GO terms for the mouse model organism and performed among the top methods in the MouseFunc challenge [2].

The first approach (HIER-MB) modifies the output of the base learners (SVMs in the Guan et al. paper) taking into account the Bayesian network constructed using the Markov blanket surrounding the GO term of interest (Figure 7). In a Bayesian network, the Markov blanket of a node $i$ is represented by its parents ($\mathrm{par}(i)$), its children ($\mathrm{child}(i)$), and its children's other parents. The Bayesian network involving the Markov blanket of node $i$ is used to provide the prediction $\hat{y}_i$ of the ensemble, thus leveraging the local relationships of node $i$ and the predictions for the nodes included in its Markov blanket.
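A minimal sketch of the Markov blanket computation on a Bayesian network represented by parent and child adjacency maps (the data structures and the function name are ours, not from [30]):

```python
def markov_blanket(i, parents, children):
    """Return the Markov blanket of node i: its parents, its children,
    and its children's other parents."""
    mb = set(parents[i]) | set(children[i])
    for c in children[i]:
        mb |= set(parents[c])   # the children's other parents
    mb.discard(i)
    return mb
```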

Figure 8: The breadth-first subnetwork stemming from $Y_1$. Each GO term is represented through a blank node and the SVM outputs are represented as gray nodes (adapted from [30]).

To enlarge the size of the Bayesian subnetwork involved in the prediction of the node of interest, a variant based on Bayesian networks constructed by applying a classical breadth-first search is the basis of the HIER-BFS algorithm. To reduce the complexity, at most 30 terms are included (i.e., the first 30 nodes reached by the breadth-first algorithm; see Figure 8). In the implementation, ensembles of 25 SVMs have been trained for each node, using vector space integration techniques [132] to integrate multiple sources of data.
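The subnetwork construction can be sketched as a truncated breadth-first search (a hypothetical helper, assuming a `neighbors` function over the GO graph):

```python
from collections import deque

def bfs_subnetwork(start, neighbors, max_nodes=30):
    """Collect the first max_nodes GO terms reached by a BFS from `start`;
    the Bayesian subnetwork is then built on these terms."""
    seen, order, queue = {start}, [start], deque([start])
    while queue and len(order) < max_nodes:
        u = queue.popleft()
        for v in neighbors(u):
            if v not in seen and len(order) < max_nodes:
                seen.add(v)
                order.append(v)
                queue.append(v)
    return order
```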

Note that with both the HIER-MB and HIER-BFS methods we do not take into account the overall topology of the GO network, but only the terms related to the node for which we perform the prediction. Even if this general approach is reasonable and achieves good results, its main drawback is represented by the locality of the hierarchical integration (limited to the Markov blanket and the first 30 BFS nodes). Moreover, previous works have shown that the adopted integration strategy (vector space integration) is in most cases worse than kernel fusion [11] and ensemble methods for data integration [23].

Figure 9: Integration of diverse methods and diverse sources of data in an ensemble framework for AFP prediction. The best classifier for each GO term is selected through held-out set validation (adapted from [30]).

In the same work [30], the authors also propose a sort of "test and select" method [133], by which three different classification approaches, (a) single flat SVMs, (b) Bayesian hierarchical correction, and (c) naive Bayes combination, are applied, and for each GO term the best one is selected by internal cross-validation (Figure 9).

It is worth noting that other approaches adopted Bayesian networks to resolve the hierarchical constraints underlying the GO taxonomy. For instance, in the FALCON algorithm the GO is modeled as a Bayesian network and, for any given input, the algorithm returns the most probable GO term assignment in accordance with the GO structure, by using an evolutionary-based optimization algorithm [134].

5.3. HBAYES: An "Optimal" Hierarchical Bayesian Ensemble Approach. The HBAYES ensemble method [92, 135] is a general technique for solving hierarchical classification problems on generic taxonomies $G$ structured as forests of trees. The method consists in training a calibrated classifier at each node of the taxonomy. In principle, any algorithm (e.g., support vector machines or artificial neural networks) whose classifications are obtained by thresholding a real prediction $p$, for example, $\hat{y} = \mathrm{sgn}(p)$, can be used as base learner. The real-valued outputs $\hat{p}_i(g)$ of the calibrated classifier for node $i$ on the gene $g$ are viewed as estimates of the probabilities $p_i(g) = P(y_i = 1 \mid y_{\mathrm{par}(i)} = 1, g)$. The distribution of the random Boolean vector $Y$ is assumed to be

$$P(Y = y) = \prod_{i=1}^{m} P(Y_i = y_i \mid Y_{\mathrm{par}(i)} = 1, g), \quad \forall y \in \{0,1\}^m, \quad (10)$$

where, in order to enforce that only multilabels $Y$ that respect the hierarchy have nonzero probability, it is imposed that $P(Y_i = 1 \mid Y_{\mathrm{par}(i)} = 0, g) = 0$ for all nodes $i = 1, \dots, m$ and all $g$. This implies that the base learner at node $i$ is only trained on the subset of the training set including all examples $(g, y)$ such that $y_{\mathrm{par}(i)} = 1$.

5.3.1. HBAYES Ensembles for Protein Function Prediction. The H-loss is a measure of discrepancy between multilabels based on a simple intuition: if a parent class has been predicted wrongly, then errors in its descendants should not be taken into account. Given fixed cost coefficients $\theta_1, \dots, \theta_m > 0$, the H-loss $\ell_H(\hat{y}, v)$ between multilabels $\hat{y}$ and $v$ is computed as follows: all paths in the taxonomy $T$ from the root down to each leaf are examined, and whenever a node $i \in \{1, \dots, m\}$ is encountered such that $\hat{y}_i \neq v_i$, $\theta_i$ is added to the loss, while all the other loss contributions from the subtree rooted at $i$ are discarded. This method assumes that, given a gene $g$, the distribution of the labels $V = (V_1, \dots, V_m)$ is $P(V = v) = \prod_{i=1}^{m} p_i(g)$ for all $v \in \{0,1\}^m$, where $p_i(g) = P(V_i = v_i \mid V_{\mathrm{par}(i)} = 1, g)$. According to the true path rule, it is imposed that $P(V_i = 1 \mid V_{\mathrm{par}(i)} = 0, g) = 0$ for all nodes $i$ and all genes $g$.

In the evaluation phase, HBAYES predicts the Bayes-optimal multilabel $\hat{y} \in \{0,1\}^m$ for a gene $g$, based on the estimates $\hat{p}_i(g)$ for $i = 1, \dots, m$. By definition of Bayes-optimality, the optimal multilabel for $g$ is the one that minimizes the loss when the true multilabel $V$ is drawn from the joint distribution computed from the estimated conditionals $\hat{p}_i(g)$. That is,

$$\hat{y} = \operatorname*{argmin}_{y \in \{0,1\}^m} E\left[\ell_H(y, V) \mid g\right]. \quad (11)$$

In other words, the ensemble method HBAYES provides an approximation of the optimal Bayesian classifier with respect to the H-loss [135]. More precisely, as shown in [27], the following theorem holds.


Theorem 1. For any tree $T$ and gene $g$, the multilabel generated according to the HBAYES prediction rule is the Bayes-optimal classification of $g$ for the H-loss.

In the evaluation phase, the uniform cost coefficients $\theta_i = 1$, for $i = 1, \dots, m$, are used. However, since with uniform coefficients the H-loss can be made small simply by predicting sparse multilabels (i.e., multilabels $\hat{y}$ such that $\sum_i \hat{y}_i$ is small), in the training phase the cost coefficients are set to $\theta_i = 1/|\mathrm{root}(G)|$ if $i \in \mathrm{root}(G)$, and to $\theta_i = \theta_j/|\mathrm{child}(j)|$, with $j = \mathrm{par}(i)$, otherwise. This normalizes the H-loss, in the sense that the maximal H-loss contribution of all nodes in a subtree, excluding its root, equals that of its root.

Let $\{E\}$ denote the indicator function of event $E$. Given $g$ and the estimates $\hat{p}_i = \hat{p}_i(g)$ for $i = 1, \dots, m$, the HBAYES prediction rule can be formulated as follows.

HBAYES Prediction Rule. Initially, set the label of each node $i$ to

$$\hat{y}_i = \operatorname*{argmin}_{y \in \{0,1\}} \left( \theta_i \hat{p}_i (1 - y) + \theta_i (1 - \hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{y}) \right), \quad (12)$$

where

$$H_j(\hat{y}) = \theta_j \hat{p}_j (1 - \hat{y}_j) + \theta_j (1 - \hat{p}_j)\, \hat{y}_j + \hat{p}_j \{\hat{y}_j = 1\} \sum_{k \in \mathrm{child}(j)} H_k(\hat{y}) \quad (13)$$

is recursively defined over the nodes $j$ in the subtree rooted at $i$, with each $\hat{y}_j$ set according to (12).

Then, if $\hat{y}_i$ is set to zero, set all nodes in the subtree rooted at $i$ to zero as well.

It is worth noting that $\hat{y}$ can be computed for a given $g$ via a simple bottom-up message-passing procedure. It can be shown that, if all child nodes $k$ of $i$ have $\hat{p}_k$ close to one half, then the Bayes-optimal label of $i$ tends to be 0, irrespective of the value of $\hat{p}_i$. On the contrary, if $i$'s children all have $\hat{p}_k$ close to either 0 or 1, then the Bayes-optimal label of $i$ is based on $\hat{p}_i$ only, ignoring the children. This behaviour can be intuitively explained in the following way: the estimate $\hat{p}_k$ is built based only on the examples on which the parent $i$ of $k$ is positive; hence a "neutral" estimate $\hat{p}_k = 1/2$ signals that the current instance is a negative example for the parent $i$. Experimental results show that this approach achieves results comparable with the TPR method (Section 7), an ensemble approach based on the "true path rule" [136].
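The prediction rule (12)-(13) lends itself to a single bottom-up recursion, since at each node the cost of predicting 0 is $\theta_i \hat{p}_i$ and the cost of predicting 1 is $\theta_i (1 - \hat{p}_i) + \hat{p}_i \sum_j H_j$. The following Python sketch (our own rendering of the rule, on a tree given as a child-adjacency dictionary) also applies the final subtree-zeroing step:

```python
def hbayes_predict(children, root, p, theta):
    """Bottom-up HBAYES prediction rule (sketch of (12)-(13)).
    children: node -> list of child nodes; p: node -> calibrated estimate;
    theta: node -> cost coefficient. Returns node -> predicted label."""
    yhat, H = {}, {}

    def visit(i):
        h_sum = 0.0
        for j in children.get(i, []):
            visit(j)
            h_sum += H[j]
        cost0 = theta[i] * p[i]                       # cost of predicting yhat_i = 0
        cost1 = theta[i] * (1 - p[i]) + p[i] * h_sum  # cost of predicting yhat_i = 1
        yhat[i] = 1 if cost1 < cost0 else 0
        H[i] = min(cost0, cost1)

    def zero_subtree(i):
        for j in children.get(i, []):
            yhat[j] = 0
            zero_subtree(j)

    def enforce(i):                                   # zero out subtrees of negative nodes
        if yhat[i] == 0:
            zero_subtree(i)
        else:
            for j in children.get(i, []):
                enforce(j)

    visit(root)
    enforce(root)
    return yhat
```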

5.3.2. HBAYES-CS: The Cost-Sensitive Version. HBAYES-CS is the cost-sensitive version of HBAYES proposed in [27]. In this approach, the misclassification cost coefficient $\theta_i$ for node $i$ is split into two terms, $\theta_i^+$ and $\theta_i^-$, taking into account misclassifications of, respectively, positive and negative examples. By considering these two terms separately, (12) can be rewritten as

$$\hat{y}_i = \operatorname*{argmin}_{y \in \{0,1\}} \left( \theta_i^- \hat{p}_i (1 - y) + \theta_i^+ (1 - \hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{y}) \right), \quad (14)$$

where the expression of $H_j(\hat{y})$ is changed correspondingly. By introducing a factor $\alpha \geq 0$ such that $\theta_i^- = \alpha \theta_i^+$, while keeping $\theta_i^+ + \theta_i^- = 2\theta_i$, the relative costs of false positives and false negatives can be parameterized, thus allowing us to further rewrite the hierarchical Bayesian rule (Section 5.3.1) as follows:

$$\hat{y}_i = 1 \iff \hat{p}_i \left( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \right) \geq \frac{2\theta_i}{1 + \alpha}. \quad (15)$$

By setting $\alpha = 1$ we obtain the original version of the hierarchical Bayesian ensemble, while by incrementing $\alpha$ we introduce progressively lower costs for positive predictions. In this way the recall of the ensemble tends to increase, eventually at the expense of precision, and by tuning the $\alpha$ parameter we can obtain different combinations of precision/recall values.

In principle, a cost factor $\alpha_i$ can be set for each node $i$ to explicitly take into account the unbalance between the number of positive ($n_i^+$) and negative ($n_i^-$) examples, estimated from the training data:

$$\alpha_i = \frac{n_i^-}{n_i^+} \;\Longrightarrow\; \theta_i^+ = \frac{2}{n_i^-/n_i^+ + 1}\, \theta_i = \frac{2 n_i^+}{n_i^- + n_i^+}\, \theta_i. \quad (16)$$

The decision rule (15) at each node then becomes

$$\hat{y}_i = 1 \iff \hat{p}_i \left( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \right) \geq \frac{2\theta_i}{1 + \alpha_i} = \frac{2\theta_i n_i^+}{n_i^- + n_i^+}. \quad (17)$$

Results obtained with the yeast model organism showed that HBAYES-CS significantly outperforms HTD methods [27, 136].
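The cost-sensitive rule (17) amounts to a per-node threshold that shrinks as the class unbalance grows; a minimal sketch (function and argument names are ours):

```python
def hbayes_cs_decision(p_i, theta_i, h_children_sum, n_pos, n_neg):
    """Decision rule (17) at a single node: the threshold 2*theta_i/(1+alpha_i)
    decreases when negatives outnumber positives, favoring positive predictions."""
    alpha_i = n_neg / n_pos                 # per-node cost factor, as in (16)
    lhs = p_i * (2 * theta_i - h_children_sum)
    rhs = 2 * theta_i / (1 + alpha_i)       # equals 2*theta_i*n_pos/(n_neg+n_pos)
    return 1 if lhs >= rhs else 0
```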

6. Reconciliation Methods

Hierarchical ensemble methods are basically two-step methods, since they first provide predictions for the single classes and then arrange these predictions to take into account the functional relationships between GO terms. Noble and colleagues named this general approach reconciliation methods [24]; they proposed methods for calibrating and combining independent predictions to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. They applied their ensemble methods to genome-wide and ontology-wide function prediction in M. musculus, involving about 3000 GO terms.


Figure 10: The overall scheme of reconciliation methods (adapted from [24]).

Their goal consists in providing consistent predictions, that is, predictions whose confidence (e.g., posterior probability) increases as we ascend from more specific to more general terms in the GO. Moreover, another important feature of these methods is the availability of confidence values associated with the predictions, which can be interpreted as probabilities that a protein has a certain function given the information provided by the data.

The overall reconciliation approach can be summarized in the following four basic steps (Figure 10).

(1) Kernel computation: at first, a set of kernels is computed from the available data. We may choose kernels specific to each source of data (e.g., diffusion kernels for protein-protein interaction data [137], linear or Gaussian kernels for expression data, and string kernels for sequence data [138]). Multiple kernels for the same type of data can also be constructed [24].

(2) SVM learning: SVMs are used as base learners with the kernels selected at the previous step; the training is performed by internal cross-validation to avoid overfitting, and a local cost-sensitive strategy is applied by tuning separately the $C$ regularization factor for positive and negative examples. Note that the authors used SVMs as base learners in their experiments, but any meaningful classifier could be used at this step.

(3) Calibration: to produce individual probabilistic outputs from the set of SVM outputs corresponding to one GO term, a logistic regression approach is applied. In this way a calibration of the individual SVM outputs is obtained, resulting in a probabilistic prediction of the random variable $Y_i$ for each node/term $i$ of the hierarchy, given the outputs $\hat{y}_i$ of the SVM classifiers.

(4) Reconciliation: the first three steps generate unreconciled outputs; that is, in practice a "flat" ensemble is applied, which may generate inconsistent predictions with respect to the given taxonomy. In this step the outputs of step three are processed by a "reconciliation method". The goal of this stage is to combine the predictions for each term to produce predictions that are consistent with the ontology, meaning that all the probabilities assigned to the ancestors of a GO term are larger than the probability assigned to that term.

The first three steps are basically the same (or very similar) for each reconciliation ensemble method. The crucial step is the fourth, that is, the reconciliation step, and different ensemble algorithms can be designed to implement it. The authors proposed 11 different ensemble methods for the reconciliation of the base classifier outputs. Schematically, they can be subdivided into the following four main classes of ensembles:

(1) heuristic methods;
(2) Bayesian network-based methods;
(3) cascaded logistic regression;
(4) projection-based methods.

6.1. Heuristic Methods. These approaches preserve the "reconciliation property"

$$\forall i, j \in G: (i, j) \in E \;\Longrightarrow\; \bar{p}_i \geq \bar{p}_j \quad (18)$$

through simple heuristic modifications of the probabilities computed at step 3 of the overall reconciliation scheme.

(i) The MAX method simply chooses the largest logistic regression value among the node $i$ and all its descendants $\mathrm{desc}(i)$:

$$\bar{p}_i = \max_{j \in \mathrm{desc}(i)} \hat{p}_j. \quad (19)$$

(ii) The AND method implements the notion that the probability of all ancestral GO terms $\mathrm{anc}(i)$ of a given term/node $i$ is large, assuming that, conditional on the data, all predictions are independent:

$$\bar{p}_i = \prod_{j \in \mathrm{anc}(i)} \hat{p}_j. \quad (20)$$

(iii) OR estimates the probability that the node $i$ is annotated for at least one of the descendant GO terms, assuming again that, conditional on the data, all predictions are independent:

$$1 - \bar{p}_i = \prod_{j \in \mathrm{desc}(i)} (1 - \hat{p}_j). \quad (21)$$
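A compact sketch of the three heuristics, assuming precomputed ancestor/descendant sets (here taken to include the node itself, which leaves (19)-(21) unchanged); the function name and data structures are our own:

```python
import math

def reconcile_heuristic(p_hat, desc, anc, method="max"):
    """Heuristic reconciliation (19)-(21) of calibrated probabilities.
    p_hat: term -> probability; desc/anc: term -> set of descendants/ancestors
    (each assumed to contain the term itself)."""
    p_bar = {}
    for i in p_hat:
        if method == "max":        # (19)
            p_bar[i] = max(p_hat[j] for j in desc[i])
        elif method == "and":      # (20)
            p_bar[i] = math.prod(p_hat[j] for j in anc[i])
        elif method == "or":       # (21)
            p_bar[i] = 1 - math.prod(1 - p_hat[j] for j in desc[i])
    return p_bar
```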

6.2. Cascaded Logistic Regression. Instead of modeling class-conditional probabilities, as required by the Bayesian approach, logistic regression can be used to directly model posterior probabilities. Considering that modeling conditional densities is in most cases difficult (even using strong independence assumptions, as shown in Section 5.1), the choice of logistic regression could be a reasonable one. In [24] the authors embedded the hierarchical dependencies between terms in the logistic regression setting. By assuming that a random variable $X$, whose values represent the features of the gene $g$ of interest, is associated to a given gene $g$, and assuming that $P(Y = y \mid X = x)$ factorizes according to the GO graph, it follows that

$$P(Y = y \mid X = x) = \prod_i P(Y_i = y_i \mid \forall j \in \mathrm{par}(i): Y_j = y_j, X_i = x_i), \quad (22)$$

with $P(Y_i = 1 \mid \forall j \in \mathrm{par}(i): Y_j = 0, X_i = x_i) = 0$. The authors estimated $P(Y_i = 1 \mid \forall j \in \mathrm{par}(i): Y_j = 1, X_i = x_i)$ with logistic regression. This approach is quite similar to fitting independent logistic regressions, but note that in this case only examples of proteins annotated to all the parent GO terms are used to fit the model, thus implicitly taking into account the hierarchical relationships between GO terms.
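A minimal sketch of the fitting procedure for (22), in our own rendering with scikit-learn; `parents` is an assumed map from each term to the column indices of its parent terms:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_cascaded_lr(terms, parents, X, Y):
    """Fit one logistic regression per term, using only the examples that are
    positive for all of the term's parents, as required by (22).
    X: (n_examples, n_features); Y: (n_examples, n_terms) 0/1 annotations."""
    models = {}
    for i in terms:
        # np.all over an empty set of columns is True, so root terms use all examples
        mask = np.all(Y[:, parents[i]] == 1, axis=1)
        models[i] = LogisticRegression(max_iter=1000).fit(X[mask], Y[mask, i])
    return models
```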

6.3. Bayesian Network-Based Methods. These methods are variants of the Bayesian network approach proposed in [20] (Section 5.1): the GO is viewed as a graphical model where a joint Bayesian prior is put on the binary GO term variables $y_i$. The authors proposed four variants, which can be summarized as follows:

(i) BPAL is a belief propagation approach with asymmetric Laplace likelihoods. The graphical model has edges directed from more general terms to more specific terms. Differently from [20], the distribution of each SVM output is modeled as an asymmetric Laplace distribution, and a variational inference algorithm, which solves an optimization problem whose minimizer is the set of marginal probabilities of the distribution, is used to estimate the posterior probabilities of the ensemble [139];

(ii) BPALF is similar to BPAL, but with edges inverted and directed from more specific terms to more general terms;

(iii) BPLR is a heuristic variant of BPAL where, in the inference algorithm, the Bayesian log posterior ratio for $Y_i$ is replaced by the marginal log posterior ratio obtained from the logistic regression (LR);

(iv) BPLRF is equal to BPLR, but with reversed edges.

6.4. Projection-Based Methods. A different approach is represented by methods that directly use the calibrated values obtained from logistic regression (step 3 of the overall scheme of the reconciliation methods) to find the closest set of values that are consistent with the ontology. This approach leads to a constrained optimization problem. The main contribution of the Obozinski et al. work [24] is represented by the introduction of projection reconciliation techniques based on isotonic regression [140] and the Kullback-Leibler divergence.

The isotonic regression method tries to find a set of marginal probabilities $\bar{p}_i$ that are close to the set of calibrated values $\hat{p}_i$ obtained from the logistic regression. The Euclidean distance is used as a measure of closeness. Hence, considering that the "reconciliation property" requires that $\bar{p}_i \geq \bar{p}_j$ when $(i, j) \in E$, this approach yields the following quadratic program:

$$\min_{\{\bar{p}_i\}_{i \in I}} \; \sum_{i \in I} (\bar{p}_i - \hat{p}_i)^2 \quad \text{s.t.} \quad \bar{p}_j \leq \bar{p}_i, \; (i, j) \in E. \quad (23)$$

This is the classical isotonic regression problem, which can be solved using an interior point solver, or with approximated algorithms when the number of edges of the graph is too large [141].
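For small ontologies, the quadratic program (23) can be solved with a general-purpose solver; a sketch with SciPy (edges given as (parent, child) index pairs; names are ours):

```python
import numpy as np
from scipy.optimize import minimize

def isotonic_reconcile(p_hat, edges):
    """Euclidean projection of calibrated probabilities onto the constraints
    p_bar[i] >= p_bar[j] for every hierarchy edge (i, j), as in (23)."""
    target = np.asarray(p_hat, dtype=float)
    constraints = [{"type": "ineq", "fun": lambda p, i=i, j=j: p[i] - p[j]}
                   for (i, j) in edges]
    res = minimize(lambda p: np.sum((p - target) ** 2), x0=target.copy(),
                   bounds=[(0.0, 1.0)] * len(target), constraints=constraints)
    return res.x

# example: a chain 0 -> 1 -> 2 with inconsistent calibrated values
# isotonic_reconcile([0.3, 0.6, 0.2], [(0, 1), (1, 2)])  # ~ [0.45, 0.45, 0.2]
```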

Considering that we deal with probabilities, a natural measure of distance between probability density functions $f(\mathbf{x})$ and $g(\mathbf{x})$, defined with respect to a random variable $\mathbf{x}$, is represented by the Kullback-Leibler divergence $D_{f,g}$:

$$D_{f,g} = \int_{-\infty}^{\infty} f(\mathbf{x}) \log\left(\frac{f(\mathbf{x})}{g(\mathbf{x})}\right) d\mathbf{x}. \quad (24)$$

In the context of reconciliation methods we need to consider a discrete version of the Kullback-Leibler divergence, yielding the following optimization problem:

$$\min_{\{\bar{p}_i\}_{i \in I}} \; \sum_{i \in I} \bar{p}_i \log\left(\frac{\bar{p}_i}{\hat{p}_i}\right) \quad \text{s.t.} \quad \bar{p}_j \leq \bar{p}_i, \; (i, j) \in E. \quad (25)$$

The algorithm finds the probabilities closest, according to the Kullback-Leibler divergence, to the probabilities $\hat{p}$ obtained from logistic regression, obeying the constraint that probabilities cannot increase while descending the hierarchy underlying the ontology.

The extensive experiments reported in [24] show that, among the reconciliation methods, isotonic regression is the most generally useful. Across a range of evaluation modes, term sizes, ontologies, and recall levels, isotonic regression yields consistently high precision. On the other hand, isotonic regression is not always the "best method", and a biologist with a particular goal in mind may apply other reconciliation methods. For instance, with small terms the Kullback-Leibler projections usually achieve the best results, but, considering average "per term" results, heuristic methods yield precision at a given recall comparable with projection methods and better than that achieved with Bayes-net methods.

This ensemble approach achieved excellent results in the prediction of protein function in the mouse model organism, demonstrating that hierarchical multilabel methods can play a crucial role in the improvement of protein function prediction performances [24]. Nevertheless, the approach suffers from some drawbacks. Indeed, the paper focuses on the comparison of hierarchical multilabel methods, but it does not analyze the impact of the concurrent use of data integration and hierarchical multilabel methods on the overall classification performances. Moreover, potential improvements could be introduced by applying cost-sensitive variants of hierarchical multilabel predictors, able to effectively calibrate the precision/recall trade-off at different levels of the functional ontology.

Figure 11: The asymmetric flow of information suggested by the true path rule.

7. True Path Rule Hierarchical Ensembles

These ensemble methods exploit at the same time the downward and upward relationships between classes, thus considering both the parent-to-child and child-to-parent functional links (Figure 2(b)).

The true path rule (TPR) ensemble method [31, 142] is directly inspired by the true path rule that governs both the GO and FunCat taxonomies. Citing the curators of the Gene Ontology [143]: "An annotation for a class in the hierarchy is automatically transferred to its ancestors, while genes unannotated for a class cannot be annotated for its descendants". Considering the parents of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$$\hat{y}_i = 1 \;\Longrightarrow\; \hat{y}_{\mathrm{par}(i)} = 1, \qquad \hat{y}_i = 0 \;\not\Longrightarrow\; \hat{y}_{\mathrm{par}(i)} = 0. \quad (26)$$

On the other hand, considering the children of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$$\hat{y}_i = 1 \;\not\Longrightarrow\; \hat{y}_{\mathrm{child}(i)} = 1, \qquad \hat{y}_i = 0 \;\Longrightarrow\; \hat{y}_{\mathrm{child}(i)} = 0. \quad (27)$$

From (26) and (27) we observe an asymmetry in the rules that govern the assignments of positive and negative labels. Indeed, we have a propagation of positive predictions from the bottom to the top of the hierarchy in (26) and a propagation of negative labels from top to bottom in (27). Conversely, negative labels cannot propagate from bottom to top, and positive predictions cannot propagate from top to bottom.

The "true path rule" thus suggests algorithms able to propagate "positive" decisions from the bottom to the top of the hierarchy and negative decisions from top to bottom (Figure 11).

7.1. The True Path Rule Ensemble Algorithm. The TPR algorithm puts together the predictions made at each node by local "base" classifiers to realize an ensemble that obeys the "true path rule".

The basic ideas behind the method can be summarized as follows:

(1) training of the base learners: for each node of the hierarchy a suitable learning algorithm (e.g., a multilayer perceptron or a support vector machine) provides a classifier for the associated functional class;

(2) in the evaluation phase the trained classifiers associated with each class/node of the graph provide a local decision about the assignment of a given example to a given node;

(3) positive decisions, that is, annotations to a specific functional class, may propagate from bottom to top across the graph: they influence the decisions of the parent nodes and of their ancestors in a recursive way, by traversing the graph towards higher level nodes/classes. Conversely, negative decisions do not affect decisions of the parent node; that is, they do not propagate from bottom to top (26);


(4) negative predictions for a given node (taking into account the local decisions of its descendants) are propagated to the descendants, to preserve the consistency of the hierarchy according to the true path rule, while positive decisions do not influence decisions of child nodes (27).

The ensemble combines the local predictions of the base learners associated with each node with the positive decisions that come from the bottom of the hierarchy and with the negative decisions that spring from the higher level nodes. More precisely, base classifiers estimate local probabilities $\hat{p}_i(g)$ that a given example $g$ belongs to class $\theta_i$, but the core of the algorithm is represented by the evaluation phase, where the ensemble provides an estimate of the "consensus" global probability $\bar{p}_i(g)$.

It is worth noting that, instead of a probability, $\hat{p}_i(g)$ may represent a score associated with the likelihood that a given gene/gene product belongs to the functional class $i$.

Let us consider the set $\phi_i(g)$ of the children of node $i$ for which we have a positive prediction for a given gene $g$:

$$\phi_i(g) = \{ j : j \in \mathrm{child}(i), \; \hat{y}_j = 1 \}. \quad (28)$$

The global consensus probability $\bar{p}_i(g)$ of the ensemble depends both on the local prediction $\hat{p}_i(g)$ and on the predictions of the nodes belonging to $\phi_i(g)$:

$$\bar{p}_i(g) = \frac{1}{1 + |\phi_i(g)|} \left( \hat{p}_i(g) + \sum_{j \in \phi_i(g)} \bar{p}_j(g) \right). \quad (29)$$

The decision $\hat{y}_i(g)$ at node/class $i$ is set to 1 if $\bar{p}_i(g) > t$ and to 0 otherwise (a natural choice for $t$ is 0.5), and only children nodes for which we have a positive prediction can influence their parent. In leaf nodes the sum of (29) disappears, and (29) becomes $\bar{p}_i(g) = \hat{p}_i(g)$. In this way positive predictions propagate from bottom to top, and negative decisions are propagated to the descendants of a node $i$ when $\hat{y}_i(g) = 0$.

The bottom-up per-level traversal of the tree assures that all the offspring of a given node $i$ are taken into account for the ensemble prediction. For the same reason, we can safely set the classes belonging to the subtree rooted at $i$ to negative when $\hat{y}_i$ is set to 0. It is worth noting that we have a two-way asymmetric flow of information across the tree: positive predictions for a node influence its ancestors, while negative predictions influence its offspring.

The algorithm provides both the multilabels $\hat{y}_i$ and an estimate of the probabilities $\bar{p}_i$ that a given example $g$ belongs to the class $i = 1, \dots, m$.
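The following sketch (our own rendering, on a tree stored as a child-adjacency dictionary) implements the bottom-up consensus (29) and the final top-down zeroing of subtrees rooted at negative nodes:

```python
def tpr_predict(children, root, p_hat, t=0.5):
    """TPR ensemble (sketch): consensus probabilities (29) computed bottom-up;
    negative decisions then zero the whole subtree below them."""
    p_bar, yhat = {}, {}

    def visit(i):
        positive = []
        for j in children.get(i, []):
            visit(j)
            if yhat[j] == 1:
                positive.append(p_bar[j])   # only positive children push evidence up
        p_bar[i] = (p_hat[i] + sum(positive)) / (1 + len(positive))
        yhat[i] = 1 if p_bar[i] > t else 0

    def zero_subtree(i):
        for j in children.get(i, []):
            yhat[j] = 0
            zero_subtree(j)

    def enforce(i):
        if yhat[i] == 0:
            zero_subtree(i)
        else:
            for j in children.get(i, []):
                enforce(j)

    visit(root)
    enforce(root)
    return yhat, p_bar
```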

7.2. The Cost-Sensitive Variant. Note that in the TPR algorithm there is no way to explicitly balance the local prediction $\hat{p}_i(g)$ at node $i$ with the positive predictions coming from its offspring (29). By balancing the local predictions with the positive predictions coming from the ensemble, we can explicitly modulate the interplay between local and descendant predictors. To this end a weight $w$, $0 \leq w \leq 1$, is introduced, such that if $w = 1$ the decision at node $i$ depends only on the local predictor; otherwise the prediction is shared proportionally to $w$ and $1 - w$ between, respectively, the local parent predictor and the set of its children:

$$\bar{p}_i = w \hat{p}_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \bar{p}_j. \quad (30)$$

This variant of the TPR algorithm is the weighted true path rule (TPR-W) hierarchical ensemble algorithm. By tuning the $w$ parameter we can modulate the precision/recall characteristics of the resulting ensemble. More precisely, for $w \to 0$ the weight of the parent local predictor is small, and the ensemble decision depends mainly on the positive predictions of the offspring nodes (classifiers). Conversely, $w \to 1$ corresponds to a higher weight of the parent predictor: less weight is given to possible positive predictions of the children, and the decision depends mainly on the local/parent base classifier. In case of a negative decision, the whole subtree is set to 0, causing the precision to increase. Note that for $w \to 1$ the behaviour of TPR-W becomes similar to that of HTD (Section 4).

A specific advantage of TPR-W ensembles is precisely this capability of tuning precision and recall through the single parameter $w$ (30). In [31] the author shows that the $w$ parameter highly influences the precision/recall characteristics of the ensemble: low $w$ values yield a higher recall, while high values improve the precision of the TPR-W ensemble.
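The weighted consensus (30) is a one-line change to the bottom-up update sketched above; the fallback to the local estimate when no child is positive is our assumption, mirroring the leaf case of (29):

```python
def tpr_w_consensus(p_local, positive_children, w=0.5):
    """Weighted TPR-W consensus at one node (30): w -> 1 trusts the local
    predictor (higher precision), w -> 0 trusts positive children (higher recall)."""
    if not positive_children:   # no positive child: fall back to the local estimate
        return p_local
    return w * p_local + (1 - w) * sum(positive_children) / len(positive_children)
```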

Recently, Chen and Hu proposed a method that applies the TPR-W hierarchical strategy but uses composite-kernel SVMs as base classifiers and a supervised clustering with oversampling strategy to solve the imbalanced data set learning problem; they showed that the proper selection of base learners and unbalance-aware learning strategies can further improve the results in terms of hierarchical precision and recall [144].

The same authors also proposed an enhanced version of the TPR-W strategy to overcome a limitation of this bottom-up hierarchical method for AFP. Indeed, for some classes at the lower levels of the hierarchy the classifier performances are sometimes quite poor, due to both noisy data and the relatively low number of available annotations. More precisely, in the basic TPR ensemble the probabilities $\bar{p}_j$ computed by the children of node $i$ (30) contribute equally to the probability $\bar{p}_i$ computed by the ensemble at node $i$, independently of the accuracy of the predictions made by its children classifiers. This "unweighted" mechanism may propagate errors across the hierarchy: a poorly performing child classifier may, for instance, with high probability predict a negative example as positive, and this error may propagate to its parent node and, recursively, to its ancestor nodes. To try to alleviate this possible bottom-up error propagation, in [145] Chen and Hu proposed an improved TPR ensemble (TPR-W weighted) based on classifier performance. To this end, they weighted the contribution of each child classifier on the basis of its performance, evaluated on a validation data set, by adding another weight $\nu_j$ to (30):

$$\bar{p}_i = w \hat{p}_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \nu_j \cdot \bar{p}_j, \quad (31)$$

where $\nu_j$ is computed on the basis of some accuracy metric $A_j$ (e.g., the F-score), estimated for the child classifier associated with node $j$, as follows:

$$\nu_j = \frac{A_j}{\sum_{k \in \phi_i} A_k}. \quad (32)$$

In this way the contribution of "poor" classifiers is reduced, while "good" classifiers weight more in the final computation of $\bar{p}_i$ (31). Experiments with the "Protein Fate" subtree of the FunCat taxonomy on the yeast model organism show that this approach improves predictions with respect to the "vanilla" TPR-W hierarchical strategy [145].

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward and common to the hierarchical top-down algorithm, the propagation of positive decisions from bottom to top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this item, see Appendix C.

Experimental results show that TPR-W achieves equal or better results than the TPR and top-down hierarchical strategies, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per-level classification performances shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall by using a global strategy: large values of the $w$ parameter improve the precision, and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchical ensemble are influenced by the positive decisions of its descendants. More precisely, the analyses performed in [31] showed the following:

(i) weights of descendants decrease exponentially with respect to their depth; as a consequence, the influence of descendant nodes decays quickly with their depth;

(ii) the parameter $w$ plays a central role in balancing the weight of the parent classifier associated with a given node with the weights of its positive offspring: small values of $w$ increase the weight of descendant nodes, and large values increase the weight of the local parent predictor associated with that node;

(iii) the overall probability predicted by the ensemble results from the choice of the $w$ parameter, the strength of the predictions of the local learner, and those of its descendants.

These characteristics of TPR-W ensembles are well suited for the hierarchical classification of protein functions, considering that annotations of deeper nodes are likely to have less experimental evidence than higher nodes. Moreover, by enforcing the strength of the descendant nodes through low $w$ values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured as a forest of trees (Section 1.1), while no applications have been performed using the GO, structured as a directed acyclic graph (Section 1.1).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147], or investigated a modified decision tree model in which the assignment to a node is propagated toward the parent nodes [9], by extending the classical C4.5 decision tree algorithm for multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in yeast [149].

More recent approaches, always based on modified decision trees, used distance measures derived from the hierarchy and significantly improved previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all classes at once, even when they are built taking into account the hierarchy.

Nevertheless, the previously proposed decision tree-based methods often achieve results not comparable with state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for DAG-structured prediction of protein function in the S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles that learn ensembles of predictive models, each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles that train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151], methods related to decision trees, in the sense that they produce interpretable classification rules to predict all functions at all levels of the GO hierarchy, have been proposed, using an ant colony optimization classification algorithm to discover the classification rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues [1, 16, 18]. In particular, in [80] the authors showed that, even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results if, at the same time, we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy between hierarchical classification, data integration methods, and cost-sensitive techniques has been shown, highlighting that hierarchical ensemble methods should be designed taking into account different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the predictions of protein functions [16, 25, 154–156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into four groups:

(1) vector subspace integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed by [30], but each data source is separately normalized in order to take into account the data distribution in each individual vector space.

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network, or probabilistic evidence integration schemes [45].

Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive-Bayes integration [157] and constrained linear regression [14], or by merging data taking into account the GO hierarchy [158], or by applying XML-based techniques [159].

Kernel Fusion. These techniques at first construct a separate Gram matrix for each available data source, using appropriate kernels representing similarities between genes/gene products. Then, by exploiting the closure property with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix.

Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have been recently proposed [160, 161]. Kernel fusion methods, both with and without weighting the data sources, have been successfully applied to the classification of protein functions [162–165]. Recently, a novel method proposed an enhanced kernel integration approach, by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different "views" of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include the "late integration" of kernels trained on different sources [22], the naive-Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the outputs of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171–173].

Table 2: Comparison of the results (average per class F-scores) achieved with single sources and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the number in parentheses refers to the percentage relative increment in F-score performance achieved with data fusion techniques with respect to the best single source of evidence (BIOGRID).

Methods          FLAT        HTD         HTD-CS      HB          HB-CS       TPR         TPR-W
Single-source
  BIOGRID        0.2643      0.3759      0.4160      0.3385      0.4183      0.3902      0.4367
  STRING         0.2203      0.2677      0.3135      0.2138      0.3007      0.2801      0.3048
  PFAM BINARY    0.1756      0.2003      0.2482      0.1468      0.2407      0.2532      0.2738
  PFAM LOGE      0.2044      0.1567      0.2541      0.0997      0.2847      0.3005      0.3160
  EXPR           0.1884      0.2506      0.2889      0.2006      0.2781      0.2723      0.3053
  SEQ. SIM.      0.1870      0.2532      0.2899      0.2017      0.2825      0.2742      0.3088
Multisource (data fusion)
  Kernel fusion  0.3220(22%) 0.5401(44%) 0.5492(32%) 0.5181(53%) 0.5505(32%) 0.5034(29%) 0.5592(28%)

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques, in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ. SIM.). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].

Table 2 summarizes the results of the comparison across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per class F-score in flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) with respect to FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical, data fusion (in particular kernel fusion), and cost-sensitive approaches (Figure 13) [123].

Per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion, the average F-scores at levels 2, 3, and 4 are comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergic effects of hierarchical multilabel ensembles, cost-sensitive, and data fusion techniques significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insights into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both GO and FunCat, only positive annotations are usually available, while negative annotations are much fewer. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy for choosing negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123] two strategies for choosing negative examples have been compared: the basic (B) and the parent only (PO) strategy.

According to the B strategy, the set of negative examples consists simply of those genes $g$ that are not annotated to class $c_i$, that is,

$$N_B = \{ g : g \notin c_i \}. \quad (33)$$

The PO selection strategy chooses as negatives for the class $c_i$ only those examples that are not annotated to $c_i$ but are annotated to a parent class. More precisely, for a given class $c_i$ corresponding to node $i$ in the taxonomy, the set of negative examples is

$$N_{PO} = \{ g : g \notin c_i, \; g \in \mathrm{par}(i) \}. \quad (34)$$
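A sketch of the two selection rules (33)-(34), assuming an annotation map from each gene to the set of classes it is annotated to (all names are ours):

```python
def negatives_basic(genes, annotations, c):
    """B strategy (33): every gene not annotated to class c."""
    return {g for g in genes if c not in annotations[g]}

def negatives_parent_only(genes, annotations, c, parent_of_c):
    """PO strategy (34): genes not annotated to c but annotated to its parent."""
    return {g for g in genes
            if c not in annotations[g] and parent_of_c in annotations[g]}
```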


Figure 12: FunCat trees comparing the F-scores achieved with data integration (KF) to the best single-source classifiers trained on BIOGRID data. Black nodes depict functional classes for which KF achieves better F-scores. (a) FLAT and (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical precision, recall, and F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, the PO strategy selects negative examples for training that are, in a certain sense, "close" to positives. It is easy to see that $N_{PO} \subseteq N_B$; hence the B strategy selects for training a large set of generic negative examples, possibly annotated to classes associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the B (abscissa) and PO (ordinate) strategy for the selection of negative examples. In Figure 15(a) most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods gain in performance when using the PO strategy as opposed to the B strategy ($P$-value $= 2.2 \times 10^{-16}$, according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

Figure 14: Comparison of per-level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF). Performance of "single" is computed by averaging across all the single data sources.

These results can be explained by considering that the PO strategy takes into account the hierarchy to select negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in the context of this research area.


Figure 15: Comparison of average per-class F-score between Basic and PO strategies: (a) FLAT ensembles; (b) hierarchical cost-sensitive strategies, HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles). Abscissa: per-class F-score with base learners trained according to the Basic strategy; ordinate: per-class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way to assess the impact of the different strategies on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works showed also that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies should be explored to improve the accuracy and the reliability of the base learners and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied, as sketched below: the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly takes away some unannotated examples [182]. Other approaches could be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185–187].
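As a plain illustration of these classical balancing strategies (not of the hierarchical SMOTE variant of [145]), the following sketch, with hypothetical inputs, oversamples the annotated proteins by duplication and undersamples the unannotated ones at random:

    import random

    def random_oversample(positives, target_size):
        # Augment the positives by exactly duplicating annotated
        # proteins drawn at random until target_size is reached.
        positives = list(positives)
        extra = [random.choice(positives)
                 for _ in range(max(0, target_size - len(positives)))]
        return positives + extra

    def random_undersample(negatives, target_size):
        # Randomly take away unannotated examples, keeping target_size of them.
        return random.sample(list(negatives), target_size)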

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems and often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta_i^- / \theta_i^+$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$ we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed in order to design ensemble methods with "tunable" precision/recall characteristics.

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; unfortunately, the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and that, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are partially annotated. More precisely, two new algorithms have been proposed: ProWL, protein function prediction with weak-label learning, which


can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein, and ProWL-IF, protein function prediction with weak-label learning and knowledge of irrelevant functions, by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works on hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy with the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Appendix C), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of deep nodes of the hierarchy. Other enhancements could be introduced with the TPR-W algorithm (Section 7.2): indeed, we can note that positive children of a node at level $i$ of the hierarchy have the same weight, independently of the size of their hanging subtrees. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree; indeed, this witnesses for a positive annotation of the node at level $i$.

More generally, in the spirit of a work recently proposed [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

First, an important issue could be represented by the design and development of multitask learning strategies [194] able to exploit the relationships between functional classes during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, just during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performances.

A last issue is represented by multispecies predictions in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the prediction for a particular species by using predictions or data available for other species. This is a common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous


Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

work. More generally, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and the other related issues is the key to improve AFP methods and to drive experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is organized in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., the mitochondrion or the rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration among three model organism databases: FlyBase (Drosophila), the Saccharomyces genome database (SGD), and the mouse genome database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components,


Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

and molecular functions, in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function, for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities. An example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles, or physical places or compartments in which a specific gene product is located. An example is the endoplasmic reticulum or the ribosome.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a term of the GO}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are also categorized in the following three main groups:

(i) is-a (subtype relations): if we say the term/node $A$ is a $B$, we mean that node $A$ is a subtype of node $B$. For example, mitotic cell cycle is a cell cycle, and lyase activity is a catalytic activity.

(ii) part-of (part-whole relations): $A$ is part of $B$ means that whenever $A$ exists, it is part of $B$, and the presence of $A$ implies the presence of $B$. For instance, mitochondrion is part of cytoplasm.

(iii) regulates (control relations): if we say that $A$ regulates $B$, we mean that $A$ directly affects the manifestation of $B$, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting function similarity. For these reasons, in AFP the regulates relations (which, however, are a minority of the existing relations) are usually not used.
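As a toy illustration of these conventions, the following sketch (Python; the terms and edges are invented for the example) stores a GO-like DAG as adjacency lists labeled with the relation type, keeps only the transitive is-a and part-of edges, and computes the ancestor set of a term:

    # Toy GO-like DAG: each term maps to its (parent, relation) pairs.
    dag = {
        "mitotic cell cycle": [("cell cycle", "is_a")],
        "cell cycle": [("biological_process", "is_a")],
        "mitochondrion": [("cytoplasm", "part_of")],
        "cytoplasm": [],
        "biological_process": [],
    }

    def ancestors(term, dag):
        # All terms reachable from `term` via is_a/part_of edges;
        # regulates edges, being nontransitive, would be skipped here.
        found = set()
        stack = [term]
        while stack:
            for parent, rel in dag[stack.pop()]:
                if rel in ("is_a", "part_of") and parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found

    print(ancestors("mitotic cell cycle", dag))  # -> cell cycle, biological_process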

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI), to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference, to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de). At the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content has been extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and, finally, animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical, tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces genome database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge amount of terms is very difficult to handle for annotators. Moreover, we may have a very large number of possible assignments that may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads both to simpler procedures for annotation and to less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, but without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to changes, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more generally, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess the classification performance. For this reason, the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the positive examples correctly predicted as positive, FN the positive examples incorrectly predicted as negative, and FP the negatives incorrectly predicted as positive, then the precision $P$ and the recall $R$ are

\[
P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{B.1}
\]

The F-score $F$ is the harmonic mean between precision and recall:

\[
F = \frac{2 \cdot P \cdot R}{P + R}. \tag{B.2}
\]
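These definitions translate directly into code; a minimal sketch from raw counts, guarding against empty denominators:

    def precision_recall_fscore(tp, fp, fn):
        # Precision, recall, and F-score as in (B.1) and (B.2);
        # conventionally 0 when a denominator vanishes.
        p = tp / (tp + fp) if tp + fp > 0 else 0.0
        r = tp / (tp + fn) if tp + fn > 0 else 0.0
        f = 2 * p * r / (p + r) if p + r > 0 else 0.0
        return p, r, f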

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to near-optimal rankings; that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be "partially correct", in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$ consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $\ell(P)$ be the set of the leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let ${\uparrow}p$ be the set of ancestors of the node $p$ that belong to $P(x)$, and, given a leaf $c \in C(x)$, let ${\uparrow}c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

\[
\mathrm{HP} = \frac{1}{|\ell(P(x))|} \sum_{p \in \ell(P(x))} \max_{c \in \ell(C(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}p|},
\]
\[
\mathrm{HR} = \frac{1}{|\ell(C(x))|} \sum_{c \in \ell(C(x))} \max_{p \in \ell(P(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}c|},
\]
\[
\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.3}
\]

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

\[
\mathrm{HP} = \frac{1}{|\ell(P(x))|} \sum_{p \in \ell(P(x))} \frac{|C(x) \cap {\uparrow}p|}{|{\uparrow}p|},
\]
\[
\mathrm{HR} = \frac{1}{|\ell(C(x))|} \sum_{c \in \ell(C(x))} \frac{|{\uparrow}c \cap P(x)|}{|{\uparrow}c|},
\]
\[
\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.4}
\]

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.
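Assuming that for a gene x we are given the leaf sets ℓ(P(x)) and ℓ(C(x)) together with precomputed, nonempty ancestor sets (dictionaries "up_pred" and "up_true", one entry per leaf; all names are hypothetical), the measures in (B.3) can be sketched as:

    def hierarchical_prf(pred_leaves, true_leaves, up_pred, up_true):
        # Hierarchical precision/recall/F-score of (B.3); up_pred[p] and
        # up_true[c] are the ancestor sets of p in P(x) and of c in C(x).
        hp = sum(max(len(up_true[c] & up_pred[p]) for c in true_leaves)
                 / len(up_pred[p]) for p in pred_leaves) / len(pred_leaves)
        hr = sum(max(len(up_true[c] & up_pred[p]) for p in pred_leaves)
                 / len(up_true[c]) for c in true_leaves) / len(true_leaves)
        hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
        return hp, hr, hf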

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

\[
\mathrm{HP}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|P(x)|}, \qquad
\mathrm{HR}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|C(x)|}. \tag{B.5}
\]

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true ($\mathrm{HR}_K$) classes. The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

\[
\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \tag{B.6}
\]

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $\bar{q}_k$. More precisely, $\bar{q}_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $\bar{q}^{\,j}_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \ge 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level $k$ of the hierarchy [31]:

\[
\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g). \tag{C.1}
\]
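A minimal bottom-up realization of (C.1) on a tree is sketched below (hypothetical inputs: "q_hat" holds the base-learner estimates, "children" the tree structure, and t is the threshold defining the set of positive children phi); when a node has no positive children, the sketch simply keeps the local estimate:

    def tpr_w(node, q_hat, children, w, t=0.5):
        # Combined probability of (C.1): mix the local base learner
        # (weight w) with the average of the combined scores of the
        # positive children (weight 1 - w).
        scores = [tpr_w(child, q_hat, children, w, t)
                  for child in children.get(node, [])]
        positive = [s for s in scores if s > t]
        if not positive:
            return q_hat[node]
        return w * q_hat[node] + (1 - w) * sum(positive) / len(positive)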

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children of a generic node at level $k$:

\[
a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1}, \tag{C.2}
\]

and we can introduce similar notations for $a_{k+2}$ (average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (descendants at level $j$ below a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \le w \le 1$, balancing the weight between parent and children predictors, and having a variable number (larger than or equal to 1) of positive descendants for each of the $m$ lower levels below, the following equality holds for each $m \ge 1$:

\[
\bar{q}_k = w\hat{q}_k + \sum_{j=1}^{m-1} w(1-w)^j a_{k+j} + (1-w)^m a_{k+m}. \tag{C.3}
\]

For the full proof, see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter. To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$


Figure 18: (a) Plot of $f(w) = (1-w)^m$ while varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1-w)^j$ while varying $j$ from 1 to 10. The integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.

(Theorem 2), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of the nodes above the leaves.
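The two functions of Figure 18 are straightforward to tabulate; for instance, with $w = 0.5$ the weights halve at each additional level (a sketch):

    w = 0.5
    f = [(1 - w) ** m for m in range(1, 6)]      # leaf influence at depth m
    g = [w * (1 - w) ** j for j in range(1, 6)]  # internal nodes at distance j
    print(f)  # [0.5, 0.25, 0.125, 0.0625, 0.03125]
    print(g)  # [0.25, 0.125, 0.0625, 0.03125, 0.015625]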

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi", funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction—the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.
[2] L. Peña-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.
[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.
[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.
[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.
[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.
[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.
[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.
[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.
[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.
[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.
[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.
[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.
[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.
[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.
[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.
[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.
[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.
[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.
[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.
[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.
[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.
[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.
[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.
[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.
[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.
[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.
[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.
[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.
[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.

[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.
[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.
[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.
[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.
[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.
[38] K. Trohidis, G. Tsoumahas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.
[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.
[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.
[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.
[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.
[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.
[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.
[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.
[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.
[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.
[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.
[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.
[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.
[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.

[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.
[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.
[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.
[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.
[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
[58] A. Conesa, S. Gotz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.
[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.
[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.
[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.
[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.
[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.
[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.
[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.
[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.
[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.
[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.
[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.
[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.
[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes in Artificial Intelligence, pp. 219–234, Springer, 2011.
[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.
[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.
[75] Y. A. I. Kourmpetis, A. D. J. van Dijk, M. C. A. M. Bink, R. C. H. J. van Ham, and C. J. F. ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.
[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.
[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.
[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.
[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.
[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.
[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.
[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.
[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.
[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.
[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.
[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.
[88] G. Bakir, T. Hoffman, B. Scholkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.
[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.
[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.
[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.
[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.
[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.
[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.
[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.
[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.
[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.
[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.
[100] M. Dettling and P. Bühlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.
[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.
[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.
[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.
[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.
[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.
[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.
[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.
[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.
[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.
[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.
[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, University of Stanford, Stanford, Calif, USA, 2000.
[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.
[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.
[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.
[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.
[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.
[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.
[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.
[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.
[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.
[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.
[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.
[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using Top-Down label combination and Artificial Neural Networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.
[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.
[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.
[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.
[128] B. Paes, A. Plastion, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.
[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.
[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.
[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.
[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.
[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.
[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems, Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.
[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.
[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.
[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.
[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.
[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An $O(n^2)$ algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.
[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.
[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.
[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.
[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.
[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.
[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.


[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics: From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz, et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.

[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray, et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7–9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry, et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao, et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. von Mering, R. Krause, B. Snel, et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.


[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multilabel classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013, ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bahr, et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber, et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan, et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


methods, this approach is conceived for one-path hierarchical classification and hence is unsuited for hierarchical AFP, where multiple paths in the hierarchy should usually be considered, since in most cases genes can play different functional roles in the cell.

5. Ensemble-Based Bayesian Approaches for Hierarchical Classification

These methods introduce a Bayesian approach to the hierarchical classification of proteins, using the classical Bayes theorem or Bayesian networks to obtain tractable factorizations of the joint conditional probabilities from the original "full Bayesian" setting of the hierarchical AFP problem [20, 30], or to achieve "Bayes-optimal" solutions with respect to loss functions well suited to hierarchical problems [27, 92].

5.1. The Solution Based on Bayesian Networks. One of the first approaches addressing the issue of inconsistent predictions in the Gene Ontology is the Bayesian approach proposed in [20] (BAYES NET-ENS). According to the general scheme of hierarchical ensemble methods, two main steps characterize the algorithm:

(1) flat prediction of each term/class (possibly inconsistent);

(2) a Bayesian hierarchical combination scheme that allows collaborative error-correction over all nodes.

After training a set of base classifiers, one for each of the considered GO terms (in their work the authors applied the method to 105 selected GO terms), we obtain a set of possibly inconsistent predictions ŷ_1, ..., ŷ_n. The goal consists in finding a set of consistent predictions y_1, ..., y_n by maximizing the following probability, derived from the Bayes theorem:

P(y_1,\ldots,y_n \mid \hat{y}_1,\ldots,\hat{y}_n) = \frac{P(\hat{y}_1,\ldots,\hat{y}_n \mid y_1,\ldots,y_n)\, P(y_1,\ldots,y_n)}{Z}   (7)

where n is the number of GO nodes/terms and Z is a constant normalization factor.

Since the direct solution of (7) is too hard, that is, exponential in time with respect to the number of nodes, the authors proposed a Bayesian network structure that exploits the relationships between the GO terms. More precisely, to reduce the complexity of the problem, the authors imposed the following constraints:

(1) the y_i nodes are conditioned on their children (GO structure constraints);

(2) the ŷ_i nodes are conditioned on their label y_i (the Bayes rule);

(3) ŷ_i is independent of both ŷ_j, i ≠ j, and y_j, i ≠ j, given y_i.

In other words, we can ensure that a label is 1 (positive) whenever any one of its children is 1, and the edges from y_i to ŷ_i assure that a classifier output ŷ_i is a random variable independent of all other classifier outputs ŷ_j and labels y_j, given its true label y_i (Figure 4).

Figure 4: Bayesian network involved in the hierarchical classification (adapted from [20]).

More precisely, from the previous constraints we can derive the following equations. From the first constraint:

P(y_1,\ldots,y_n) = \prod_{i=1}^{n} P(y_i \mid \mathrm{child}(y_i))   (8)

From the last two constraints:

P(\hat{y}_1,\ldots,\hat{y}_n \mid y_1,\ldots,y_n) = \prod_{i=1}^{n} P(\hat{y}_i \mid y_i)   (9)

Note that (8) can be inferred from the training labels simply by counting, while (9) can be estimated during training on validation data, by modeling the distribution of the ŷ_i outputs over positive and negative examples with a parametric model (e.g., a Gaussian distribution, see Figure 5).

For the implementation of their method the authors adopted bagged ensembles of SVMs [131], to make the predictions at each node of the GO hierarchy more robust and reliable; the median values of their outputs on out-of-bag examples have been used to estimate the means and variances for each class. Finally, the means and variances have been used as parameters of the Gaussian models adopted to estimate the conditional probabilities of (9).
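The calibration step can be made concrete with a short Python sketch (an illustration, not the code of [20]; the array names oob_margins and labels are assumptions of this sketch). For one GO term, it fits one Gaussian per label value to the out-of-bag outputs of the bagged classifier, so that the factors P(ŷ_i | y_i) of (9) can be evaluated:

    import numpy as np
    from scipy.stats import norm

    def fit_output_model(oob_margins, labels):
        """Gaussian models of P(y_hat | y) for one GO term (eq. (9)), fitted
        on out-of-bag outputs; the median is used as location estimate."""
        pos = oob_margins[labels == 1]
        neg = oob_margins[labels == 0]
        return {1: norm(np.median(pos), pos.std(ddof=1)),
                0: norm(np.median(neg), neg.std(ddof=1))}

    def likelihood(model, margin, y):
        """Density of the classifier output given the true label y."""
        return model[y].pdf(margin)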

Results with the 105 terms/nodes of the GO BP ontology (model organism S. cerevisiae) showed substantial improvements with respect to nonhierarchical "flat" predictions: the hierarchical approach improves the AUC results on 93 of the 105 GO terms (Figure 6).

5.2. The Markov Blanket and Approximated Breadth-First Solution. In [30] the authors proposed an alternative, approximated solution of the complex equation (7), by introducing the following two variants of the Bayesian integration:

(1) HIER-MB: hierarchical Bayesian combination involving the nodes in the Markov blanket of the term of interest;

(2) HIER-BFS: hierarchical Bayesian combination involving the first 30 nodes visited through a breadth-first search (BFS) in the GO graph.


Figure 5: Distribution of positive and negative validation examples (a Gaussian distribution is assumed). Adapted from [20].

Figure 6: Improvements induced by the hierarchical prediction of the GO terms. Darker shades of blue indicate the largest improvements and darker shades of red the largest deterioration; white means no change (adapted from [20]).

Figure 7: Markov blanket surrounding the GO term Y_1. Each GO term is represented as a blank node, while the SVM classifier output is represented as a gray node (adapted from [30]).

The method has been applied to the prediction of more than 2000 GO terms for the mouse model organism, and performed among the top methods in the MouseFunc challenge [2].

The first approach (HIER-MB) modifies the output of the base learners (SVMs in the Guan et al. paper) taking into account the Bayesian network constructed from the Markov blanket surrounding the GO term of interest (Figure 7). In a Bayesian network the Markov blanket of a node i is represented by its parents par(i), its children child(i), and its children's other parents. The Bayesian network involving the Markov blanket of node i is used to provide the prediction ŷ_i of the ensemble, thus leveraging the local relationships of node i and the predictions for the nodes included in its Markov blanket.

To enlarge the size of the Bayesian subnetwork involved in the prediction of the node of interest, a variant based on the Bayesian networks constructed by applying a classical breadth-first search is the basis of the HIER-BFS algorithm. To reduce the complexity, at most 30 terms are included (i.e., the first 30 nodes reached by the breadth-first algorithm, see Figure 8). In the implementation, ensembles of 25 SVMs have been trained for each node, using vector space integration techniques [132] to integrate multiple sources of data.

Figure 8: The breadth-first subnetwork stemming from Y_1. Each GO term is represented through a blank node, and the SVM outputs are represented as gray nodes (adapted from [30]).
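Both subnetwork selections are simple graph operations. The following hedged sketch assumes that the GO graph is stored in two dictionaries, parents and children (illustrative names, not part of [30]); for the BFS variant, the graph is explored in both directions, again an assumption of this sketch:

    from collections import deque

    def markov_blanket(i, parents, children):
        """Parents, children, and children's other parents of term i (HIER-MB)."""
        blanket = set(parents[i]) | set(children[i])
        for c in children[i]:
            blanket |= set(parents[c]) - {i}
        return blanket

    def bfs_subnetwork(i, parents, children, max_nodes=30):
        """First max_nodes terms reached by a breadth-first visit from i (HIER-BFS)."""
        visited, queue = [i], deque([i])
        while queue and len(visited) < max_nodes:
            n = queue.popleft()
            for m in list(parents[n]) + list(children[n]):
                if m not in visited and len(visited) < max_nodes:
                    visited.append(m)
                    queue.append(m)
        return visited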

Note that with both the HIER-MB and HIER-BFS methods we do not take into account the overall topology of the GO network, but only the terms related to the node for which we perform the prediction. Even if this general approach is reasonable and achieves good results, its main drawback is represented by the locality of the hierarchical integration (limited to the Markov blanket and to the first 30 BFS nodes). Moreover, previous works have shown that the adopted integration strategy (vector space integration) is in most cases worse than kernel fusion [11] and ensemble methods for data integration [23].

Figure 9: Integration of diverse methods and diverse sources of data in an ensemble framework for AFP prediction. The best classifier for each GO term is selected through held-out set validation (adapted from [30]).

In the same work [30] the authors also propose a sort of "test and select" method [133], by which three different classification approaches, (a) single flat SVMs, (b) Bayesian hierarchical correction, and (c) Naive Bayes combination, are applied, and for each GO term the best one is selected by internal cross-validation (Figure 9).
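The per-term selection step can be sketched in a few lines; the candidates dictionary and the choice of AUC as selection metric are assumptions of this illustration, since the metric used in the internal cross-validation is not detailed here:

    from sklearn.metrics import roc_auc_score

    def select_best_per_term(candidates, y_valid):
        """candidates: {method name: scores on the validation genes of a term};
        returns the method achieving the highest held-out AUC."""
        return max(candidates,
                   key=lambda name: roc_auc_score(y_valid, candidates[name]))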

It is worth noting that other approaches have adopted Bayesian networks to resolve the hierarchical constraints underlying the GO taxonomy. For instance, in the FALCON algorithm the GO is modeled as a Bayesian network, and for any given input the algorithm returns the most probable GO term assignment in accordance with the GO structure, using an evolutionary optimization algorithm [134].

5.3. HBAYES: An "Optimal" Hierarchical Bayesian Ensemble Approach. The HBAYES ensemble method [92, 135] is a general technique for solving hierarchical classification problems on generic taxonomies G structured as forests of trees. The method consists in training a calibrated classifier at each node of the taxonomy. In principle, any algorithm (e.g., support vector machines or artificial neural networks) whose classifications are obtained by thresholding a real-valued prediction p, for example ŷ = sign(p), can be used as base learner. The real-valued outputs p̂_i(g) of the calibrated classifier for node i on gene g are viewed as estimates of the probabilities p_i(g) = P(Y_i = 1 | Y_par(i) = 1, g). The distribution of the random Boolean vector Y is assumed to be

P(\mathbf{Y} = \mathbf{y}) = \prod_{i=1}^{m} P(Y_i = y_i \mid Y_{\mathrm{par}(i)} = 1, g), \quad \forall \mathbf{y} \in \{0,1\}^m   (10)

where, in order to enforce that only multilabels Y that respect the hierarchy have nonzero probability, it is imposed that P(Y_i = 1 | Y_par(i) = 0, g) = 0 for all nodes i = 1, ..., m and all g. This implies that the base learner at node i is only trained on the subset of the training set including all examples (g, y) such that y_par(i) = 1.

5.3.1. HBAYES Ensembles for Protein Function Prediction. The H-loss is a measure of discrepancy between multilabels based on a simple intuition: if a parent class has been predicted wrongly, then errors in its descendants should not be taken into account. Given fixed cost coefficients θ_1, ..., θ_m > 0, the H-loss ℓ_H(ŷ, v) between multilabels ŷ and v is computed as follows: all paths in the taxonomy T from the root down to each leaf are examined, and whenever a node i ∈ {1, ..., m} is encountered such that ŷ_i ≠ v_i, θ_i is added to the loss, while all the other loss contributions from the subtree rooted at i are discarded. This method assumes that, given a gene g, the distribution of the labels V = (V_1, ..., V_m) is P(V = v) = ∏_{i=1}^m p_i(g) for all v ∈ {0,1}^m, where p_i(g) = P(V_i = v_i | V_par(i) = 1, g). According to the true path rule, it is imposed that P(V_i = 1 | V_par(i) = 0, g) = 0 for all nodes i and all genes g.

In the evaluation phase, HBAYES predicts the Bayes-optimal multilabel ŷ ∈ {0,1}^m for a gene g, based on the estimates p̂_i(g), i = 1, ..., m. By definition of Bayes-optimality, the optimal multilabel for g is the one that minimizes the expected loss when the true multilabel V is drawn from the joint distribution computed from the estimated conditionals p̂_i(g). That is,

\hat{\mathbf{y}} = \arg\min_{\mathbf{y} \in \{0,1\}^m} \mathbb{E}\left[\ell_H(\mathbf{y}, \mathbf{V}) \mid g\right]   (11)

In other words, the ensemble method HBAYES provides an approximation of the optimal Bayesian classifier with respect to the H-loss [135]. More precisely, as shown in [27], the following theorem holds.


Theorem 1. For any tree T and gene g, the multilabel generated according to the HBAYES prediction rule is the Bayes-optimal classification of g for the H-loss.

In the evaluation phase the uniform cost coefficients θ_i = 1, i = 1, ..., m, are used. However, since with uniform coefficients the H-loss can be made small simply by predicting sparse multilabels (i.e., multilabels ŷ such that Σ_i ŷ_i is small), in the training phase the cost coefficients are set to θ_i = 1/|root(G)| if i ∈ root(G), and to θ_i = θ_j/|child(j)|, with j = par(i), otherwise. This normalizes the H-loss, in the sense that the maximal H-loss contribution of all nodes in a subtree, excluding its root, equals that of its root.

In the following, {E} denotes the indicator function of the event E. Given g and the estimates p̂_i = p̂_i(g), i = 1, ..., m, the HBAYES prediction rule can be formulated as follows.

HBAYES Prediction Rule. Initially, set the label of each node i to

\hat{y}_i = \arg\min_{y \in \{0,1\}} \Big( \theta_i \hat{p}_i (1-y) + \theta_i (1-\hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{\mathbf{y}}) \Big)   (12)

where

H_j(\hat{\mathbf{y}}) = \theta_j \hat{p}_j (1-\hat{y}_j) + \theta_j (1-\hat{p}_j)\, \hat{y}_j + \hat{p}_j \{\hat{y}_j = 1\} \sum_{k \in \mathrm{child}(j)} H_k(\hat{\mathbf{y}})   (13)

is recursively defined over the nodes j in the subtree rooted at i, with each ŷ_j set according to (12). Then, if ŷ_i is set to zero, set all the nodes in the subtree rooted at i to zero as well.

It is worth noting that ŷ can be computed for a given g via a simple bottom-up message-passing procedure. It can be shown that, if all the child nodes k of i have p̂_k close to one half, then the Bayes-optimal label of i tends to be 0, irrespective of the value of p̂_i. On the contrary, if i's children all have p̂_k close to either 0 or 1, then the Bayes-optimal label of i is based on p̂_i only, ignoring the children. This behaviour can be intuitively explained in the following way: the estimate p̂_k is built based only on the examples for which the parent i of k is positive, hence a "neutral" estimate p̂_k = 1/2 signals that the current instance is a negative example for the parent i. Experimental results show that this approach achieves results comparable with those of the TPR method (Section 7), an ensemble approach based on the "true path rule" [136].
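One possible bottom-up implementation of the rule of (12) and (13) on a single tree is sketched below; the dictionary-based encoding (children, p, theta) is an assumption of this illustration, not the authors' implementation:

    def hbayes_predict(children, root, p, theta):
        """Bottom-up evaluation of the HBAYES rule (eqs. (12)-(13))."""
        y, H = {}, {}

        def visit(i):
            for j in children[i]:
                visit(j)
            # eq. (12): expected H-loss contribution of predicting 0 vs. 1 at i
            cost0 = theta[i] * p[i]
            cost1 = theta[i] * (1 - p[i]) + p[i] * sum(H[j] for j in children[i])
            y[i] = 1 if cost1 < cost0 else 0
            H[i] = min(cost0, cost1)  # eq. (13), evaluated at the chosen label

        def switch_off(i):
            # a node labeled 0 sets its whole subtree to 0
            for j in children[i]:
                y[j] = 0
                switch_off(j)

        visit(root)
        for i in list(y):
            if y[i] == 0:
                switch_off(i)
        return y

The final loop implements the last step of the rule: any node set to zero switches off its entire subtree, so the returned multilabel respects the hierarchy.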

5.3.2. HBAYES-CS: The Cost-Sensitive Version. HBAYES-CS is the cost-sensitive version of HBAYES proposed in [27]. In this approach the misclassification cost coefficient θ_i for node i is split into two terms, θ_i^+ and θ_i^-, to take into account misclassifications of positive and negative examples, respectively. By considering these two terms separately, (12) can be rewritten as

\hat{y}_i = \arg\min_{y \in \{0,1\}} \Big( \theta_i^{-} \hat{p}_i (1-y) + \theta_i^{+} (1-\hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{\mathbf{y}}) \Big)   (14)

where the expression of H_j(ŷ) changes correspondingly. By introducing a factor α ≥ 0 such that θ_i^- = α θ_i^+, while keeping θ_i^+ + θ_i^- = 2θ_i, the relative costs of false positives and false negatives can be parameterized, thus allowing us to further rewrite the hierarchical Bayesian rule (Section 5.3.1) as follows:

\hat{y}_i = 1 \iff \hat{p}_i \Big( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \Big) \ge \frac{2\theta_i}{1+\alpha}   (15)

By setting α = 1 we obtain the original version of the hierarchical Bayesian ensemble; by incrementing α we introduce progressively lower costs for positive predictions. In this way the recall of the ensemble tends to increase, eventually at the expense of the precision, and by tuning the α parameter we can obtain different combinations of precision/recall values.

In principle, a cost factor α_i can be set for each node i, to explicitly take into account the imbalance between the number of positive (n_i^+) and negative (n_i^-) examples estimated from the training data:

\alpha_i = \frac{n_i^{-}}{n_i^{+}} \;\Longrightarrow\; \theta_i^{+} = \frac{2}{n_i^{-}/n_i^{+} + 1}\,\theta_i = \frac{2 n_i^{+}}{n_i^{-} + n_i^{+}}\,\theta_i   (16)

The decision rule (15) at each node then becomes

\hat{y}_i = 1 \iff \hat{p}_i \Big( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \Big) \ge \frac{2\theta_i}{1+\alpha_i} = \frac{2\theta_i n_i^{+}}{n_i^{-} + n_i^{+}}   (17)

Results obtained with the yeast model organism showed that HBAYES-CS significantly outperforms HTD methods [27, 136].
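The resulting per-node decision can be sketched as follows, with α_i derived from the training counts as in (16); all argument names are illustrative placeholders:

    def hbayes_cs_decision(p_i, theta_i, h_children, n_pos_i, n_neg_i):
        """Per-node cost-sensitive rule of eq. (17); h_children is the sum of
        the H_j messages of the children of node i."""
        alpha_i = n_neg_i / n_pos_i                 # eq. (16)
        threshold = 2 * theta_i / (1 + alpha_i)     # = 2*theta_i*n+ / (n- + n+)
        return int(p_i * (2 * theta_i - h_children) >= threshold)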

6. Reconciliation Methods

Hierarchical ensemble methods are basically two-step methods, since they first provide predictions for the single classes and then arrange these predictions to take into account the functional relationships between GO terms. Noble and colleagues name this general approach reconciliation methods [24]: they proposed methods for calibrating and combining independent predictions to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. They applied their ensemble methods to genome-wide and ontology-wide function prediction in M. musculus, involving about 3000 GO terms.


Figure 10: The overall scheme of reconciliation methods: (1) kernel computation, (2) SVM learning, (3) calibration, and (4) reconciliation (adapted from [24]).

Their goal consists in providing consistent predictions, that is, predictions whose confidence (e.g., posterior probability) increases as we ascend from more specific to more general terms in the GO. Another important feature of these methods is the availability of confidence values associated with the predictions, which can be interpreted as probabilities that a protein has a certain function, given the information provided by the data.

The overall reconciliation approach can be summarized in the following four basic steps (Figure 10).

(1) Kernel computation: at first a set of kernels is computed from the available data. We may choose kernels specific for each source of data (e.g., diffusion kernels for protein-protein interaction data [137], linear or Gaussian kernels for expression data, and string kernels for sequence data [138]). Multiple kernels for the same type of data can also be constructed [24].

(2) SVM learning: SVMs are used as base learners with the kernels selected at the previous step; the training is performed by internal cross-validation to avoid overfitting, and a local cost-sensitive strategy is applied by tuning separately the C regularization factor for positive and negative examples. Note that the authors used SVMs as base learners in their experiments, but any meaningful classifier could be used at this step.

(3) Calibration: to produce individual probabilistic outputs from the set of SVM outputs corresponding to one GO term, a logistic regression approach is applied. In this way a calibration of the individual SVM outputs is obtained, resulting in a probabilistic prediction of the random variable Y_i for each node/term i of the hierarchy, given the outputs ŷ_i of the SVM classifiers.

(4) Reconciliation: the first three steps generate unreconciled outputs; that is, in practice a "flat" ensemble is applied that may generate inconsistent predictions with respect to the given taxonomy. In this step the outputs of step three are processed by a "reconciliation method". The goal of this stage is to combine the predictions for each term so as to produce predictions that are consistent with the ontology, meaning that all the probabilities assigned to the ancestors of a GO term are larger than the probability assigned to that term.

The first three steps are basically the same for (or very similar in) each reconciliation ensemble method. The crucial step is the fourth, that is, the reconciliation step, and different ensemble algorithms can be designed to implement it. The authors proposed 11 different ensemble methods for the reconciliation of the base classifier outputs. Schematically, they can be subdivided into the following four main classes of ensembles:

(1) heuristic methods;
(2) Bayesian network-based methods;
(3) cascaded logistic regression;
(4) projection-based methods.

6.1. Heuristic Methods. These approaches preserve the "reconciliation property",

\forall i, j \in G: \; (i,j) \in E \Longrightarrow \bar{p}_i \ge \bar{p}_j   (18)

through simple heuristic modifications of the probabilities computed at step 3 of the overall reconciliation scheme.

(i) The MAX method simply chooses the largest logistic regression value among the node i and all its descendants desc(i) (where desc(i) includes i itself):

\bar{p}_i = \max_{j \in \mathrm{desc}(i)} \hat{p}_j   (19)

(ii) The AND method estimates the probability that the term i and all its ancestral GO terms anc(i) are "on", assuming that, conditional on the data, all predictions are independent:

\bar{p}_i = \prod_{j \in \mathrm{anc}(i)} \hat{p}_j   (20)

(iii) The OR method estimates the probability that the node i is annotated to at least one of the descendant GO terms, assuming again that, conditional on the data, all predictions are independent:

1 - \bar{p}_i = \prod_{j \in \mathrm{desc}(i)} (1 - \hat{p}_j)   (21)
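Once the calibrated probabilities and the ancestor/descendant sets are available, the three heuristics reduce to one-liners; the sketch below assumes that desc(i) and anc(i) return the descendants and the ancestors of a term, including i itself (all names are illustrative):

    import math

    def reconcile_max(i, p_hat, desc):
        return max(p_hat[j] for j in desc(i))                    # eq. (19)

    def reconcile_and(i, p_hat, anc):
        return math.prod(p_hat[j] for j in anc(i))               # eq. (20)

    def reconcile_or(i, p_hat, desc):
        return 1.0 - math.prod(1.0 - p_hat[j] for j in desc(i))  # eq. (21)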

6.2. Cascaded Logistic Regression. Instead of modeling class-conditional probabilities, as required by the Bayesian approach, logistic regression can be used to directly model posterior probabilities. Considering that modeling conditional densities is in most cases difficult (even using strong independence assumptions, as shown in Section 5.1), the choice of logistic regression could be a reasonable one. In [24] the authors embedded the hierarchical dependencies between terms in the logistic regression setting. By assuming that a random variable X, whose values represent the features of the gene g of interest, is associated with a given gene g, and assuming that P(Y = y | X = x) factorizes according to the GO graph, it follows that

P(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x}) = \prod_{i} P(Y_i = y_i \mid \forall j \in \mathrm{par}(i): Y_j = y_j, \; X_i = x_i)   (22)

with P(Y_i = 1 | ∀j ∈ par(i): Y_j = 0, X_i = x_i) = 0. The authors estimated P(Y_i = 1 | ∀j ∈ par(i): Y_j = 1, X_i = x_i) with logistic regression. This approach is quite similar to fitting independent logistic regressions, but note that in this case only the proteins annotated to all the parent GO terms of i are used to fit the model, thus implicitly taking into account the hierarchical relationships between GO terms.
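The fitting scheme can be sketched as follows; the use of scikit-learn's LogisticRegression and the names X, Y, and parents are assumptions of this illustration, and in practice one must also guard against terms whose filtered training set contains a single class:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_cascaded_lr(X, Y, parents):
        """One logistic model per term, fitted only on the genes annotated to
        all of the term's parents (eq. (22)); root terms use all genes."""
        models = {}
        for i in range(Y.shape[1]):
            if parents[i]:
                mask = np.all(Y[:, parents[i]] == 1, axis=1)
            else:
                mask = np.ones(X.shape[0], dtype=bool)
            models[i] = LogisticRegression(max_iter=1000).fit(X[mask], Y[mask, i])
        return models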

6.3. Bayesian Network-Based Methods. These methods are variants of the Bayesian network approach proposed in [20] (Section 5.1): the GO is viewed as a graphical model where a joint Bayesian prior is put on the binary GO term variables y_i. The authors proposed four variants, which can be summarized as follows:

(i) BPAL is a belief propagation approach with asymmetric Laplace likelihoods. The graphical model has edges directed from more general terms to more specific terms. Differently from [20], the distribution of each SVM output is modeled as an asymmetric Laplace distribution, and a variational inference algorithm, which solves an optimization problem whose minimizer is the set of marginal probabilities of the distribution, is used to estimate the posterior probabilities of the ensemble [139];

(ii) the BPALF approach is similar to BPAL, but with edges inverted and directed from more specific terms to more general terms;

(iii) BPLR is a heuristic variant of BPAL where, in the inference algorithm, the Bayesian log posterior ratio for Y_i is replaced by the marginal log posterior ratio obtained from the logistic regression (LR);

(iv) BPLRF is equal to BPLR, but with reversed edges.

6.4. Projection-Based Methods. A different approach is represented by methods that directly use the calibrated values obtained from logistic regression (step 3 of the overall scheme of the reconciliation methods) to find the closest set of values that are consistent with the ontology. This approach leads to a constrained optimization problem. The main contribution of the Obozinski et al. work [24] is the introduction of projection reconciliation techniques, based on isotonic regression [140] and on the Kullback-Leibler divergence.

The isotonic regression method tries to find a set of marginal probabilities p̄_i that are close to the set of calibrated values p̂_i obtained from the logistic regression. The Euclidean distance is used as a measure of closeness. Hence, considering that the "reconciliation property" requires p̄_i ≥ p̄_j when (i, j) ∈ E, this approach yields the following quadratic program:

\min_{\bar{p}_i,\, i \in I} \; \sum_{i \in I} (\bar{p}_i - \hat{p}_i)^2 \quad \text{s.t. } \bar{p}_j \le \bar{p}_i, \; (i,j) \in E   (23)

This is the classical isotonic regression problem, which can be solved using an interior point solver, or through approximate algorithms when the number of edges of the graph is too large [141].
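As an illustration, the projection of (23) can be written as a generic quadratic program; the sketch below uses cvxpy for brevity (a choice made here, whereas [24, 141] rely on dedicated isotonic regression solvers). Replacing the squared-error objective with the divergence of (25) below would yield the corresponding Kullback-Leibler projection.

    import numpy as np
    import cvxpy as cp

    def isotonic_reconciliation(p_hat, edges):
        """Euclidean projection of eq. (23): the reconciled probabilities must
        not increase along the parent-to-child edges of the ontology."""
        p = cp.Variable(len(p_hat))
        constraints = [p[j] <= p[i] for (i, j) in edges]
        constraints += [p >= 0, p <= 1]
        cp.Problem(cp.Minimize(cp.sum_squares(p - p_hat)), constraints).solve()
        return np.clip(p.value, 0.0, 1.0)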

Considering that we deal with probabilities, a natural measure of the distance between probability density functions f(x) and g(x), defined with respect to a random variable x, is the Kullback-Leibler divergence:

D(f \parallel g) = \int_{-\infty}^{+\infty} f(x) \log \frac{f(x)}{g(x)} \, dx   (24)

In the context of reconciliation methods we need a discrete version of the Kullback-Leibler divergence, yielding the following optimization problem:

\min_{\bar{\mathbf{p}}} D(\bar{\mathbf{p}} \parallel \hat{\mathbf{p}}) = \min_{\bar{p}_i,\, i \in I} \; \sum_{i \in I} \bar{p}_i \log \frac{\bar{p}_i}{\hat{p}_i} \quad \text{s.t. } \bar{p}_j \le \bar{p}_i, \; (i,j) \in E   (25)

The algorithm finds the probabilities p̄ closest, according to the Kullback-Leibler divergence, to the probabilities p̂ obtained from logistic regression, while obeying the constraint that probabilities cannot increase while descending the hierarchy underlying the ontology.

The extensive experiments reported in [24] show that, among the reconciliation methods, isotonic regression is the most generally useful: across a range of evaluation modes, term sizes, ontologies, and recall levels, isotonic regression yields consistently high precision. On the other hand, isotonic regression is not always the "best method", and a biologist with a particular goal in mind may apply other reconciliation methods. For instance, with small terms Kullback-Leibler projections usually achieve the best results; considering average "per term" results, however, heuristic methods yield precision at a given recall comparable with projection methods and better than that achieved with Bayes-net methods.

This ensemble approach achieved excellent results in the prediction of protein function in the mouse model organism, demonstrating that hierarchical multilabel methods can play a crucial role in the improvement of protein function prediction performances [24]. Nevertheless, the approach suffers from some drawbacks. Indeed, the paper focuses on the comparison of hierarchical multilabel methods, but it does not analyze the impact of the concurrent use of data integration and hierarchical multilabel methods on the overall classification performances. Moreover, potential improvements could be introduced by applying cost-sensitive variants of hierarchical multilabel predictors, able to effectively calibrate the precision/recall trade-off at the different levels of the functional ontology.

Figure 11: The asymmetric flow of information suggested by the true path rule: positive predictions propagate from bottom to top, negative predictions from top to bottom.

7. True Path Rule Hierarchical Ensembles

These ensemble methods exploit at the same time the downward and upward relationships between classes, thus considering both the parent-to-child and the child-to-parent functional links (Figure 2(b)).

The true path rule (TPR) ensemble method [31, 142] is directly inspired by the true path rule that governs both the GO and FunCat taxonomies. Citing the curators of the Gene Ontology [143]: "An annotation for a class in the hierarchy is automatically transferred to its ancestors, while genes unannotated for a class cannot be annotated for its descendants." Considering the parents of a given node i, a classifier that respects the true path rule needs to obey the following rules:

\hat{y}_i = 1 \Longrightarrow \hat{y}_{\mathrm{par}(i)} = 1, \qquad \hat{y}_i = 0 \;\nRightarrow\; \hat{y}_{\mathrm{par}(i)} = 0   (26)

On the other hand, considering the children of a given node i, a classifier that respects the true path rule needs to obey the following rules:

\hat{y}_i = 1 \;\nRightarrow\; \hat{y}_{\mathrm{child}(i)} = 1, \qquad \hat{y}_i = 0 \Longrightarrow \hat{y}_{\mathrm{child}(i)} = 0   (27)

From (26) and (27) we observe an asymmetry in the rules that govern the assignment of positive and negative labels: positive predictions propagate from bottom to top of the hierarchy (26), while negative labels propagate from top to bottom (27). Conversely, negative labels cannot propagate from bottom to top, and positive predictions cannot propagate from top to bottom. The "true path rule" thus suggests algorithms able to propagate "positive" decisions from bottom to top of the hierarchy, and negative decisions from top to bottom (Figure 11).

7.1. The True Path Rule Ensemble Algorithm. The TPR algorithm puts together the predictions made at each node by local "base" classifiers, to realize an ensemble that obeys the "true path rule". The basic ideas behind the method can be summarized as follows:

(1) training of the base learners: for each node of the hierarchy a suitable learning algorithm (e.g., a multilayer perceptron or a support vector machine) provides a classifier for the associated functional class;

(2) in the evaluation phase the trained classifiers associated with each class/node of the graph provide a local decision about the assignment of a given example to a given node;

(3) positive decisions, that is, annotations to a specific functional class, may propagate from bottom to top across the graph: they influence the decisions of the parent nodes and of their ancestors in a recursive way, by traversing the graph towards higher level nodes/classes. Conversely, negative decisions do not affect the decisions of the parent node, that is, they do not propagate from bottom to top (26);

(4) negative predictions for a given node (taking into account the local decisions of its descendants) are propagated to the descendants, to preserve the consistency of the hierarchy according to the true path rule, while positive decisions do not influence the decisions of child nodes (27).

The ensemble combines the local predictions of the base learners associated with each node with the positive decisions that come from the bottom of the hierarchy, and with the negative decisions that spring from the higher level nodes. More precisely, the base classifiers estimate local probabilities p̂_i(g) that a given example g belongs to class θ_i, but the core of the algorithm is the evaluation phase, where the ensemble provides an estimate of the "consensus" global probability p̄_i(g).

It is worth noting that, instead of a probability, p̂_i(g) may represent a score associated with the likelihood that a given gene/gene product belongs to the functional class i.

Let us consider the set φ_i(g) of the children of node i for which we have a positive prediction for a given gene g:

\phi_i(g) = \{ j \in \mathrm{child}(i) : \hat{y}_j = 1 \}   (28)

The global consensus probability p̄_i(g) of the ensemble depends both on the local prediction p̂_i(g) and on the predictions of the nodes belonging to φ_i(g):

\bar{p}_i(g) = \frac{1}{1 + |\phi_i(g)|} \Big( \hat{p}_i(g) + \sum_{j \in \phi_i(g)} \bar{p}_j(g) \Big)   (29)

The decision ŷ_i(g) at node/class i is set to 1 if p̄_i(g) > t, and to 0 otherwise (a natural choice for t is 0.5); only the children nodes for which we have a positive prediction can influence their parent. In the leaf nodes the sum of (29) disappears, and (29) becomes p̄_i(g) = p̂_i(g). In this way positive predictions propagate from bottom to top, while negative decisions are propagated to the descendants when, for a given node, ŷ_i(g) = 0.

The bottom-up, per-level traversal of the tree assures that all the offspring of a given node i are taken into account in the ensemble prediction. For the same reason we can safely set the classes belonging to the subtree rooted at i to negative when ŷ_i is set to 0. It is worth noting that we have a two-way asymmetric flow of information across the tree: positive predictions for a node influence its ancestors, while negative predictions influence its offspring.

The algorithm provides both the multilabels ŷ_i and an estimate of the probabilities p̄_i that a given example g belongs to class i = 1, ..., m.

7.2. The Cost-Sensitive Variant. Note that in the TPR algorithm there is no way to explicitly balance the local prediction p̂_i(g) at node i with the positive predictions coming from its offspring (29). By balancing the local predictions with the positive predictions coming from the ensemble, we can explicitly modulate the interplay between local and descendant predictors. To this end a weight w, 0 ≤ w ≤ 1, is introduced, such that if w = 1 the decision at node i depends only on the local predictor; otherwise the prediction is shared proportionally to w and 1 − w between, respectively, the local parent predictor and the set of its children:

\bar{p}_i = w\, \hat{p}_i + \frac{1-w}{|\phi_i|} \sum_{j \in \phi_i} \bar{p}_j   (30)

This variant of the TPR algorithm is the weighted true path rule (TPR-W) hierarchical ensemble algorithm. By tuning the w parameter we can modulate the precision/recall characteristics of the resulting ensemble. More precisely, for w → 0 the weight of the local parent predictor is small, and the ensemble decision depends mainly on the positive predictions of the offspring nodes (classifiers); conversely, w → 1 corresponds to a higher weight of the parent predictor: less weight is given to possible positive predictions of the children, and the decision depends mainly on the local parent base classifier. In case of a negative decision the whole subtree is set to 0, causing the precision to increase. Note that for w → 1 the behaviour of TPR-W becomes similar to that of HTD (Section 4).

A specific advantage of TPR-W ensembles is thus the capability of tuning precision and recall rates through the parameter w (30): in [31] the author shows that the w parameter strongly influences the precision/recall characteristics of the ensemble, with low w values yielding a higher recall, and high values improving the precision of the TPR-W ensemble.
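A minimal sketch of the TPR-W sweep of (30) on a tree follows; children and p_hat are illustrative dictionary names, the default values of w and t are arbitrary, and, as in (29), a node with no positive children keeps its local estimate. Note that setting w = 1/(1 + |φ_i|) at each node recovers the plain TPR rule of (29).

    def tpr_w_predict(children, root, p_hat, w=0.5, t=0.5):
        """One TPR-W sweep: bottom-up consensus (eq. (30)), then top-down
        propagation of the negative decisions (true path rule)."""
        p_bar, y = {}, {}

        def bottom_up(i):
            for j in children[i]:
                bottom_up(j)
            phi = [j for j in children[i] if y[j] == 1]  # positive children only
            if phi:
                p_bar[i] = (w * p_hat[i]
                            + (1 - w) / len(phi) * sum(p_bar[j] for j in phi))
            else:
                p_bar[i] = p_hat[i]  # leaves and nodes without positive children
            y[i] = int(p_bar[i] > t)

        def top_down(i):
            for j in children[i]:
                if y[i] == 0:
                    y[j] = 0
                top_down(j)

        bottom_up(root)
        top_down(root)
        return y, p_bar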

Recently, Chen and Hu proposed a method that applies the TPR-W hierarchical strategy but uses composite kernel SVMs as base classifiers, together with a supervised clustering and oversampling strategy to address the imbalanced data set learning problem; they showed that the proper selection of base learners and unbalance-aware learning strategies can further improve the results in terms of hierarchical precision and recall [144].

The same authors also proposed an enhanced version of the TPR-W strategy, to overcome a limitation of this bottom-up hierarchical method for AFP. Indeed, for some classes at the lower levels of the hierarchy the classifier performances are sometimes quite poor, due to both noisy data and the relatively low number of available annotations. More precisely, in the basic TPR ensemble the probabilities p̄_j computed by the children of node i (30) contribute in equal measure to the probability p̄_i computed by the ensemble at node i, independently of the accuracy of the predictions made by the children classifiers. This "unweighted" mechanism may propagate errors across the hierarchy: a poorly performing child classifier may, for instance, predict with high probability a negative example as positive, and this error may propagate to its parent node and, recursively, to its ancestor nodes. To try to alleviate this possible bottom-up error propagation, in [145] Chen and Hu proposed an improved TPR ensemble (weighted TPR-W), based on classifier performance. To this end they weighted the contribution of each child classifier on the basis of its performance, evaluated on a validation data set, by adding to (30) another weight ν_j:

\bar{p}_i = w\, \hat{p}_i + \frac{1-w}{|\phi_i|} \sum_{j \in \phi_i} \nu_j \, \bar{p}_j   (31)

where ν_j is computed on the basis of some accuracy metric A_j (e.g., the F-score) estimated for the child classifier associated with node j, as follows:

\nu_j = \frac{A_j}{\sum_{k \in \phi_i} A_k}   (32)

In this way the contribution of "poor" classifiers is reduced, while "good" classifiers weight more in the final computation of p̄_i (31). Experiments with the "Protein Fate" subtree of the FunCat taxonomy on the yeast model organism show that this approach improves the predictions with respect to the "vanilla" TPR-W hierarchical strategy [145].

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward and common to the hierarchical top-down algorithm, the propagation of positive decisions from bottom to top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this issue see Appendix C.

Experimental results show that TPR-W achieves equal or better results than the TPR and the top-down hierarchical strategy, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per-level classification performances shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall through a global strategy: large values of the w parameter improve the precision, and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchicalensemble are influenced by the positive decisions of itsdescendants More precisely the analyses performed in [31]showed the following

(i) Weights of descendants decrease exponentially with respect to their depth. As a consequence, the influence of descendant nodes decays quickly with their depth.

(ii) The parameter $w$ plays a central role in balancing the weight of the parent classifier associated with a given node with the weights of its positive offspring: small values of $w$ increase the weight of descendant nodes, and large values increase the weight of the local parent predictor associated with that node.

(iii) The overall probability predicted by the ensemble results from the choice of the $w$ parameter, the strength of the prediction of the local learner, and that of its descendants.

These characteristics of TPR-W ensembles are well suited to the hierarchical classification of protein functions, considering that annotations of deeper nodes are likely to have less experimental evidence than those of higher nodes. Moreover, by enforcing the strength of the descendant nodes through low $w$ values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured according to a forest of trees (Section 11), while no applications have been performed using the GO, structured according to a directed acyclic graph (Section 11).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147], or investigated a modified decision tree model in which the assignment to a node is propagated toward the parent nodes [9], by extending the classical C4.5 decision tree algorithm for multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in yeast [149].

More recent approaches, always based on modified decision trees, used distance measures derived from the hierarchy and significantly improved previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all classes at once, even when they are built taking into account the hierarchy.

Nevertheless, the previously proposed decision tree-based methods often achieve results not comparable with those of state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for DAG-structured prediction of protein function in the S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles that learn ensembles of predictive models, each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles that train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151], methods related to decision trees, in the sense that they provide interpretable classification rules to predict all functions at all levels of the GO hierarchy, have been proposed, using an ant colony optimization algorithm to discover the classification rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues [1, 16, 18]. In particular, in [80] the authors showed that, even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results if, at the same time, we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy between hierarchical classification, data integration methods, and cost-sensitive techniques has been shown, highlighting that hierarchical ensemble methods should be designed taking into account the different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the prediction of protein functions [16, 25, 154–156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy of AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into the following four groups:

(1) vector subspace integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed by [30], but each data source is separately normalized, in order to take into account the data distribution in each individual vector space.
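A minimal sketch of this approach (the feature matrices, their dimensions, and the normalization choice below are illustrative assumptions, not a specific published pipeline):

    import numpy as np

    # Hypothetical feature matrices for the same n proteins from two sources
    # (e.g., domain presence/absence and expression profiles); values invented.
    n = 4
    X_domains = np.random.rand(n, 10)   # source 1: 10 features per protein
    X_expr = np.random.rand(n, 5)       # source 2: 5 features per protein

    def zscore(X):
        # Per-source normalization, so that no source's scale dominates
        return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

    # Vector space integration: concatenate the (normalized) sources;
    # the result can be fed to a standard classifier, e.g., an SVM.
    X_integrated = np.hstack([zscore(X_domains), zscore(X_expr)])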

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], adding an edge, respectively, when the two genes are linked together in all the networks or when a link between the two genes is present in at least one functional network, or probabilistic evidence integration schemes [45]. Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive-Bayes integration [157] and constrained linear regression [14], or merge the data taking into account the GO hierarchy [158], or apply XML-based techniques [159].
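As a minimal sketch of the conjunctive/disjunctive combination (the binary adjacency matrices and the fixed weights below are invented for illustration; methods such as [14, 46] learn the weights from data):

    import numpy as np

    # Hypothetical binary adjacency matrices of two functional networks
    # over the same three genes.
    A1 = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
    A2 = np.array([[0, 1, 1],
                   [1, 0, 0],
                   [1, 0, 0]])

    conjunctive = A1 & A2              # edge only if present in all networks
    disjunctive = A1 | A2              # edge if present in at least one network
    weighted = 0.7 * A1 + 0.3 * A2     # a simple fixed-weight combination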

Kernel Fusion. These techniques at first construct a separate Gram matrix for each available data source, using appropriate kernels representing similarities between genes/gene products. Then, by exploiting the closure property with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix.

Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have recently been proposed [160, 161]. Kernel fusion methods, both with and without weighting of the data sources, have been successfully applied to the classification of protein functions [162–165]. Recently, a novel method proposed an enhanced kernel integration approach by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
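A minimal sketch of kernel fusion with fixed, equal coefficients (the data, kernel choices, and weights below are illustrative assumptions; multiple kernel learning methods estimate the coefficients instead):

    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

    # Hypothetical data: two views of the same n proteins (values invented).
    n = 5
    X_expr = np.random.rand(n, 8)      # expression profiles
    X_seq = np.random.rand(n, 20)      # e.g., sequence-derived features

    # One Gram matrix per source, with a kernel suited to each data type
    K_expr = rbf_kernel(X_expr)
    K_seq = linear_kernel(X_seq)

    # Nonnegative combinations of kernels are still valid kernels (closure
    # under sum and scaling), so the integrated matrix can be handed to any
    # kernel method (e.g., an SVM with kernel='precomputed' in scikit-learn).
    K_fused = 0.5 * K_expr + 0.5 * K_seq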

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different "views" of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include the "late integration" of kernels trained on different sources [22], the naive-Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the outputs of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to those of state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171–173].
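A minimal sketch of decision-level integration by weighted voting (the per-source probabilities and the reliability weights below are invented for illustration):

    import numpy as np

    # Hypothetical predicted probabilities for three proteins, one column
    # per base learner trained on a different data "view".
    probs = np.array([[0.9, 0.6, 0.7],
                      [0.2, 0.4, 0.1],
                      [0.5, 0.8, 0.6]])

    # Per-learner reliability weights, e.g., estimated on a validation set
    weights = np.array([0.5, 0.3, 0.2])

    # Weighted voting: fuse the base learner outputs at the decision level
    consensus = probs @ weights
    predictions = consensus > 0.5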

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques, in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ. SIM.). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].

Table 2 summarizes the results of the comparison, across about two hundred FunCat classes, of single-source and data integration approaches, together with both flat and hierarchical ensembles.

Table 2: Comparison of the results (average per-class F-scores) achieved with single sources and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the numbers in parentheses refer to the percentage relative increment in F-score achieved with data fusion with respect to the best single source of evidence (BIOGRID).

Methods          FLAT    HTD     HTD-CS  HB      HB-CS   TPR     TPR-W
Single-source
  BIOGRID        0.2643  0.3759  0.4160  0.3385  0.4183  0.3902  0.4367
  STRING         0.2203  0.2677  0.3135  0.2138  0.3007  0.2801  0.3048
  PFAM BINARY    0.1756  0.2003  0.2482  0.1468  0.2407  0.2532  0.2738
  PFAM LOGE      0.2044  0.1567  0.2541  0.0997  0.2847  0.3005  0.3160
  EXPR           0.1884  0.2506  0.2889  0.2006  0.2781  0.2723  0.3053
  SEQ. SIM.      0.1870  0.2532  0.2899  0.2017  0.2825  0.2742  0.3088
Multisource (data fusion)
  Kernel fusion  0.3220 (+22%)  0.5401 (+44%)  0.5492 (+32%)  0.5181 (+53%)  0.5505 (+32%)  0.5034 (+29%)  0.5592 (+28%)

Data fusion techniques improve the average per-class F-score in flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) than in FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical data fusion (in particular, kernel fusion) and cost-sensitive approaches (Figure 13) [123].

Per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of the nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion, the average F-score at levels 2, 3, and 4 is comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergic effects of hierarchical multilabel ensembles, cost-sensitive techniques, and data fusion techniques significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insights into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both GO and FunCat, usually only positive annotations are available, while negative annotations are much fewer. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy for choosing negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123], two strategies for choosing negative examples have been compared: the basic (B) and the parent only (PO) strategy.

According to the B strategy, the set of negative examples is simply the set of genes $g$ that are not annotated for class $c_i$; that is,
\[ N_B = \{ g \mid g \notin c_i \}. \tag{33} \]

The PO selection strategy chooses as negatives for the class $c_i$ only those examples that are not annotated for $c_i$ but are annotated for a parent class. More precisely, for a given class $c_i$, corresponding to node $i$ in the taxonomy, the set of negative examples is
\[ N_{PO} = \{ g \mid g \notin c_i \wedge g \in \mathrm{par}(i) \}. \tag{34} \]
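A minimal sketch of the two selection strategies (33)-(34) on a toy annotation table (the classes, genes, and tree structure below are invented for illustration):

    # ann[c] is the set of genes annotated to class c; parent[c] gives
    # the parent class of c in the (toy) taxonomy.
    ann = {
        "01": {"g1", "g2", "g3", "g4"},      # parent class
        "01.01": {"g1", "g2"},               # child class c_i
    }
    parent = {"01.01": "01"}
    all_genes = {"g1", "g2", "g3", "g4", "g5", "g6"}

    def negatives_basic(c, ann, genes):
        # B strategy: every gene not annotated to c is a negative
        return genes - ann[c]

    def negatives_parent_only(c, ann, parent):
        # PO strategy: negatives are genes annotated to the parent of c
        # but not to c itself, i.e., negatives "close" to the positives
        return ann[parent[c]] - ann[c]

    print(negatives_basic("01.01", ann, all_genes))     # {'g3','g4','g5','g6'}
    print(negatives_parent_only("01.01", ann, parent))  # {'g3','g4'}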

Figure 12: FunCat trees comparing the F-scores achieved with data integration (KF) to those of the best single-source classifiers trained on BIOGRID data. Black nodes depict the functional classes for which KF achieves better F-scores: (a) FLAT and (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical (a) precision, (b) recall, and (c) F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, the PO strategy selects negative examples for training that are, in a certain sense, "close" to positives. It is easy to see that $N_{PO} \subseteq N_B$; hence the B strategy selects for training a large set of generic negative examples, possibly annotated with classes that are associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with the B strategy to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the B (abscissa) and PO (ordinate) strategy for the selection of negative examples. In Figure 15(a), most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods gain in performance when using the PO strategy as opposed to the B strategy ($P$-value $= 2.2 \times 10^{-16}$ according to the Wilcoxon signed-rank test). This is not the case for FLAT methods (Figure 15(b)).

Figure 14: Comparison of per-level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF). The performance of "single" is computed by averaging across all the single data sources.

These results can be explained by considering that the PO strategy takes into account the hierarchy to select negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of the classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from the other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in the context of this research area.

Figure 15: Comparison of the average per-class F-scores between the Basic (B) and PO strategies: (a) hierarchical cost-sensitive strategies, HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles); (b) FLAT ensembles. Abscissa: per-class F-score with base learners trained according to the Basic strategy; ordinate: per-class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way, to assess the impact of the different strategies on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works also showed that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies should be explored to improve the accuracy and the reliability of the base learners, and hence of the overall hierarchical classification process. For instance, random undersampling or oversampling techniques could be applied: the former randomly takes away some unannotated examples, whereas the latter augments the annotations by exactly duplicating the annotated proteins [182]. Other approaches could be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185–187].

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems, and it often drives the "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta_i^- / \theta_i^+$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$ we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed, in order to design ensemble methods with "tunable" precision/recall characteristics.

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; unfortunately, the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and that, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are partially annotated. More precisely, two new algorithms have been proposed: ProWL, protein function prediction with weak-label learning, which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein; and ProWL-IF, protein function prediction with weak-label learning and knowledge of irrelevant functions, by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works for hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually, only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy with the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Section 11), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of the deep nodes of the hierarchy. Other enhancements could be introduced into the TPR-W algorithm (Section 7.2): indeed, we can note that the positive children of a node at level $i$ of the hierarchy have the same weight, independently of the size of their hanging subtree. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree; indeed, this witnesses for a positive annotation of the node at level $i$.

More in general, in the spirit of a work recently proposed [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194], able to exploit the relationships between functional classes during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, just during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performances.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the prediction for a particular species by using predictions or data available for other species. This is a common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods, to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work. More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and other related issues is the key to improve AFP methods and to drive experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context, and the design of multispecies hierarchical algorithms able to scale with millions of proteins, represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes, and it is set out in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., mitochondrion or rough endoplasmic reticulum).

Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the forest of trees.

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces genome database (SGD), and the mouse genome database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions, in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function: for instance, the lipid metabolic process or the tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; an example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents parts or components of a cell, such as organelles, or physical places or compartments in which a specific gene product is located; an example is the endoplasmic reticulum or the ribosome.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a term of the GO}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are categorized in the following three main groups:

(i) is-a (subtype relations): if we say the term/node A is-a B, we mean that node A is a subtype of node B. For example, mitotic cell cycle is-a cell cycle, or lyase activity is-a catalytic activity;

(ii) part-of (part-whole relations): A is part-of B means that whenever A exists, it is part of B, and the presence of A implies the presence of B. For instance, mitochondrion is part-of cytoplasm;

(iii) regulates (control relations): if we say that A regulates B, we mean that A directly affects the manifestation of B, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting function similarity. For these reasons, in AFP the regulates relations (which, however, are a minority of the existing relations) are usually not used.
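As a toy illustration of how such a DAG can be represented and used (the terms, edges, and gene below are invented; real GO releases provide the is-a and part-of edges per term), the following sketch propagates annotations to ancestors, as required by the true path rule:

    # Toy GO-like DAG: child -> list of parents (is-a / part-of edges).
    parents = {
        "mitotic cell cycle": ["cell cycle"],
        "cell cycle": ["biological process"],
        "lyase activity": ["catalytic activity"],
    }

    def ancestors(term, parents):
        """All ancestors of a term, following edges up the DAG."""
        found = set()
        stack = [term]
        while stack:
            for p in parents.get(stack.pop(), []):
                if p not in found:
                    found.add(p)
                    stack.append(p)
        return found

    # True path rule: a gene annotated to a term is implicitly annotated
    # to all of that term's ancestors.
    annotations = {"geneX": {"mitotic cell cycle"}}
    propagated = {g: terms | set().union(*(ancestors(t, parents) for t in terms))
                  for g, terms in annotations.items()}
    print(propagated["geneX"])
    # {'mitotic cell cycle', 'cell cycle', 'biological process'}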

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI); to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference; to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de): at the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content was extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and, finally, animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones, and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical, tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces genome database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge amount of terms is very difficult for annotators to handle. Moreover, we may have a very large number of possible assignments, which may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads to both simpler procedures for annotation and less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, but without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to change, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more in general, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess the classification performance. For this reason, the classical F-score is used instead, to take into account the unbalance of the functional classes. If TP represents the number of positive examples correctly predicted as positive, FN the number of positive examples incorrectly predicted as negative, and FP the number of negatives incorrectly predicted as positive, then the precision $P$ and the recall $R$ are
\[ P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{B.1} \]

The F-score $F$ is the harmonic mean of precision and recall:
\[ F = \frac{2 \cdot P \cdot R}{P + R}. \tag{B.2} \]

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC $\simeq 0.5$, while values close to 1 correspond to near-optimal rankings; that is, AUC $= 1$ if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation, while failing to predict the most specific available annotation, should be considered "partially correct", in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$ consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $l(P)$ denote the set of leaves (nodes without children) of a graph $P$. Given a leaf $p \in P(x)$, let ${\uparrow}p$ be the set of ancestors of the node $p$ that belong to $P(x)$, and, given a leaf $c \in C(x)$, let ${\uparrow}c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are then defined as follows [201]:
\[ \mathrm{HP} = \frac{1}{|l(P(x))|}\sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}p|}, \]
\[ \mathrm{HR} = \frac{1}{|l(C(x))|}\sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}c|}, \]
\[ \mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.3} \]

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:
\[ \mathrm{HP} = \frac{1}{|l(P(x))|}\sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow}p|}{|{\uparrow}p|}, \]
\[ \mathrm{HR} = \frac{1}{|l(C(x))|}\sum_{c \in l(C(x))} \frac{|{\uparrow}c \cap P(x)|}{|{\uparrow}c|}, \]
\[ \mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.4} \]
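A minimal sketch of the tree-structured measures (B.4) on a toy FunCat-like taxonomy (the classes and annotations below are invented; here ${\uparrow}p$ is taken to include the node itself, an interpretation assumed for illustration):

    # Toy taxonomy: child -> parent.
    parent = {"1.1.1": "1.1", "1.1": "1", "1.2": "1"}

    def up(node):
        """Ancestors of a node plus the node itself (path to the root)."""
        path = {node}
        while node in parent:
            node = parent[node]
            path.add(node)
        return path

    def hier_prf(pred_leaves, true_classes, true_leaves):
        hp = sum(len(true_classes & up(p)) / len(up(p)) for p in pred_leaves)
        hp /= len(pred_leaves)
        pred_classes = set().union(*(up(p) for p in pred_leaves))
        hr = sum(len(up(c) & pred_classes) / len(up(c)) for c in true_leaves)
        hr /= len(true_leaves)
        hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
        return hp, hr, hf

    # True annotation is the path {1, 1.1, 1.1.1}; we predicted only {1, 1.1}.
    print(hier_prf(pred_leaves={"1.1"}, true_classes={"1", "1.1", "1.1.1"},
                   true_leaves={"1.1.1"}))
    # HP = 1.0, HR = 0.67, HF = 0.8: the prediction is "partially correct"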

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as
\[ \mathrm{HP}_K = \sum_x \frac{|P(x) \cap C(x)|}{|P(x)|}, \qquad \mathrm{HR}_K = \sum_x \frac{|P(x) \cap C(x)|}{|C(x)|}. \tag{B.5} \]

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted classes ($\mathrm{HP}_K$) and the true classes ($\mathrm{HR}_K$). The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean of $\mathrm{HP}_K$ and $\mathrm{HR}_K$:
\[ \mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \tag{B.6} \]

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $q_k$; more precisely, $q_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $q^{j}_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \geq 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $q_k$ computed for a generic node at level $k$ of the hierarchy [31]:
\[ q_k(g) = w\,\hat{q}_k(g) + \frac{1-w}{|\phi_k(g)|}\sum_{j \in \phi_k(g)} q^{j}_{k+1}(g). \tag{C.1} \]

To simplify the notation, we can introduce the following expression for the average of the probabilities computed by the positive children nodes of a generic node at level $k$:
\[ a_{k+1} = \frac{1}{|\phi_k|}\sum_{j \in \phi_k} q^{j}_{k+1}, \tag{C.2} \]

and we can introduce similar notations for $a_{k+2}$ (the average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (the descendants at level $j$ below a generic node). By extending these definitions across the levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \leq w \leq 1$, balancing the weight between parent and children predictors, and having a variable number, larger than or equal to 1, of positive descendants for each of the $m$ lower levels below, the following equality holds for each $m \geq 1$:
\[ q_k = w\,\hat{q}_k + \sum_{j=1}^{m-1} w(1-w)^{j} a_{k+j} + (1-w)^{m} a_{k+m}. \tag{C.3} \]

For the full proof, see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter. To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$ (Theorem 1), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of the nodes above the leaves.

Figure 18: (a) Plot of $f(w) = (1-w)^m$ while varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1-w)^j$ while varying $j$ from 1 to 10. The integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.
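As a quick numeric illustration of (C.3) (a sketch; the chosen values of $w$ and $m$ are arbitrary), the weights assigned to the local prediction and to the descendant averages can be computed directly; note that they sum to 1:

    # Weights that (C.3) assigns to the local prediction (first entry) and
    # to the averages a_{k+j} of descendants at distance j = 1, ..., m.
    w, m = 0.8, 5
    weights = [w] + [w * (1 - w) ** j for j in range(1, m)] + [(1 - w) ** m]
    print([round(x, 4) for x in weights])
    # [0.8, 0.16, 0.032, 0.0064, 0.0013, 0.0003]: the influence of deeper
    # descendants decays exponentially, as stated by Theorem 2.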

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi", funded by the Italian Ministry of University.

References

[1] I Friedberg ldquoAutomated protein function predictionmdashthegenomic challengerdquo Briefings in Bioinformatics vol 7 no 3 pp225ndash242 2006

[2] L Pena-Castillo M Tasan C L Myers et al ldquoA critical assess-ment of Mus musculus gene function prediction using inte-grated genomic evidencerdquo Genome Biology vol 9 supplement1 article S2 2008

[3] G Valentini ldquoMosclust a software library for discoveringsignificant structures in bio-molecular datardquo Bioinformaticsvol 23 no 3 pp 387ndash389 2007

[4] A Bertoni andG Valentini ldquoDiscoveringmulti-level structuresin bio-molecular data through the Bernstein inequalityrdquo BMCBioinformatics vol 9 no 2 article S4 2008

[5] A Ruepp A Zollner D Maier et al ldquoThe FunCat a functionalannotation scheme for systematic classification of proteins from

whole genomesrdquo Nucleic Acids Research vol 32 no 18 pp5539ndash5545 2004

[6] M Ashburner C A Ball J A Blake et al ldquoGene ontology toolfor the unification of biologyrdquoNature Genetics vol 25 no 1 pp25ndash29 2000

[7] A Valencia ldquoAutomatic annotation of protein functionrdquo Cur-rent Opinion in Structural Biology vol 15 no 3 pp 267ndash2742005

[8] N Youngs D Penfold-Brown K Drew D Shasha and R Bon-neau ldquoParametric bayesian priors and better choice of negativeexamples improve protein function predictionrdquo Bioinformaticsvol 29 no 9 pp 1190ndash1198 2013

[9] A Clare and R D King ldquoPredicting gene function in Saccha-romyces cerevisiaerdquo Bioinformatics vol 19 no 2 pp II42ndashII492003

[10] Y Bilu and M Linial ldquoThe advantage of functional predictionbased on clustering of yeast genes and its correlation withnon-sequence based classificationsrdquo Journal of ComputationalBiology vol 9 no 2 pp 193ndash210 2002

[11] G R G Lanckriet T De Bie N Cristianini M I Jordan andS Noble ldquoA statistical framework for genomic data fusionrdquoBioinformatics vol 20 no 16 pp 2626ndash2635 2004

[12] W Tian L V Zhang M Tasan et al ldquoCombining guilt-by-association and guilt-by-profiling to predict Saccharomycescerevisiae gene functionrdquo Genome Biology vol 9 no 1 articleS7 2008

[13] W K Kim C Krumpelman and E M Marcotte ldquoInfer-ring mouse gene functions from genomic-scale data using acombined functional networkclassification strategyrdquo GenomeBiology vol 9 no 1 article S5 2008

[14] S Mostafavi D Ray D Warde-Farley C Grouios and Q Mor-ris ldquoGeneMANIA a real-time multiple association networkintegration algorithm for predicting gene functionrdquo GenomeBiology vol 9 no 1 article S4 2008

[15] I Lee B Ambaru P Thakkar E M Marcotte and S Y RheeldquoRational association of genes with traits using a genome-scale

ISRN Bioinformatics 29

gene network for Arabidopsis thalianardquo Nature Biotechnologyvol 28 no 2 pp 149ndash156 2010

[16] P Radivojac W T Clark T R Oron et al ldquoA large-scale eval-uation of computational protein function predictionrdquo NatureMethods vol 10 no 3 pp 221ndash227 2013

[17] A S Juncker L J Jensen A Pierleoni et al ldquoSequence-basedfeature prediction and annotation of proteinsrdquoGenome Biologyvol 10 no 2 p 206 2009

[18] R Sharan I Ulitsky and R Shamir ldquoNetwork-based predictionof protein functionrdquo Molecular Systems Biology vol 3 p 882007

[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.

[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.

[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.

[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.

[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research W&CP: Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.

[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.

[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.

[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.

[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research W&CP: Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.

[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.

[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Džeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.

[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.

[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.

[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.

[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.

[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.

[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.

[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.

[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.

[38] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.

[39] A. Binder, M. Kawanabe, and U. Brefeld, "Efficient classification of images with taxonomies," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.

[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.

[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.

[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.

[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.

[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.

[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.

[46] K. Tsuda, H. Shin, and B. Schölkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.

[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.


[48] U. Karaoz, T. M. Murali, S. Letovsky, et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.

[50] K. Astikainen, L. Holm, E. Pitkänen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.

[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.

[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.

[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.

[54] C. Vens, J. Struyf, L. Schietgat, S. Džeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.

[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.

[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.

[57] S. F. Altschul, T. L. Madden, A. A. Schäffer, et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.

[58] A. Conesa, S. Götz, J. M. García-Gómez, J. Terol, M. Talón, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.

[59] Y. Loewenstein, D. Raimondo, O. C. Redfern, et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, article 207, 2009.

[60] A. Prlić, T. A. Down, E. Kulesha, R. D. Finn, A. Kähäri, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.

[61] M. Falda, S. Toppo, A. Pescarolo, et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.

[62] T. Hamp, R. Kassner, S. Seemayer, et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.

[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.

[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.

[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.

[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.

[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.

[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.

[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.

[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes in Artificial Intelligence, pp. 219–234, Springer, 2011.

[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.

[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.

[75] Y. A. I. Kourmpetis, A. D. J. van Dijk, M. C. A. M. Bink, R. C. H. J. van Ham, and C. J. F. ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.

[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.

[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.

[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.

[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.

[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.

[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Schölkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.

[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.

[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.

[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.

[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.

[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.

[88] G. Bakir, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.

[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.

[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.

[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.

[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.

[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.

[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.

[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery Series, pp. 563–594, Chapman & Hall, 2012.

[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.

[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.

[100] M. Dettling and P. Bühlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.

[101] V. Robles, P. Larrañaga, J. M. Peña, et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.

[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.

[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.

[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.

[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.

[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.

[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.

[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.

[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.

[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.

[112] K. Dembczyński, W. Waegeman, W. Cheng, and E. Hüllermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.

[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.

[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.

[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.

[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.

[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.

[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using Top-Down label combination and Artificial Neural Networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastino, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems: Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Schölkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n²) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Džeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.


[149] H. Blockeel, L. Schietgat, S. Džeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Džeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Džeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics - From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz, et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.

[160] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray, et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7–9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry, et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao, et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. von Mering, R. Krause, B. Snel, et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Škunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.


[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Rätsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&CP, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013 - ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bähr, et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber, et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan, et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


Figure 5: Distribution of negative (a) and positive (b) validation examples (a Gaussian distribution is assumed). Adapted from [20].


Figure 6: Improvements induced by the hierarchical prediction of the GO terms. Darker shades of blue indicate the largest improvements, darker shades of red the largest deterioration; white means no change (adapted from [20]).


Figure 7: Markov blanket surrounding the GO term $Y_1$. Each GO term is represented as a blank node, while the SVM classifier output is represented as a gray node (adapted from [30]).

The method has been applied to the prediction of more than 2000 GO terms for the mouse model organism and performed among the top methods in the MouseFunc challenge [2].

The first approach (HIER-MB) modifies the output of the base learners (SVMs in the Guan et al. paper), taking into account the Bayesian network constructed using the Markov blanket surrounding the GO term of interest (Figure 7). In a Bayesian network, the Markov blanket of a node $i$ is represented by its parents $\mathrm{par}(i)$, its children $\mathrm{child}(i)$, and its children's other parents. The Bayesian network involving the Markov blanket of node $i$ is used to provide the prediction $\hat{y}_i$ of the ensemble, thus leveraging the local relationships of node $i$ and the predictions for the nodes included in its Markov blanket.
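The blanket extraction itself is straightforward. The following is a minimal illustrative Python sketch (the `parents`/`children` adjacency maps and the function name are ours, not from [30]):

```python
def markov_blanket(parents, children, i):
    """Markov blanket of GO term i in a Bayesian network:
    its parents, its children, and its children's other parents."""
    blanket = set(parents[i]) | set(children[i])
    for c in children[i]:
        blanket |= set(parents[c]) - {i}   # co-parents of each child
    return blanket
```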

To enlarge the size of the Bayesian subnetwork involved in the prediction of the node of interest, a variant based on the Bayesian networks constructed by applying a classical breadth-first search is the basis of the HIER-BFS algorithm. To reduce the complexity, at most 30 terms are included (i.e., the first 30 nodes reached by the breadth-first algorithm; see Figure 8). In the implementation, ensembles of 25 SVMs have been trained for each node, using vector space integration techniques [132] to integrate multiple sources of data.

Figure 8: The breadth-first subnetwork stemming from $Y_1$. Each GO term is represented through a blank node, and the SVM outputs are represented as gray nodes (adapted from [30]).

Note that with both the HIER-MB and HIER-BFS methods we do not take into account the overall topology of the GO network, but only the terms related to the node for which we perform the prediction. Even if this general approach is reasonable and achieves good results, its main drawback is represented by the locality of the hierarchical integration (limited to the Markov blanket and the first 30 BFS nodes). Moreover, in previous works it has been shown that the adopted integration strategy (vector space integration) is in most cases worse than kernel fusion [11] and ensemble methods for data integration [23].

Figure 9: Integration of diverse methods and diverse sources of data in an ensemble framework for AFP prediction. The best classifier for each GO term is selected through held-out set validation (adapted from [30]).

In the same work [30], the authors also propose a sort of "test and select" method [133], by which three different classification approaches, (a) single flat SVMs, (b) Bayesian hierarchical correction, and (c) Naive Bayes combination, are applied, and for each GO term the best one is selected by internal cross-validation (Figure 9).

It is worth noting that other approaches adopted Bayesian networks to resolve the hierarchical constraints underlying the GO taxonomy. For instance, in the FALCON algorithm the GO is modeled as a Bayesian network and, for any given input, the algorithm returns the most probable GO term assignment in accordance with the GO structure, by using an evolutionary-based optimization algorithm [134].

5.3. HBAYES: An "Optimal" Hierarchical Bayesian Ensemble Approach. The HBAYES ensemble method [92, 135] is a general technique for solving hierarchical classification problems on generic taxonomies $G$ structured as forests of trees. The method consists in training a calibrated classifier at each node of the taxonomy. In principle, any algorithm (e.g., support vector machines or artificial neural networks) whose classifications are obtained by thresholding a real prediction $p$, for example, $\hat{y} = \mathrm{sign}(p)$, can be used as base learner. The real-valued outputs $\hat{p}_i(g)$ of the calibrated classifier for node $i$ on the gene $g$ are viewed as estimates of the probabilities $p_i(g) = \mathbb{P}(y_i = 1 \mid y_{\mathrm{par}(i)} = 1, g)$. The distribution of the random Boolean vector $\mathbf{Y}$ is assumed to be

$$\mathbb{P}(\mathbf{Y} = \mathbf{y}) = \prod_{i=1}^{m} \mathbb{P}\left(Y_i = y_i \mid Y_{\mathrm{par}(i)} = 1, g\right), \quad \forall \mathbf{y} \in \{0,1\}^m, \qquad (10)$$

where, in order to enforce that only multilabels $\mathbf{Y}$ that respect the hierarchy have nonzero probability, it is imposed that $\mathbb{P}(Y_i = 1 \mid Y_{\mathrm{par}(i)} = 0, g) = 0$ for all nodes $i = 1, \ldots, m$ and all $g$. This implies that the base learner at node $i$ is trained only on the subset of the training set including all examples $(g, \mathbf{y})$ such that $y_{\mathrm{par}(i)} = 1$.

5.3.1. HBAYES Ensembles for Protein Function Prediction. The H-loss is a measure of discrepancy between multilabels based on a simple intuition: if a parent class has been predicted wrongly, then errors in its descendants should not be taken into account. Given fixed cost coefficients $\theta_1, \ldots, \theta_m > 0$, the H-loss $\ell_H(\hat{\mathbf{y}}, \mathbf{v})$ between multilabels $\hat{\mathbf{y}}$ and $\mathbf{v}$ is computed as follows: all paths in the taxonomy $T$ from the root down to each leaf are examined and, whenever a node $i \in \{1, \ldots, m\}$ is encountered such that $\hat{y}_i \neq v_i$, $\theta_i$ is added to the loss, while all the other loss contributions from the subtree rooted at $i$ are discarded. This method assumes that, given a gene $g$, the distribution of the labels $\mathbf{V} = (V_1, \ldots, V_m)$ is $\mathbb{P}(\mathbf{V} = \mathbf{v}) = \prod_{i=1}^{m} p_i(g)$ for all $\mathbf{v} \in \{0,1\}^m$, where $p_i(g) = \mathbb{P}(V_i = v_i \mid V_{\mathrm{par}(i)} = 1, g)$. According to the true path rule, it is imposed that $\mathbb{P}(V_i = 1 \mid V_{\mathrm{par}(i)} = 0, g) = 0$ for all nodes $i$ and all genes $g$.
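To make the definition concrete, the following minimal Python sketch computes the H-loss under the stated assumptions (tree-structured taxonomy encoded by `children` adjacency lists; all names are illustrative):

```python
def h_loss(y_pred, v_true, theta, children, roots):
    """H-loss: walk down from the roots; whenever y_pred[i] != v_true[i],
    add theta[i] and discard the subtree rooted at i, so that errors
    below an already mispredicted node are not counted."""
    loss, stack = 0.0, list(roots)
    while stack:
        i = stack.pop()
        if y_pred[i] != v_true[i]:
            loss += theta[i]              # count the error, prune below
        else:
            stack.extend(children[i])     # descend only where labels agree
    return loss
```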

In the evaluation phase, HBAYES predicts the Bayes-optimal multilabel $\hat{\mathbf{y}} \in \{0,1\}^m$ for a gene $g$, based on the estimates $\hat{p}_i(g)$ for $i = 1, \ldots, m$. By definition of Bayes-optimality, the optimal multilabel for $g$ is the one that minimizes the loss when the true multilabel $\mathbf{V}$ is drawn from the joint distribution computed from the estimated conditionals $\hat{p}_i(g)$. That is,

$$\hat{\mathbf{y}} = \underset{\mathbf{y} \in \{0,1\}^m}{\operatorname{argmin}}\ \mathbb{E}\left[\ell_H(\mathbf{y}, \mathbf{V}) \mid g\right]. \qquad (11)$$

In other words, the ensemble method HBAYES provides an approximation of the optimal Bayesian classifier with respect to the H-loss [135]. More precisely, as shown in [27], the following theorem holds.


Theorem 1. For any tree $T$ and gene $g$, the multilabel generated according to the HBAYES prediction rule is the Bayes-optimal classification of $g$ for the H-loss.

In the evaluation phase, the uniform cost coefficients $\theta_i = 1$ for $i = 1, \ldots, m$ are used. However, since with uniform coefficients the H-loss can be made small simply by predicting sparse multilabels (i.e., multilabels $\mathbf{y}$ such that $\sum_i y_i$ is small), in the training phase the cost coefficients are set to $\theta_i = 1/|\mathrm{root}(G)|$ if $i \in \mathrm{root}(G)$, and to $\theta_i = \theta_j/|\mathrm{child}(j)|$, with $j = \mathrm{par}(i)$, otherwise. This normalizes the H-loss, in the sense that the maximal H-loss contribution of all nodes in a subtree, excluding its root, equals that of its root.
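This normalization can be computed with a single top-down sweep; a sketch with our own (illustrative) variable names follows:

```python
def normalized_costs(children, roots):
    """theta_i = 1/|roots| at the roots and theta_i = theta_j/|child(j)|
    for j = par(i): each subtree (root excluded) can contribute to the
    H-loss at most as much as its own root."""
    theta = {r: 1.0 / len(roots) for r in roots}
    stack = list(roots)
    while stack:
        j = stack.pop()
        for i in children[j]:
            theta[i] = theta[j] / len(children[j])
            stack.append(i)
    return theta
```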

Let $\{E\}$ be the indicator function of event $E$. Given $g$ and the estimates $\hat{p}_i = \hat{p}_i(g)$ for $i = 1, \ldots, m$, the HBAYES prediction rule can be formulated as follows.

HBAYES Prediction Rule. Initially, set the label of each node $i$ to

$$\hat{y}_i = \underset{y \in \{0,1\}}{\operatorname{argmin}} \left( \theta_i \hat{p}_i (1 - y) + \theta_i (1 - \hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{\mathbf{y}}) \right), \qquad (12)$$

where

$$H_j(\hat{\mathbf{y}}) = \theta_j \hat{p}_j (1 - \hat{y}_j) + \theta_j (1 - \hat{p}_j)\, \hat{y}_j + \hat{p}_j \{\hat{y}_j = 1\} \sum_{k \in \mathrm{child}(j)} H_k(\hat{\mathbf{y}}) \qquad (13)$$

is recursively defined over the nodes $j$ in the subtree rooted at $i$, with each $\hat{y}_j$ set according to (12).

Then, if $\hat{y}_i$ is set to zero, set all nodes in the subtree rooted at $i$ to zero as well.

It is worth noting that $\hat{\mathbf{y}}$ can be computed for a given $g$ via a simple bottom-up message-passing procedure. It can be shown that, if all child nodes $k$ of $i$ have $\hat{p}_k$ close to a half, then the Bayes-optimal label of $i$ tends to be 0, irrespective of the value of $\hat{p}_i$. On the contrary, if $i$'s children all have $\hat{p}_k$ close to either 0 or 1, then the Bayes-optimal label of $i$ is based on $\hat{p}_i$ only, ignoring the children. This behaviour can be intuitively explained in the following way: the estimate $\hat{p}_k$ is built based only on the examples on which the parent $i$ of $k$ is positive; hence, a "neutral" estimate $\hat{p}_k = 1/2$ signals that the current instance is a negative example for the parent $i$. Experimental results show that this approach achieves results comparable with the TPR method (Section 7), an ensemble approach based on the "true path rule" [136].
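A minimal sketch of this bottom-up procedure, transcribed directly from (12) and (13), follows (illustrative Python with recursive calls; a production implementation for deep hierarchies would be iterative):

```python
def hbayes_predict(p, theta, children, roots):
    """HBAYES rule: compute H_i bottom-up (eq. (13)), choose each label
    by the argmin of eq. (12), then zero out subtrees rooted at nodes
    labeled 0 to preserve hierarchical consistency."""
    y, H = {}, {}

    def evaluate(i):
        child_sum = sum(evaluate(j) for j in children[i])
        cost1 = theta[i] * (1 - p[i]) + p[i] * child_sum   # cost of y_i = 1
        cost0 = theta[i] * p[i]                            # cost of y_i = 0
        y[i] = 1 if cost1 < cost0 else 0
        H[i] = min(cost0, cost1)
        return H[i]

    for r in roots:
        evaluate(r)

    def zero_subtree(i):                 # enforce the true path rule
        for j in children[i]:
            y[j] = 0
            zero_subtree(j)

    for i in list(y):
        if y[i] == 0:
            zero_subtree(i)
    return y
```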

5.3.2. HBAYES-CS: The Cost-Sensitive Version. HBAYES-CS is the cost-sensitive version of HBAYES, proposed in [27]. By this approach, the misclassification cost coefficient $\theta_i$ for node $i$ is split into two terms, $\theta_i^+$ and $\theta_i^-$, taking into account misclassifications of positive and negative examples, respectively. By considering these two terms separately, (12) can be rewritten as

$$\hat{y}_i = \underset{y \in \{0,1\}}{\operatorname{argmin}} \left( \theta_i^- \hat{p}_i (1 - y) + \theta_i^+ (1 - \hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{\mathbf{y}}) \right), \qquad (14)$$

where the expression of $H_j(\hat{\mathbf{y}})$ changes correspondingly. By introducing a factor $\alpha \geq 0$ such that $\theta_i^- = \alpha \theta_i^+$, while keeping $\theta_i^+ + \theta_i^- = 2\theta_i$, the relative costs of false positives and false negatives can be parameterized, thus allowing us to further rewrite the hierarchical Bayesian rule (Section 5.3.1) as follows:

$$\hat{y}_i = 1 \iff \hat{p}_i \left( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \right) \geq \frac{2\theta_i}{1 + \alpha}. \qquad (15)$$

By setting $\alpha = 1$, we obtain the original version of the hierarchical Bayesian ensemble, while by incrementing $\alpha$ we introduce progressively lower costs for positive predictions. In this way the recall of the ensemble tends to increase, eventually at the expense of the precision, and by tuning the $\alpha$ parameter we can obtain different combinations of precision/recall values.

In principle, a cost factor $\alpha_i$ can be set for each node $i$, to explicitly take into account the unbalance between the number of positive ($n_i^+$) and negative ($n_i^-$) examples estimated from the training data:

$$\alpha_i = \frac{n_i^-}{n_i^+} \Longrightarrow \theta_i^+ = \frac{2}{n_i^-/n_i^+ + 1}\,\theta_i = \frac{2 n_i^+}{n_i^- + n_i^+}\,\theta_i. \qquad (16)$$

The decision rule (15) at each node then becomes

$$\hat{y}_i = 1 \iff \hat{p}_i \left( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \right) \geq \frac{2\theta_i}{1 + \alpha_i} = \frac{2\theta_i n_i^+}{n_i^- + n_i^+}. \qquad (17)$$
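In code, the per-node decision reduces to a single threshold comparison; a sketch of rule (17), with the per-node cost factor of (16) folded in (illustrative names, not from [27]), is as follows:

```python
def hbayes_cs_label(p_i, theta_i, sum_H_children, n_pos, n_neg):
    """Decision rule (17): alpha_i = n_i^- / n_i^+ lowers the threshold
    on nodes with few positives, trading precision for recall."""
    threshold = 2.0 * theta_i * n_pos / (n_neg + n_pos)
    return 1 if p_i * (2.0 * theta_i - sum_H_children) >= threshold else 0
```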

Results obtained with the yeast model organism showed that HBAYES-CS significantly outperforms HTD methods [27, 136].

6. Reconciliation Methods

Hierarchical ensemble methods are basically two-step methods, since they at first provide predictions for the single classes and then arrange these predictions to take into account the functional relationships between GO terms. Noble and colleagues named this general approach reconciliation methods [24]: they proposed methods for calibrating and combining independent predictions, to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. They applied their ensemble methods to the genome-wide and ontology-wide function prediction in M. musculus, involving about 3000 GO terms.


Figure 10: The overall scheme of reconciliation methods (adapted from [24]).

Their goal consists in providing consistent predictions, that is, predictions whose confidence (e.g., posterior probability) increases as we ascend from more specific to more general terms in the GO. Moreover, another important feature of these methods is the availability of confidence values associated with the predictions, which can be interpreted as probabilities that a protein has a certain function, given the information provided by the data.

The overall reconciliation approach can be summarized in the following four basic steps (Figure 10).

(1) Kernel computation: at first, a set of kernels is computed from the available data. We may choose kernels specific for each source of data (e.g., diffusion kernels for protein-protein interaction data [137], linear or Gaussian kernels for expression data, and string kernels for sequence data [138]). Multiple kernels for the same type of data can also be constructed [24].

(2) SVM learning: SVMs are used as base learners with the kernels selected at the previous step; the training is performed by internal cross-validation to avoid overfitting, and a local cost-sensitive strategy is applied by tuning separately the $C$ regularization factor for positive and negative examples. Note that the authors used SVMs as base learners in their experiments, but any meaningful classifier could be used at this step.

(3) Calibration: to produce individual probabilistic outputs from the set of SVM outputs corresponding to one GO term, a logistic regression approach is applied. In this way a calibration of the individual SVM outputs is obtained, resulting in a probabilistic prediction of the random variable $Y_i$ for each node/term $i$ of the hierarchy, given the outputs $\hat{y}_i$ of the SVM classifiers.

(4) Reconciliation: the first three steps generate unreconciled outputs; that is, in practice a "flat" ensemble is applied, which may generate inconsistent predictions with respect to the given taxonomy. In this step, the outputs of step three are processed by a "reconciliation method." The goal of this stage is to combine the predictions for each term to produce predictions that are consistent with the ontology, meaning that all the probabilities assigned to the ancestors of a GO term are larger than the probability assigned to that term.

The first three steps are basically the same for (or very similar in) each reconciliation ensemble method. The crucial step is the fourth, that is, the reconciliation step, and different ensemble algorithms can be designed to implement it. The authors proposed 11 different ensemble methods for the reconciliation of the base classifier outputs. Schematically, they can be subdivided into the following four main classes of ensembles:

(1) heuristic methods;
(2) Bayesian network-based methods;
(3) cascaded logistic regression;
(4) projection-based methods.

6.1. Heuristic Methods. These approaches preserve the "reconciliation property"

$$\forall i, j \in G,\ (i, j) \in G \Longrightarrow \bar{p}_i \geq \bar{p}_j \qquad (18)$$

through simple heuristic modifications of the calibrated probabilities $\hat{p}_i$ computed at step 3 of the overall reconciliation scheme (a small sketch of the three heuristics follows the list below).

(i) The MAX method simply chooses the largest logistic regression value among the node $i$ and all its descendants $\mathrm{desc}(i)$:

$$\bar{p}_i = \max_{j \in \mathrm{desc}(i)} \hat{p}_j. \qquad (19)$$

(ii) The AND method implements the notion that the probability of a given term/node $i$ is large only if all its ancestral GO terms $\mathrm{anc}(i)$ have large probability, assuming that, conditional on the data, all predictions are independent:

$$\bar{p}_i = \prod_{j \in \mathrm{anc}(i)} \hat{p}_j. \qquad (20)$$

(iii) The OR method estimates the probability that the node $i$ is annotated in at least one of its descendant GO terms, assuming again that, conditional on the data, all predictions are independent:

$$1 - \bar{p}_i = \prod_{j \in \mathrm{desc}(i)} (1 - \hat{p}_j). \qquad (21)$$
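The three heuristics amount to one-line aggregations over ancestor/descendant closures. A sketch follows, where we assume, as a convention of this sketch only, that `desc[i]` and `anc[i]` both contain $i$ itself:

```python
import numpy as np

def reconcile_heuristic(p_hat, desc, anc, method="max"):
    """MAX (19), AND (20), and OR (21) reconciliation heuristics
    applied to the calibrated probabilities p_hat."""
    p_bar = np.empty_like(p_hat)
    for i in range(len(p_hat)):
        if method == "max":
            p_bar[i] = max(p_hat[j] for j in desc[i])
        elif method == "and":
            p_bar[i] = float(np.prod([p_hat[j] for j in anc[i]]))
        else:  # "or"
            p_bar[i] = 1.0 - float(np.prod([1.0 - p_hat[j] for j in desc[i]]))
    return p_bar
```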

6.2. Cascaded Logistic Regression. Instead of modeling class-conditional probabilities, as required by the Bayesian approach, logistic regression can be used to directly model posterior probabilities. Considering that modeling conditional densities is in most cases difficult (even using strong independence assumptions, as shown in Section 5.1),


the choice of logistic regression could be a reasonable one. In [24], the authors embedded the hierarchical dependencies between terms in the logistic regression setting. By assuming that a random variable $\mathbf{X}$, whose values represent the features of the gene $g$ of interest, is associated with a given gene $g$, and assuming that $\mathbb{P}(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x})$ factorizes according to the GO graph, it follows that

$$\mathbb{P}(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x}) = \prod_i \mathbb{P}\left(Y_i = y_i \mid \forall j \in \mathrm{par}(i),\ Y_j = y_j,\ X_i = x_i\right), \qquad (22)$$

with $\mathbb{P}(Y_i = 1 \mid \forall j \in \mathrm{par}(i),\ Y_j = 0,\ X_i = x_i) = 0$. The authors estimated $\mathbb{P}(Y_i = 1 \mid \forall j \in \mathrm{par}(i),\ Y_j = 1,\ X_i = x_i)$ with logistic regression. This approach is quite similar to fitting independent logistic regressions, but note that in this case only examples of proteins annotated to all the parent GO terms are used to fit the model, thus implicitly taking into account the hierarchical relationships between GO terms.
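A sketch of this fitting scheme with scikit-learn follows (hypothetical variable names: `Y` is the 0/1 annotation matrix and `parents[i]` lists the parent terms of $i$; terms whose masked training set contains a single class would need special handling):

```python
from sklearn.linear_model import LogisticRegression

def fit_cascaded_lr(X, Y, parents):
    """One logistic regression per term i, fitted only on the examples
    annotated to all parents of i, so each model estimates
    P(Y_i = 1 | all parents positive, x) as in eq. (22)."""
    models = {}
    for i, pars in parents.items():
        mask = Y[:, pars].all(axis=1)     # keep rows with all parents positive
        models[i] = LogisticRegression(max_iter=1000).fit(X[mask], Y[mask, i])
    return models
```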

6.3. Bayesian Network-Based Methods. These methods are variants of the Bayesian network approach proposed in [20] (Section 5.1): the GO is viewed as a graphical model where a joint Bayesian prior is put on the binary GO term variables $Y_i$. The authors proposed four variants that can be summarized as follows.

(i) BPAL is a belief propagation approach with asymmetric Laplace likelihoods. The graphical model has edges directed from more general terms to more specific terms. Differently from [20], the distribution of each SVM output is modeled as an asymmetric Laplace distribution, and a variational inference algorithm, which solves an optimization problem whose minimizer is the set of marginal probabilities of the distribution, is used to estimate the posterior probabilities of the ensemble [139].

(ii) BPALF is similar to BPAL, but with edges inverted and directed from more specific terms to more general terms.

(iii) BPLR is a heuristic variant of BPAL where, in the inference algorithm, the Bayesian log posterior ratio for $Y_i$ is replaced by the marginal log posterior ratio obtained from the logistic regression (LR).

(iv) BPLRF is equal to BPLR, but with reversed edges.

6.4. Projection-Based Methods. A different approach is represented by methods that directly use the calibrated values obtained from logistic regression (step 3 of the overall scheme of the reconciliation methods) to find the closest set of values that are consistent with the ontology. This approach leads to a constrained optimization problem. The main contribution of the Obozinski et al. work [24] is represented by the introduction of projection reconciliation techniques, based on isotonic regression [140] and the Kullback-Leibler divergence.

The isotonic regression method tries to find a set of marginal probabilities $\bar{p}_i$ that are close to the set of calibrated values $\hat{p}_i$ obtained from the logistic regression, using the Euclidean distance as a measure of closeness. Hence, considering that the "reconciliation property" requires that $\bar{p}_i \ge \bar{p}_j$ when $(i, j) \in E$, this approach yields the following quadratic program:

$$\min_{\bar{p}_i, i \in I} \sum_{i \in I} (\bar{p}_i - \hat{p}_i)^2 \quad \text{s.t.} \quad \bar{p}_j \le \bar{p}_i, \ (i, j) \in E \quad (23)$$

This is the classical isotonic regression problem, which can be solved using an interior point solver, or with approximate algorithms when the number of edges of the graph is too large [141].
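As an illustration, the quadratic program (23) can be handed to a general-purpose solver; the sketch below (a toy formulation using scipy's SLSQP, not the interior point or approximate solvers of [141]) projects the calibrated probabilities onto the hierarchy constraints:

import numpy as np
from scipy.optimize import minimize

def isotonic_reconcile(p_hat, edges):
    """Find reconciled probabilities minimizing the squared distance to
    p_hat (Eq. 23), subject to p[child] <= p[parent] for every
    (parent, child) edge of the ontology graph."""
    n = len(p_hat)
    cons = [{"type": "ineq", "fun": lambda p, i=i, j=j: p[i] - p[j]}
            for (i, j) in edges]
    res = minimize(lambda p: np.sum((p - p_hat) ** 2),
                   x0=np.clip(p_hat, 0.0, 1.0),
                   bounds=[(0.0, 1.0)] * n,
                   constraints=cons)
    return res.x

# toy example: node 0 is the root of a small tree
print(isotonic_reconcile(np.array([0.3, 0.8, 0.6]), [(0, 1), (0, 2)]))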

Considering that we deal with probabilities, a natural measure of distance between probability density functions $f(\mathbf{x})$ and $g(\mathbf{x})$, defined with respect to a random variable $\mathbf{x}$, is represented by the Kullback-Leibler divergence $D_{f,g}$:

$$D_{f,g} = \int_{-\infty}^{\infty} f(\mathbf{x}) \log\left(\frac{f(\mathbf{x})}{g(\mathbf{x})}\right) d\mathbf{x} \quad (24)$$

In the context of reconciliation methods we need to consider a discrete version of the Kullback-Leibler divergence, yielding the following optimization problem:

$$\min_{\bar{\mathbf{p}}} D_{\bar{\mathbf{p}},\hat{\mathbf{p}}} = \min_{\bar{p}_i, i \in I} \sum_{i \in I} \bar{p}_i \log\left(\frac{\bar{p}_i}{\hat{p}_i}\right) \quad \text{s.t.} \quad \bar{p}_j \le \bar{p}_i, \ (i, j) \in E \quad (25)$$

The algorithm finds the probabilities $\bar{\mathbf{p}}$ closest, according to the Kullback-Leibler divergence, to the probabilities $\hat{\mathbf{p}}$ obtained from logistic regression, while obeying the constraint that probabilities cannot increase while descending the hierarchy underlying the ontology.
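The Kullback-Leibler projection (25) fits the same template as the Euclidean one; only the objective changes (again a toy sketch with SLSQP, assuming strictly positive calibrated probabilities):

import numpy as np
from scipy.optimize import minimize

def kl_reconcile(p_hat, edges):
    """Minimize sum_i p_i * log(p_i / p_hat_i) (Eq. 25) under the same
    hierarchy constraints used for the Euclidean projection."""
    n = len(p_hat)
    cons = [{"type": "ineq", "fun": lambda p, i=i, j=j: p[i] - p[j]}
            for (i, j) in edges]
    obj = lambda p: np.sum(p * np.log(np.clip(p, 1e-12, 1.0) / p_hat))
    res = minimize(obj, x0=np.clip(p_hat, 1e-6, 1.0),
                   bounds=[(1e-12, 1.0)] * n, constraints=cons)
    return res.x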

The extensive experiments reported in [24] show that, among the reconciliation methods, isotonic regression is the most generally useful: across a range of evaluation modes, term sizes, ontologies, and recall levels, isotonic regression yields consistently high precision. On the other hand, isotonic regression is not always the "best method", and a biologist with a particular goal in mind may apply other reconciliation methods. For instance, with small terms Kullback-Leibler projections usually achieve the best results, but considering average "per term" results, heuristic methods yield precision at a given recall comparable with projection methods and better than that achieved with Bayes-net methods.

This ensemble approach achieved excellent results in the prediction of protein function in the mouse model organism, demonstrating that hierarchical multilabel methods can play a crucial role in the improvement of protein function prediction performances [24]. Nevertheless, the approach suffers from some drawbacks. Indeed, the paper focuses on the comparison of hierarchical multilabel methods, but it does not analyze the impact of the concurrent use of data integration and hierarchical multilabel methods on the overall classification performances. Moreover, potential improvements could


Figure 11: The asymmetric flow of information suggested by the true path rule.

be introduced by applying cost-sensitive variants of hierarchical multilabel predictors, able to effectively calibrate the precision/recall trade-off at different levels of the functional ontology.

7. True Path Rule Hierarchical Ensembles

These ensemble methods exploit at the same time the downward and upward relationships between classes, thus considering both the parent-to-child and child-to-parent functional links (Figure 2(b)).

The true path rule (TPR) ensemble method [31, 142] is directly inspired by the true path rule that governs both the GO and FunCat taxonomies. Citing the curators of the Gene Ontology [143]: "An annotation for a class in the hierarchy is automatically transferred to its ancestors, while genes unannotated for a class cannot be annotated for its descendants." Considering the parents of a given node i, a classifier that respects the true path rule needs to obey the following rules:

$$y_i = 1 \Rightarrow y_{\mathrm{par}(i)} = 1, \qquad y_i = 0 \nRightarrow y_{\mathrm{par}(i)} = 0 \quad (26)$$

On the other hand, considering the children of a given node i, a classifier that respects the true path rule needs to obey the following rules:

$$y_i = 1 \nRightarrow y_{\mathrm{child}(i)} = 1, \qquad y_i = 0 \Rightarrow y_{\mathrm{child}(i)} = 0 \quad (27)$$

From (26) and (27) we observe an asymmetry in the rules that govern the assignments of positive and negative labels.

Indeed, we have a propagation of positive predictions from bottom to top of the hierarchy in (26) and a propagation of negative labels from top to bottom in (27). Conversely, negative labels cannot propagate from bottom to top, and positive predictions cannot propagate from top to bottom.

The "true path rule" thus suggests algorithms able to propagate "positive" decisions from bottom to top of the hierarchy and negative decisions from top to bottom (Figure 11).

7.1. The True Path Rule Ensemble Algorithm. The TPR algorithm puts together the predictions made at each node by local "base" classifiers to realize an ensemble that obeys the "true path rule".

The basic ideas behind the method can be summarized as follows.

(1) Training of the base learners: for each node of the hierarchy, a suitable learning algorithm (e.g., a multilayer perceptron or a support vector machine) provides a classifier for the associated functional class.

(2) In the evaluation phase, the trained classifiers associated with each class/node of the graph provide a local decision about the assignment of a given example to a given node.

(3) Positive decisions, that is, annotations to a specific functional class, may propagate from bottom to top across the graph: they influence the decisions of the parent nodes and of their ancestors in a recursive way, by traversing the graph towards higher level nodes/classes. Conversely, negative decisions do not affect the decisions of the parent node, that is, they do not propagate from bottom to top (26).


(4) Negative predictions for a given node (taking into account the local decisions of its descendants) are propagated to the descendants, to preserve the consistency of the hierarchy according to the true path rule, while positive decisions do not influence the decisions of child nodes (27).

The ensemble combines the local predictions of the base learners associated with each node with the positive decisions that come from the bottom of the hierarchy and with the negative decisions that spring from the higher level nodes. More precisely, base classifiers estimate local probabilities $\hat{p}_i(g)$ that a given example g belongs to class $\theta_i$, but the core of the algorithm is represented by the evaluation phase, where the ensemble provides an estimate of the "consensus" global probability $\bar{p}_i(g)$.

It is worth noting that, instead of a probability, $\hat{p}_i(g)$ may represent a score associated with the likelihood that a given gene/gene product belongs to the functional class i.

Let us consider the set $\phi_i(g)$ of the children of node i for which we have a positive prediction for a given gene g:

$$\phi_i(g) = \{ j \in \mathrm{child}(i) \mid \hat{y}_j = 1 \} \quad (28)$$

The global consensus probability $\bar{p}_i(g)$ of the ensemble depends both on the local prediction $\hat{p}_i(g)$ and on the predictions of the nodes belonging to $\phi_i(g)$:

$$\bar{p}_i(g) = \frac{1}{1 + |\phi_i(g)|} \left( \hat{p}_i(g) + \sum_{j \in \phi_i(g)} \bar{p}_j(g) \right) \quad (29)$$

The decision $\hat{y}_i(g)$ at node/class i is set to 1 if $\bar{p}_i(g) > t$ and to 0 otherwise (a natural choice for t is 0.5), and only children nodes for which we have a positive prediction can influence their parent. In the leaf nodes the sum of (29) disappears, and (29) becomes $\bar{p}_i(g) = \hat{p}_i(g)$. In this way positive predictions propagate from bottom to top, while negative decisions are propagated to the descendants of a given node whenever $\hat{y}_i(g) = 0$.

The bottom-up per-level traversal of the tree assures that all the offspring of a given node i are taken into account for the ensemble prediction. For the same reason we can safely set the classes belonging to the subtree rooted at i to negative when $\hat{y}_i$ is set to 0. It is worth noting that we have a two-way asymmetric flow of information across the tree: positive predictions for a node influence its ancestors, while negative predictions influence its offspring.

The algorithm provides both the multilabels $\hat{y}_i$ and an estimate of the probabilities $\bar{p}_i$ that a given example g belongs to the class $i = 1, \ldots, m$.
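The following sketch illustrates the two-phase evaluation (a simplified rendering of (29) on a tree, not the reference implementation of [31]; the tree encoding and root convention are assumptions):

def tpr_ensemble(children, p_hat, root, t=0.5):
    """children: dict node -> list of child nodes (a FunCat-like tree);
    p_hat: dict node -> local classifier probability for one gene g;
    returns the consensus probabilities p_bar and decisions y_hat."""
    p_bar, y_hat = {}, {}

    def bottom_up(i):
        # phi_i(g): children with a positive consensus prediction (Eq. 28)
        phi = [j for j in children.get(i, []) if bottom_up(j)]
        # Eq. (29); at the leaves phi is empty, so p_bar reduces to p_hat
        p_bar[i] = (p_hat[i] + sum(p_bar[j] for j in phi)) / (1 + len(phi))
        y_hat[i] = int(p_bar[i] > t)
        return y_hat[i]

    def top_down(i):
        # a negative decision is propagated to the whole subtree
        for j in children.get(i, []):
            if y_hat[i] == 0:
                y_hat[j] = 0
            top_down(j)

    bottom_up(root)
    top_down(root)
    return p_bar, y_hat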

7.2. The Cost-Sensitive Variant. Note that in the TPR algorithm there is no way to explicitly balance the local prediction $\hat{p}_i(g)$ at node i with the positive predictions coming from its offspring (29). By balancing the local predictions with the positive predictions coming from the ensemble, we can explicitly modulate the interplay between local and descendant predictors. To this end, a weight w, $0 \le w \le 1$, is introduced, such that if $w = 1$ the decision at node i depends only on the local predictor; otherwise the prediction is shared, proportionally to w and 1 - w, between the local parent predictor and the set of its children, respectively:

$$\bar{p}_i = w \hat{p}_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \bar{p}_j \quad (30)$$

This variant of the TPR algorithm is the weighted true path rule (TPR-W) hierarchical ensemble algorithm. By tuning the w parameter we can modulate the precision/recall characteristics of the resulting ensemble. More precisely, for $w \to 0$ the weight of the parent local predictor is small, and the ensemble decision depends mainly on the positive predictions of the offspring nodes (classifiers). Conversely, $w \to 1$ corresponds to a higher weight of the parent predictor: less weight is given to possible positive predictions of the children, and the decision depends mainly on the local/parent base classifier. In case of a negative decision, the whole subtree is set to 0, causing the precision to increase. Note that for $w \to 1$ the behaviour of TPR-W becomes similar to that of HTD (Section 4).

A specific advantage of TPR-W ensembles is the capability of tuning precision and recall rates through the parameter w (30): small values of w give more weight to the positive predictions of the offspring nodes (classifiers), while higher values give more weight to the "parent" local predictor, with a resulting higher precision. In [31] the author shows that the w parameter strongly influences the precision/recall characteristics of the ensemble: low w values yield a higher recall, while high values improve the precision of the TPR-W ensemble.

Recently, Chen and Hu proposed a method that applies the TPR-W hierarchical strategy, but using composite kernel SVMs as base classifiers and a supervised clustering with oversampling strategy to address the imbalanced data set learning problem, showing that the proper selection of base learners and unbalance-aware learning strategies can further improve the results in terms of hierarchical precision and recall [144].

The same authors also proposed an enhanced version of the TPR-W strategy to overcome a limitation of this bottom-up hierarchical method for AFP. Indeed, for some classes at the lower levels of the hierarchy the classifier performances are sometimes quite poor, due to both noisy data and the relatively low number of available annotations. More precisely, in the basic TPR ensemble the probabilities $\bar{p}_j$ computed by the children of the node i (30) contribute in equal measure to the probability $\bar{p}_i$ computed by the ensemble at node i, independently of the accuracy of the predictions made by its children classifiers. This "unweighted" mechanism may propagate errors across the hierarchy: a poorly performing child classifier may, for instance, with high probability predict a negative example as positive, and this error may propagate to its parent node and, recursively, to its ancestor nodes. To try to alleviate this possible bottom-up error propagation, in [145] Chen and Hu proposed an improved TPR ensemble (TPR-W weighted) based on classifier performance. To this end they weighted the contribution


of each child classifier on the basis of its performance, evaluated on a validation data set, by adding to (30) another weight $\nu_j$:

$$\bar{p}_i = w \hat{p}_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \nu_j \cdot \bar{p}_j \quad (31)$$

where $\nu_j$ is computed on the basis of some accuracy metric $A_j$ (e.g., the F-score) estimated for the child classifier associated with node j, as follows:

$$\nu_j = \frac{A_j}{\sum_{k \in \phi_i} A_k} \quad (32)$$

In this way the contribution of "poor" classifiers is reduced, while "good" classifiers weight more in the final computation of $\bar{p}_i$ (31). Experiments with the "Protein Fate" subtree of the FunCat taxonomy with the yeast model organism show that this approach improves prediction with respect to the "vanilla" TPR-W hierarchical strategy [145].
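In code, the weighting scheme (31)-(32) amounts to a small change in the combination step of the TPR sketch above, extended with the weight w of (30) (illustrative; A holds hypothetical per-node accuracy values, e.g., validation F-scores):

def combine_weighted(p_hat_i, p_bar_children, A, w=0.5):
    """Eq. (31): p_bar_children maps each positive child j to its consensus
    probability; each child is weighted by nu_j = A[j] / sum_k A[k] (Eq. 32)."""
    if not p_bar_children:
        return p_hat_i  # leaf node, or no positive children
    total = sum(A[j] for j in p_bar_children)
    weighted_sum = sum((A[j] / total) * pj for j, pj in p_bar_children.items())
    return w * p_hat_i + (1 - w) / len(p_bar_children) * weighted_sum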

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward and common to the hierarchical top-down algorithm, the propagation of positive decisions from bottom to top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this item see Appendix C.

Experimental results show that TPR-W achieves equal or better results than the TPR and the top-down hierarchical strategy, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per-level classification performances shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall by using a global strategy: large values of the w parameter improve the precision, and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchical ensemble are influenced by the positive decisions of its descendants. More precisely, the analyses performed in [31] showed the following.

(i) Weights of descendants decrease exponentially with respect to their depth. As a consequence, the influence of descendant nodes decays quickly with their depth.

(ii) The parameter w plays a central role in balancing the weight of the local classifier associated with a given node against the weights of its positive offspring: small values of w increase the weight of descendant nodes, and large values increase the weight of the local/parent predictor associated with that node.

(iii) The effect on the overall probability predicted by the ensemble is the result of the choice of the w parameter, of the strength of the predictions of the local learner, and of those of its descendants.

These characteristics of TPR-W ensembles are well suited for the hierarchical classification of protein functions, considering that annotations of deeper nodes are likely to have less experimental evidence than higher nodes. Moreover, by enforcing the strength of the descendant nodes through low w values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured as a forest of trees (Appendix A), while no applications have been performed using the GO, structured as a directed acyclic graph (Appendix A).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147], or investigated a modified decision tree model in which the assignment to a node is propagated toward the parent nodes [9], by extending the classical C4.5 decision tree algorithm for multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in yeast [149].

More recent approaches, always based on modified decision trees, used distance measures derived from the hierarchy and significantly improved previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all classes at once, even when they are built taking into account the hierarchy.

Nevertheless, the previously proposed decision tree-based methods often achieve results not comparable with state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for DAG-structured prediction of protein function in the S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles that learn ensembles of predictive models, each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles that train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151], methods related to decision trees, in the sense that they produce interpretable classification rules to predict all functions at all levels of the GO hierarchy, have been proposed, using an ant colony optimization algorithm to discover the classification rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues [1, 16, 18]. In particular, in [80] the authors showed that even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results if, at the same time, we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy between hierarchical classification, data integration methods, and cost-sensitive techniques has been shown, highlighting that hierarchical ensemble methods should be designed taking into account the different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the predictions of protein functions [16, 25, 154-156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into four groups:

(1) vector subspace integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed by [30], but each data source is separately normalized in order to take into account the data distribution in each individual vector space.
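A minimal sketch of vector-space integration with per-source normalization in the spirit of [30] (the feature matrices are hypothetical placeholders):

import numpy as np

def integrate_vectors(sources):
    """Concatenate feature matrices from different data sources after
    normalizing each source separately (here, unit L2 norm per row),
    so that no single source dominates by scale alone."""
    normed = []
    for X in sources:
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        normed.append(X / np.clip(norms, 1e-12, None))
    return np.hstack(normed)  # the result can be fed to a standard SVM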

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network, or probabilistic evidence integration schemes [45].

Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive-Bayes integration [157] and constrained linear regression [14], or by merging data taking into account the GO hierarchy [158], or by applying XML-based techniques [159].

Kernel Fusion. These techniques first construct a separate Gram matrix for each available data source, using appropriate kernels representing similarities between genes/gene products. Then, by exploiting the closure property of kernels with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix.

Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have been recently proposed [160, 161]. Kernel fusion methods, both with and without weighting of the data sources, have been successfully applied to the classification of protein functions [162-165]. Recently, a novel method proposed an enhanced kernel integration approach by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
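A minimal kernel fusion sketch: by the closure property, a (weighted) sum of Gram matrices is again a valid kernel and can be passed to an SVM with a precomputed kernel (the Gram matrices and labels below are hypothetical):

import numpy as np
from sklearn.svm import SVC

def fuse_kernels(grams, weights=None):
    """Weighted sum of Gram matrices, one per data source (an unweighted
    sum corresponds to fixed coefficients equal to 1)."""
    weights = weights if weights is not None else [1.0] * len(grams)
    return sum(w * K for w, K in zip(weights, grams))

# K1, K2: (n x n) Gram matrices from two data sources; y: binary labels
# K = fuse_kernels([K1, K2])
# clf = SVC(kernel="precomputed").fit(K, y)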

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different "views" of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include the "late integration" of kernels trained on different sources [22], the naive-Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the outputs of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171-173].

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques,


Table 2: Comparison of the results (average per-class F-scores) achieved with single sources and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the number in parentheses refers to the percentage relative increment in F-score achieved with data fusion with respect to the best single source of evidence (BIOGRID).

Methods          FLAT         HTD          HTD-CS       HB           HB-CS        TPR          TPR-W
Single-source
  BIOGRID        0.2643       0.3759       0.4160       0.3385       0.4183       0.3902       0.4367
  STRING         0.2203       0.2677       0.3135       0.2138       0.3007       0.2801       0.3048
  PFAM BINARY    0.1756       0.2003       0.2482       0.1468       0.2407       0.2532       0.2738
  PFAM LOGE      0.2044       0.1567       0.2541       0.0997       0.2847       0.3005       0.3160
  EXPR           0.1884       0.2506       0.2889       0.2006       0.2781       0.2723       0.3053
  SEQ. SIM.      0.1870       0.2532       0.2899       0.2017       0.2825       0.2742       0.3088
Multisource (data fusion)
  Kernel fusion  0.3220(+22%) 0.5401(+44%) 0.5492(+32%) 0.5181(+53%) 0.5505(+32%) 0.5034(+29%) 0.5592(+28%)

in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ. SIM.). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].

Table 2 summarizes the results of the comparison across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per-class F-score in flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) than in FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical methods, data fusion (in particular kernel fusion), and cost-sensitive approaches (Figure 13) [123].

Per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion the average F-score at levels 2, 3, and 4 is comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergic effects of hierarchical multilabel ensembles, cost-sensitive, and data fusion techniques significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insights into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both GO and FunCat, only positive annotations are usually available, while negative annotations are much rarer. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy of choosing negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123] two strategies for choosing negative examples have been compared: the basic (B) and the parent only (PO) strategy.

According to the B strategy, the set of negative examples is simply composed of those genes g that are not annotated for class $c_i$, that is:

$$N_B = \{ g \mid g \notin c_i \} \quad (33)$$

The PO selection strategy chooses as negatives for the class $c_i$ only those examples that are not annotated to $c_i$ but are annotated for a parent class. More precisely, for a given class $c_i$, corresponding to node i in the taxonomy, the set of negative examples is

$$N_{PO} = \{ g \mid g \notin c_i, \ g \in \mathrm{par}(i) \} \quad (34)$$


Figure 12: FunCat trees comparing F-scores achieved with data integration (KF) to the best single-source classifiers trained on BIOGRID data. Black nodes depict functional classes for which KF achieves better F-scores. (a) FLAT and (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical precision, recall, and F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, this strategy selects negative examples for training that are, in a certain sense, "close" to positives. It is easy to see that $N_{PO} \subseteq N_B$; hence the B strategy selects for training a larger set of generic negative examples, possibly annotated with classes associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.
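The two selection rules (33) and (34) translate directly into set operations; a small sketch (with hypothetical annotation sets) follows:

def negatives_basic(genes, ann, c):
    """Eq. (33): all genes not annotated for class c."""
    return {g for g in genes if g not in ann[c]}

def negatives_parent_only(genes, ann, c, parent):
    """Eq. (34): genes not annotated for c but annotated for its parent."""
    return {g for g in genes if g not in ann[c] and g in ann[parent]}

# ann: dict class -> set of annotated genes, e.g.
# ann = {"01": {"g1", "g2", "g3"}, "01.01": {"g1"}}
# negatives_parent_only({"g1", "g2", "g3", "g4"}, ann, "01.01", "01")
# -> {"g2", "g3"}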

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the B (abscissa) and PO (ordinate) strategy for the selection of negative examples. In Figure 15(a) most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods gain in performance when using the PO strategy as opposed to the B strategy (P-value = 2.2 × 10⁻¹⁶ according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

Figure 14: Comparison of per-level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF). Performance of "single" is computed by averaging across all the single data sources.


These results can be explained by considering that the PO strategy takes into account the hierarchy to select negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in the context of this research area.


Figure 15: Comparison of average per-class F-score between basic (B) and PO strategies. (a) FLAT ensembles; (b) hierarchical cost-sensitive strategies: HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles). Abscissa: per-class F-score with base learners trained according to the basic strategy; ordinate: per-class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way to assess the impact of each strategy on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works also showed that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be considered both for applying and designing cost-sensitive base learners and for developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies should be explored to improve the accuracy and the reliability of the base learners and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied: the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly removes some unannotated examples [182]. Other approaches could be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185-187].

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems and often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta_i^- / \theta_i^+$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing w we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed in order to design ensemble methods with "tunable" precision/recall characteristics.

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; unfortunately, the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard, while, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are only partially annotated. More precisely, two new algorithms have been proposed: ProWL (protein function prediction with weak-label learning), which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein, and ProWL-IF (protein function prediction with weak-label learning and knowledge of irrelevant functions), by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works on hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation between protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy with the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Section 7.3), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of deep nodes of the hierarchy. Other enhancements could be introduced in the TPR-W algorithm (Section 7.2): indeed, we can note that positive children of a node at level i of the hierarchy have the same weight, independently of the size of their hanging subtree. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree; indeed, this provides evidence for a positive annotation of the node at level i.

More in general, in the spirit of a recently proposed work [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

First, an important issue could be represented by the design and development of multitask learning strategies [194], able to exploit the relationships between functional classes during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, during the training of the base learners, the learning processes would be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performances.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the predictions for a particular species by using predictions or data available for other species. This is a common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples, to the reliability and the incompleteness of the available annotations, to the balance between local and global learning strategies, and to the metaintegration of hierarchical predictions, have been only partially addressed in previous work.


Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and other related issues is the key to improve AFP methods and to drive experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context, and the design of multispecies hierarchical algorithms able to scale with millions of proteins, represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., the mitochondrion or the rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces genome database (SGD), and the mouse genome database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components,


Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

and molecular functions, in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function, for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; an example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles or physical places or compartments in which a specific gene product is located; an example is the endoplasmic reticulum or the ribosome.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a term of the GO}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are also categorized in the following three main groups.

(i) is-a (subtype relations): if we say that term/node A is a B, we mean that node A is a subtype of node B. For example, mitotic cell cycle is a cell cycle, or lyase activity is a catalytic activity.

(ii) part-of (part-whole relations): A is part of B means that, whenever A exists, it is part of B, and the presence of A implies the presence of B. For instance, the mitochondrion is part of the cytoplasm.

(iii) regulates (control relations): if we say that A regulates B, we mean that A directly affects the manifestation of B, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting functional similarity. For these reasons, in AFP the regulates relations (which, however, are a minority of the existing relations) are usually not used.

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI); to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference; to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de): at the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content has been extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and finally animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones, and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces genome database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge amount of terms is very difficult to handle for annotators. Moreover, we may have a very large number of possible assignments, which may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads both to simpler annotation procedures and to less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, but without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subjected to changes, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more in general, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess the classification performance. For this reason the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the number of positive examples correctly predicted as positive, FN the number of positive examples incorrectly predicted as negative, and FP the number of negatives incorrectly predicted as positive, then the precision P and the recall R are

\[
P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad
R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}.
\tag{B1}
\]

The F-score F is the harmonic mean between precision and recall:

\[
F = \frac{2 \cdot P \cdot R}{P + R}.
\tag{B2}
\]
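As a minimal illustration, the following Python sketch computes P, R, and F from the three counts of (B1)-(B2); the function name and the zero-guards are illustrative additions, not part of the original formulation.

    def precision_recall_fscore(tp, fp, fn):
        # Precision and recall as in (B1); guard against empty denominators.
        p = tp / (tp + fp) if tp + fp > 0 else 0.0
        r = tp / (tp + fn) if tp + fn > 0 else 0.0
        # F-score as the harmonic mean of precision and recall, eq. (B2).
        f = 2 * p * r / (p + r) if p + r > 0 else 0.0
        return p, r, f

    # Example: 30 TP, 10 FP, 20 FN -> P = 0.75, R = 0.60, F ~ 0.667
    print(precision_recall_fscore(30, 10, 20))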

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to near-optimal rankings; that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be considered "partially correct", in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy G representing the graph of the functional classes, for a given gene/gene product x consider the graph P(x) ⊂ G of the predicted classes and the graph C(x) of the correct classes associated with x, and let l(P) be the set of the leaves (nodes without children) of the graph P. Given a leaf p ∈ P(x), let ↑p be the set of ancestors of the node p that belong to P(x), and, given a leaf c ∈ C(x), let ↑c be the set of ancestors of the node c that belong to C(x). The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

\[
\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}p|},
\qquad
\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}c|},
\qquad
\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}.
\tag{B3}
\]

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

\[
\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow}p|}{|{\uparrow}p|},
\qquad
\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|{\uparrow}c \cap P(x)|}{|{\uparrow}c|},
\qquad
\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}.
\tag{B4}
\]

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictors are able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus summarizing in a single value the effectiveness of the structured hierarchical prediction.
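To make the tree-structured measures of (B4) concrete, the following Python sketch computes HP, HR, and HF for a single gene; the child-to-parent encoding and helper names are illustrative, and, as an assumption of this sketch, ↑p is taken to include the node p itself (i.e., the whole path from p up to the root).

    def path_to_root(node, parent):
        # The set of ancestors of `node`, here assumed to include `node` itself.
        out = set()
        while node is not None:
            out.add(node)
            node = parent[node]
        return out

    def leaves(nodes, parent):
        # Leaves of the subgraph induced by `nodes`: members with no child in `nodes`.
        return {n for n in nodes if not any(parent.get(m) == n for m in nodes)}

    def hierarchical_prf(P, C, parent):
        # HP, HR, HF of eq. (B4) for one gene: P and C are the sets of predicted
        # and correct classes; `parent` maps each node to its parent (None at root).
        lP, lC = leaves(P, parent), leaves(C, parent)
        if not lP or not lC:
            return 0.0, 0.0, 0.0
        hp = sum(len(C & path_to_root(p, parent)) / len(path_to_root(p, parent))
                 for p in lP) / len(lP)
        hr = sum(len(P & path_to_root(c, parent)) / len(path_to_root(c, parent))
                 for c in lC) / len(lC)
        hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
        return hp, hr, hf

    # Tiny FunCat-like tree: root -> a -> {b, c}; predict {root, a, b}, truth {root, a, c}.
    parent = {"root": None, "a": "root", "b": "a", "c": "a"}
    print(hierarchical_prf({"root", "a", "b"}, {"root", "a", "c"}, parent))
    # -> (0.667, 0.667, 0.667): the missed leaf c is only "partially" penalized.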

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let P(x) be the set of classes predicted in the overall hierarchy for a given gene/gene product x, and let C(x) be the corresponding set of "true" classes. Then the hierarchical precision HP_K and the hierarchical recall HR_K according to Kiritchenko are defined as

\[
\mathrm{HP}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|P(x)|},
\qquad
\mathrm{HR}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|C(x)|}.
\tag{B5}
\]

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted (HP_K) and the true classes (HR_K). The hierarchical F-measure HF_K is the harmonic mean between HP_K and HR_K:

\[
\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}.
\tag{B6}
\]
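A short Python sketch of (B5)-(B6), written per the ratio form given above (a per-gene sum of set overlaps); names are illustrative, and note that micro-averaged variants, which divide by the total number of predicted and true classes instead, also appear in the literature.

    def kiritchenko_prf(predicted, true):
        # `predicted` and `true` map each gene x to its set of classes.
        hp = sum(len(predicted[x] & true[x]) / len(predicted[x])
                 for x in predicted if predicted[x])
        hr = sum(len(predicted[x] & true[x]) / len(true[x])
                 for x in predicted if true[x])
        hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
        return hp, hr, hf

    print(kiritchenko_prf({"g1": {"a", "b"}}, {"g1": {"a", "c"}}))  # (0.5, 0.5, 0.5)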

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level k is any node whose distance from the root is equal to k. The posterior probability computed by the ensemble for a generic node at level k is denoted by q̄_k, while q̂_k denotes the probability computed by the base learner local to a node at level k. Moreover, we define q̄^j_{k+1} as the probability of a child j of a node at level k, where the index j ≥ 1 refers to the different children of a node at level k. From (30) we can derive the following expression for the probability q̄_k computed for a generic node at level k of the hierarchy [31]:

\[
\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1-w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g).
\tag{C1}
\]

To simplify the notation, we can introduce the following expression for the average of the probabilities computed by the positive children of a generic node at level k:

\[
a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1},
\tag{C2}
\]

and we can introduce similar notations for a_{k+2} (the average of the probabilities of the grandchildren) and, more generally, for a_{k+j} (descendants at level j below a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level k, with a given parameter w, 0 ≤ w ≤ 1, balancing the weight between parent and children predictors, and having a variable number, larger than or equal to 1, of positive descendants for each of the m lower levels below, the following equality holds for each m ≥ 1:

\[
\bar{q}_k = w\,\hat{q}_k + \sum_{j=1}^{m-1} w(1-w)^j a_{k+j} + (1-w)^m a_{k+m}.
\tag{C3}
\]

For the full proof, see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the w parameter.


Figure 18: (a) plot of f(w) = (1 − w)^m while varying m from 1 to 10; (b) plot of g(w) = w(1 − w)^j while varying j from 1 to 10, with the values w = 0.1, 0.5, 0.8 highlighted. The integers j refer to internal nodes at distance j from the reference node at level k.

To get more insight into the relationship between w and its effect on the influence of positive decisions on a generic node at level k (Theorem 2), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths m, and Figure 18(b) shows the function responsible for the influence of the nodes above the leaves.
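The following Python sketch, with illustrative names, implements the bottom-up TPR-W combination of (C1) and evaluates the decay factors plotted in Figure 18; the convention that a node without positive children keeps its local estimate q̂ is an assumption of this sketch.

    def tpr_w(node, q_hat, children, w, t=0.5):
        # Weighted True Path Rule, eq. (C1): combine the local estimate with the
        # average ensemble probability of the positive children (those above t).
        pos = [q for q in (tpr_w(c, q_hat, children, w, t)
                           for c in children.get(node, ())) if q > t]
        if not pos:
            return q_hat[node]  # assumption: no positive children -> local estimate only
        return w * q_hat[node] + (1 - w) * sum(pos) / len(pos)

    # Decay factors of Theorem 2 / Figure 18: leaf influence f(w) = (1-w)^m and
    # internal-node influence g(w) = w(1-w)^j, both vanishing quickly with depth.
    f = lambda w, m: (1 - w) ** m
    g = lambda w, j: w * (1 - w) ** j
    print(f(0.8, 3), g(0.8, 3))  # 0.008, 0.0064: deep nodes barely matter for large w

    children = {"root": ["a", "b"], "a": [], "b": []}
    q_hat = {"root": 0.6, "a": 0.9, "b": 0.2}
    print(tpr_w("root", q_hat, children, w=0.5))  # 0.5*0.6 + 0.5*0.9 = 0.75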

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction - the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.
[2] L. Pena-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.
[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.
[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.
[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.
[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene Ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.
[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.
[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.
[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.
[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.
[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.
[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.
[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.
[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.
[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.
[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.
[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.
[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.
[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.
[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.
[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.
[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.
[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.
[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.
[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.
[28] R. Cerri and A. C. P. L. F. de Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.
[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.
[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.
[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.
[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.
[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.
[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.
[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.
[36] J. Burred and A. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.
[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.
[38] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.
[39] A. Binder, M. Kawanabe, and U. Brefeld, "Efficient classification of images with taxonomies," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.
[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.
[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.
[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.
[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.
[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.
[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.
[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.
[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.


[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.
[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.
[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.
[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.
[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.
[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.
[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.
[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems. Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.
[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
[58] A. Conesa, S. Gotz, J. M. Garcia-Gomez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.
[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.
[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.
[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.
[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.
[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.
[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.
[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.
[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.
[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.
[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.
[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.
[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.
[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.
[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.
[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.
[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.
[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.
[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.
[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.
[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.
[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.
[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.
[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.
[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.
[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.
[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.
[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.
[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.
[88] G. Bakir, T. Hofmann, B. Scholkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.
[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.
[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.
[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.
[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.
[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems. First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.
[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.
[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.
[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.
[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.
[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.
[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.
[100] M. Dettling and P. Buhlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.
[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.
[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1-4, pp. 461–466, 2004.
[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.
[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.
[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.
[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.
[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.
[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.
[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.
[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.
[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.
[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.
[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.
[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.
[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4-6, pp. 718–725, 2007.
[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.
[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.
[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.
[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.
[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1-3, pp. 1–6, 1998.
[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.
[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.
[124] R. Cerri and A. C. P. L. F. de Carvalho, "Hierarchical multilabel classification using top-down label combination and artificial neural networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.
[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.
[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.
[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.
[128] B. Paes, A. Plastino, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.
[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.
[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.
[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.
[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems. First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.
[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.
[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.
[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems. Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.
[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.
[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.
[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.
[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.
[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n^2) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.
[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.
[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.
[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.
[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.
[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.
[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.

[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.
[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.
[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.
[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.
[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics - From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.
[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.
[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.
[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.
[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.
[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.
[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.
[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.
[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.
[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.
[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.
[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.
[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.
[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse a la Probabilite des Decisions Rendues a la Pluralite des Voix, Imprimerie Royale, Paris, France, 1785.
[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.
[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.
[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.
[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems. Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.
[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.
[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7-9, pp. 1533–1537, 2010.
[174] R. D. Finn, J. Tate, J. Mistry et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.
[175] A. P. Gasch, P. T. Spellman, C. M. Kao et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.
[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.
[177] C. Von Mering, R. Krause, B. Snel et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.
[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.
[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.
[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.
[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.

[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.
[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.
[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.
[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.
[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.
[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.
[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.
[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.
[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.
[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.
[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.
[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.
[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research, W&P, vol. 27, pp. 207–216, 2012.
[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.
[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013 - ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.
[197] H. W. Mewes, K. Albermann, M. Bahr et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.
[198] H. W. Mewes, D. Frishman, C. Gruber et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.
[199] K. R. Christie, S. Weng, R. Balakrishnan et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.
[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.
[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.
[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.



Figure 9: Integration of diverse methods and diverse sources of data in an ensemble framework for AFP prediction (adapted from [30]). After data preprocessing, heterogeneous inputs (gene expression microarrays, protein sequence and domain information, protein-protein interactions, phenotype profiles, phylogenetic profiles, and disease profiles obtained through homology) feed three classifiers: (1) a single SVM for each GO term, (2) a hierarchical correction across GO terms, and (3) a Bayesian integration of the individual datasets. The best classifier for each GO term is selected through held-out set validation.

in most cases worse than kernel fusion [11] and ensemble methods for data integration [23].

In the same work [30], the authors also propose a sort of "test and select" method [133], by which three different classification approaches, (a) single flat SVMs, (b) Bayesian hierarchical correction, and (c) naive Bayes combination, are applied, and for each GO term the best one is selected by internal cross-validation (Figure 9).

It is worth noting that other approaches adopted Bayesian networks to resolve the hierarchical constraints underlying the GO taxonomy. For instance, in the FALCON algorithm the GO is modeled as a Bayesian network, and for any given input the algorithm returns the most probable GO term assignment in accordance with the GO structure, by using an evolutionary-based optimization algorithm [134].

5.3. HBAYES: An "Optimal" Hierarchical Bayesian Ensemble Approach. The HBAYES ensemble method [92, 135] is a general technique for solving hierarchical classification problems on generic taxonomies G structured according to a forest of trees. The method consists in training a calibrated classifier at each node of the taxonomy. In principle, any algorithm (e.g., support vector machines or artificial neural networks) whose classifications are obtained by thresholding a real prediction p, for example, ŷ = SGN(p), can be used as base learner. The real-valued outputs p̂_i(g) of the calibrated classifier for node i on the gene g are viewed as estimates of the probabilities p_i(g) = P(y_i = 1 | y_par(i) = 1, g). The distribution of the random Boolean vector Y is assumed to be

\[
P(Y = y) = \prod_{i=1}^{m} P\left(Y_i = y_i \mid Y_{\mathrm{par}(i)} = 1, g\right), \qquad \forall y \in \{0,1\}^m,
\tag{10}
\]

where, in order to enforce that only multilabels Y that respect the hierarchy have nonzero probability, it is imposed that P(Y_i = 1 | Y_par(i) = 0, g) = 0 for all nodes i = 1, ..., m and all g. This implies that the base learner at node i is only trained on the subset of the training set including all the examples (g, y) such that y_par(i) = 1.

5.3.1. HBAYES Ensembles for Protein Function Prediction. The H-loss is a measure of discrepancy between multilabels based on a simple intuition: if a parent class has been predicted wrongly, then errors in its descendants should not be taken into account. Given fixed cost coefficients θ_1, ..., θ_m > 0, the H-loss ℓ_H(ŷ, v) between the multilabels ŷ and v is computed as follows: all paths in the taxonomy T from the root down to each leaf are examined, and, whenever a node i ∈ {1, ..., m} is encountered such that ŷ_i ≠ v_i, θ_i is added to the loss, while all the other loss contributions from the subtree rooted at i are discarded. This method assumes that, given a gene g, the distribution of the labels V = (V_1, ..., V_m) is P(V = v) = ∏_{i=1}^m p_i(g) for all v ∈ {0,1}^m, where p_i(g) = P(V_i = v_i | V_par(i) = 1, g). According to the true path rule, it is imposed that P(V_i = 1 | V_par(i) = 0, g) = 0 for all nodes i and all genes g.

In the evaluation phase, HBAYES predicts the Bayes-optimal multilabel ŷ ∈ {0,1}^m for a gene g, based on the estimates p̂_i(g) for i = 1, ..., m. By definition of Bayes-optimality, the optimal multilabel for g is the one that minimizes the expected loss when the true multilabel V is drawn from the joint distribution computed from the estimated conditionals p̂_i(g); that is,

\[
\hat{y} = \operatorname*{argmin}_{y \in \{0,1\}^m} \mathbb{E}\left[\ell_H(y, V) \mid g\right].
\tag{11}
\]

In other words, the ensemble method HBAYES provides an approximation of the optimal Bayesian classifier with respect to the H-loss [135]. More precisely, as shown in [27], the following theorem holds.


Theorem 1. For any tree T and gene g, the multilabel generated according to the HBAYES prediction rule is the Bayes-optimal classification of g for the H-loss.

In the evaluation phase, the uniform cost coefficients θ_i = 1 for i = 1, ..., m are used. However, since with uniform coefficients the H-loss can be made small simply by predicting sparse multilabels (i.e., multilabels ŷ such that Σ_i ŷ_i is small), in the training phase the cost coefficients are set to θ_i = 1/|root(G)| if i ∈ root(G), and to θ_i = θ_j/|child(j)|, with j = par(i), otherwise. This normalizes the H-loss, in the sense that the maximal H-loss contribution of all the nodes in a subtree, excluding its root, equals that of its root.

Let {E} be the indicator function of the event E. Given g and the estimates p̂_i = p̂_i(g) for i = 1, ..., m, the HBAYES prediction rule can be formulated as follows.

HBAYES Prediction Rule. Initially, set the label of each node i to

\[
\hat{y}_i = \operatorname*{argmin}_{y \in \{0,1\}} \Big( \theta_i \hat{p}_i (1 - y) + \theta_i (1 - \hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{y}) \Big),
\tag{12}
\]

where

\[
H_j(\hat{y}) = \theta_j \hat{p}_j (1 - \hat{y}_j) + \theta_j (1 - \hat{p}_j)\, \hat{y}_j + \hat{p}_j \{\hat{y}_j = 1\} \sum_{k \in \mathrm{child}(j)} H_k(\hat{y})
\tag{13}
\]

is recursively defined over the nodes j in the subtree rooted at i, with each ŷ_j set according to (12). Then, if ŷ_i is set to zero, set all the nodes in the subtree rooted at i to zero as well.

It is worth noting that $\hat{\mathbf{y}}$ can be computed for a given $g$ via a simple bottom-up message-passing procedure. It can be shown that, if all the child nodes $k$ of $i$ have $\hat{p}_k$ close to one half, then the Bayes-optimal label of $i$ tends to be 0, irrespective of the value of $\hat{p}_i$. On the contrary, if the children of $i$ all have $\hat{p}_k$ close to either 0 or 1, then the Bayes-optimal label of $i$ is based on $\hat{p}_i$ only, ignoring the children. This behaviour can be intuitively explained in the following way: the estimate $\hat{p}_k$ is built based only on the examples for which the parent $i$ of $k$ is positive; hence a "neutral" estimate $\hat{p}_k = 1/2$ signals that the current instance is a negative example for the parent $i$. Experimental results show that this approach achieves results comparable with those of the TPR method (Section 7), an ensemble approach based on the "true path rule" [136].
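To make the bottom-up computation concrete, the following is a minimal Python sketch of the prediction rule (12)-(13) on a tree; the data layout (plain dictionaries keyed by node identifiers) and the function names are illustrative assumptions, not the authors' implementation.

```python
# Bottom-up evaluation of the HBAYES prediction rule (12)-(13) on a tree.
from typing import Dict, List

def hbayes_predict(root: int,
                   children: Dict[int, List[int]],
                   p: Dict[int, float],
                   theta: Dict[int, float]) -> Dict[int, int]:
    """children: adjacency lists of the taxonomy; p: calibrated estimates
    p_i(g); theta: per-node H-loss cost coefficients."""
    y: Dict[int, int] = {}

    def H(j: int) -> float:
        # Recursive H_j of eq. (13); y_j is set according to eq. (12).
        h_children = sum(H(k) for k in children.get(j, []))
        cost_if_zero = theta[j] * p[j]                            # y_j = 0
        cost_if_one = theta[j] * (1 - p[j]) + p[j] * h_children   # y_j = 1
        y[j] = 1 if cost_if_one < cost_if_zero else 0
        return min(cost_if_zero, cost_if_one)

    H(root)

    def enforce(i: int) -> None:
        # If a node is negative, its whole subtree is set to zero.
        for k in children.get(i, []):
            if y[i] == 0:
                y[k] = 0
            enforce(k)

    enforce(root)
    return y
```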

5.3.2. HBAYES-CS: The Cost-Sensitive Version. HBAYES-CS is the cost-sensitive version of HBAYES, proposed in [27]. In this approach, the misclassification cost coefficient $\theta_i$ for node $i$ is split into two terms, $\theta_i^+$ and $\theta_i^-$, which account for the misclassification of positive and negative examples, respectively. By considering these two terms separately, (12) can be rewritten as

$$\hat{y}_i = \operatorname*{argmin}_{y \in \{0,1\}} \Big( \theta_i^- \hat{p}_i (1 - y) + \theta_i^+ (1 - \hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{\mathbf{y}}) \Big), \qquad (14)$$

where the expression of $H_j(\hat{\mathbf{y}})$ changes correspondingly. By introducing a factor $\alpha \ge 0$ such that $\theta_i^- = \alpha \theta_i^+$, while keeping $\theta_i^+ + \theta_i^- = 2\theta_i$, the relative costs of false positives and false negatives can be parameterized, thus allowing us to rewrite the hierarchical Bayesian rule (Section 5.3.1) as follows:

$$\hat{y}_i = 1 \iff \hat{p}_i \Big( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \Big) \ge \frac{2\theta_i}{1 + \alpha}. \qquad (15)$$

By setting $\alpha = 1$ we obtain the original version of the hierarchical Bayesian ensemble, while by incrementing $\alpha$ we introduce progressively lower costs for positive predictions. In this way the recall of the ensemble tends to increase, eventually at the expense of precision, and by tuning the $\alpha$ parameter we can obtain different combinations of precision/recall values.

In principle, a cost factor $\alpha_i$ can be set for each node $i$, to explicitly take into account the unbalance between the number of positive ($n_i^+$) and negative ($n_i^-$) examples estimated from the training data:

$$\alpha_i = \frac{n_i^-}{n_i^+} \implies \theta_i^+ = \frac{2}{n_i^- / n_i^+ + 1}\, \theta_i = \frac{2 n_i^+}{n_i^- + n_i^+}\, \theta_i. \qquad (16)$$

The decision rule (15) at each node then becomes

$$\hat{y}_i = 1 \iff \hat{p}_i \Big( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \Big) \ge \frac{2\theta_i}{1 + \alpha_i} = \frac{2\theta_i n_i^+}{n_i^- + n_i^+}. \qquad (17)$$

Results obtained with the yeast model organism showed that HBAYES-CS significantly outperforms HTD methods [27, 136].
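As an illustration, the per-node decision of (17) reduces to a one-line threshold test; the following hedged sketch assumes the per-node quantities (local estimate, cost coefficient, children's $H$ values, and positive/negative counts) have already been computed.

```python
def hbayes_cs_decision(p_i: float, theta_i: float,
                       h_children_sum: float,
                       n_pos: int, n_neg: int) -> int:
    """Cost-sensitive rule (17): predict 1 iff
    p_i * (2*theta_i - sum_j H_j) >= 2*theta_i / (1 + alpha_i),
    with alpha_i = n_neg / n_pos estimated from the training annotations."""
    alpha_i = n_neg / n_pos
    return int(p_i * (2 * theta_i - h_children_sum)
               >= 2 * theta_i / (1 + alpha_i))
```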

6. Reconciliation Methods

Hierarchical ensemble methods are basically two-step methods, since they at first provide predictions for the single classes and then arrange these predictions to take into account the functional relationships between the GO terms. Noble and colleagues name this general approach reconciliation methods [24]: they proposed methods for calibrating and combining independent predictions, to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. They applied their ensemble methods to genome-wide and ontology-wide function prediction in M. musculus, involving about 3000 GO terms.


Figure 10: The overall scheme of reconciliation methods: (1) kernel computation from each data source (e.g., MGI, PPI, Pfam); (2) SVM learning, one per term; (3) calibration of the SVM outputs into probability scores; (4) reconciliation of the scores with the ontology (adapted from [24]).

Their goal consists in providing consistent predictions, that is, predictions whose confidence (e.g., posterior probability) increases as we ascend from more specific to more general terms in the GO. Moreover, another important feature of these methods is the availability of confidence values associated with the predictions, which can be interpreted as the probabilities that a protein has a certain function, given the information provided by the data.

The overall reconciliation approach can be summarized in the following four basic steps (Figure 10).

(1) Kernel computation: at first, a set of kernels is computed from the available data. We may choose kernels specific for each source of data (e.g., diffusion kernels for protein-protein interaction data [137], linear or Gaussian kernels for expression data, and string kernels for sequence data [138]). Multiple kernels for the same type of data can also be constructed [24].

(2) SVM learning: SVMs are used as base learners with the kernels selected at the previous step; the training is performed by internal cross-validation, to avoid overfitting, and a local cost-sensitive strategy is applied, by tuning separately the $C$ regularization factor for positive and negative examples. Note that the authors used SVMs as base learners in their experiments, but any meaningful classifier could be used at this step.

(3) Calibration: to produce individual probabilistic outputs from the set of SVM outputs corresponding to one GO term, a logistic regression approach is applied. In this way a calibration of the individual SVM outputs is obtained, resulting in a probabilistic prediction of the random variable $Y_i$ for each node/term $i$ of the hierarchy, given the outputs $\hat{y}_i$ of the SVM classifiers (a minimal sketch of this step is given after this list).

(4) Reconciliation: the first three steps generate unreconciled outputs; that is, in practice a "flat" ensemble is applied, which may generate predictions that are inconsistent with the given taxonomy. In this step, the outputs of step three are processed by a "reconciliation method". The goal of this stage is to combine the predictions for each term, to produce predictions that are consistent with the ontology, meaning that all the probabilities assigned to the ancestors of a GO term are larger than the probability assigned to that term.
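A minimal sketch of steps (2)-(3), assuming scikit-learn is available, could look as follows; the use of LinearSVC and five folds is an illustrative choice, not the authors' exact setup.

```python
# Per-term SVM learning and Platt-style calibration of the SVM margins into
# probabilities (scikit-learn assumed available).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def calibrate_term_classifier(X: np.ndarray, y: np.ndarray) -> LogisticRegression:
    """Fit a per-term calibrator mapping SVM margins to estimates of P(Y_i = 1)."""
    # Out-of-fold margins play the role of the internal cross-validation that
    # the authors use to avoid overfitting.
    margins = cross_val_predict(LinearSVC(), X, y, cv=5,
                                method="decision_function")
    calibrator = LogisticRegression()
    calibrator.fit(margins.reshape(-1, 1), y)
    return calibrator
```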

The first three steps are basically the same (or very similar) for each reconciliation ensemble method. The crucial step is the fourth, that is, the reconciliation step, and different ensemble algorithms can be designed to implement it. The authors proposed 11 different ensemble methods for the reconciliation of the base classifier outputs. Schematically, they can be subdivided into the following four main classes of ensembles:

(1) heuristic methods;
(2) Bayesian network-based methods;
(3) cascaded logistic regression;
(4) projection-based methods.

6.1. Heuristic Methods. These approaches preserve the "reconciliation property"

$$\forall i, j \in G:\ (i, j) \in E \implies \bar{p}_i \ge \bar{p}_j \qquad (18)$$

through simple heuristic modifications of the probabilities computed at step 3 of the overall reconciliation scheme.

(i) The MAX method simply chooses the largest logistic regression value among node $i$ and all its descendants $\mathrm{desc}(i)$:

$$\bar{p}_i = \max_{j \in \mathrm{desc}(i)} p_j. \qquad (19)$$

(ii) The AND method implements the notion that the probability that all the ancestral GO terms $\mathrm{anc}(i)$ of a given term/node $i$ are "on" should be large, assuming that, conditional on the data, all the predictions are independent:

$$\bar{p}_i = \prod_{j \in \mathrm{anc}(i)} p_j. \qquad (20)$$

(iii) The OR method estimates the probability that node $i$ is annotated for at least one of its descendant GO terms, assuming again that, conditional on the data, all the predictions are independent:

$$1 - \bar{p}_i = \prod_{j \in \mathrm{desc}(i)} (1 - p_j). \qquad (21)$$

6.2. Cascaded Logistic Regression. Instead of modeling class-conditional probabilities, as required by the Bayesian approach, logistic regression can be used to directly model posterior probabilities. Considering that modeling conditional densities is in most cases difficult (even using strong independence assumptions, as shown in Section 5.1), the choice of logistic regression could be a reasonable one. In [24] the authors embedded the hierarchical dependencies between terms in the logistic regression setting. By assuming that a random variable $\mathbf{X}$, whose values represent the features of the gene $g$ of interest, is associated with each gene, and that $P(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x})$ factorizes according to the GO graph, it follows that

$$P(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x}) = \prod_i P(Y_i = y_i \mid \forall j \in \mathrm{par}(i):\ Y_j = y_j,\ X_i = x_i), \qquad (22)$$

with $P(Y_i = 1 \mid \forall j \in \mathrm{par}(i):\ Y_j = 0,\ X_i = x_i) = 0$. The authors estimated $P(Y_i = 1 \mid \forall j \in \mathrm{par}(i):\ Y_j = 1,\ X_i = x_i)$ with logistic regression. This approach is quite similar to fitting independent logistic regressions; note, however, that in this case only the examples of proteins annotated for all the parent GO terms are used to fit the model, thus implicitly taking into account the hierarchical relationships between GO terms.

6.3. Bayesian Network-Based Methods. These methods are variants of the Bayesian network approach proposed in [20] (Section 5.1): the GO is viewed as a graphical model where a joint Bayesian prior is put on the binary GO term variables $y_i$. The authors proposed four variants, which can be summarized as follows:

(i) BPAL is a belief propagation approach with asymmetric Laplace likelihoods. The graphical model has edges directed from more general terms to more specific terms. Differently from [20], the distribution of each SVM output is modeled as an asymmetric Laplace distribution, and a variational inference algorithm, which solves an optimization problem whose minimizer is the set of marginal probabilities of the distribution, is used to estimate the posterior probabilities of the ensemble [139];

(ii) BPALF is similar to BPAL, but with the edges inverted and directed from more specific terms to more general terms;

(iii) BPLR is a heuristic variant of BPAL where, in the inference algorithm, the Bayesian log posterior ratio for $Y_i$ is replaced by the marginal log posterior ratio obtained from the logistic regression (LR);

(iv) BPLRF is equal to BPLR, but with reversed edges.

6.4. Projection-Based Methods. A different approach is represented by methods that directly use the calibrated values obtained from logistic regression (step 3 of the overall scheme of the reconciliation methods) to find the closest set of values that are consistent with the ontology. This approach leads to a constrained optimization problem. The main contribution of the Obozinski et al. work [24] is the introduction of projection reconciliation techniques based on isotonic regression [140] and on the Kullback-Leibler divergence.

The isotonic regression method tries to find a set of marginal probabilities $\bar{p}_i$ that are close to the calibrated values $p_i$ obtained from the logistic regression, where the Euclidean distance is used as the measure of closeness. Hence, considering that the "reconciliation property" requires that $\bar{p}_i \ge \bar{p}_j$ when $(i,j) \in E$, this approach yields the following quadratic program:

$$\min_{\{\bar{p}_i,\ i \in I\}} \sum_{i \in I} (\bar{p}_i - p_i)^2 \quad \text{s.t.}\ \ \bar{p}_j \le \bar{p}_i,\ (i,j) \in E. \qquad (23)$$

This is the classical isotonic regression problem, which can be solved using an interior point solver, or with approximate algorithms when the number of edges of the graph is too large [141].
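A minimal sketch of the projection (23) follows, assuming the cvxpy package is available for the quadratic program; the [0, 1] box constraints are an added assumption that keeps the reconciled values interpretable as probabilities.

```python
# Isotonic-regression reconciliation (23), posed as a quadratic program.
import numpy as np
import cvxpy as cp

def isotonic_reconcile(p, edges):
    """Project the calibrated probabilities p onto the order constraints
    p_bar[i] >= p_bar[j] for every parent-child edge (i, j) in E."""
    p = np.asarray(p, dtype=float)
    p_bar = cp.Variable(p.size)
    constraints = [p_bar[j] <= p_bar[i] for (i, j) in edges]
    constraints += [p_bar >= 0, p_bar <= 1]  # keep values in [0, 1] (assumption)
    cp.Problem(cp.Minimize(cp.sum_squares(p_bar - p)), constraints).solve()
    return p_bar.value
```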

Considering that we deal with probabilities, a natural measure of distance between two probability density functions $f(\mathbf{x})$ and $g(\mathbf{x})$, defined with respect to a random variable $\mathbf{x}$, is the Kullback-Leibler divergence:

$$D(f \parallel g) = \int_{-\infty}^{\infty} f(\mathbf{x}) \log\left(\frac{f(\mathbf{x})}{g(\mathbf{x})}\right) d\mathbf{x}. \qquad (24)$$

In the context of reconciliation methods we need a discrete version of the Kullback-Leibler divergence, yielding the following optimization problem:

$$\min_{\{\bar{p}_i,\ i \in I\}} D(\bar{\mathbf{p}} \parallel \mathbf{p}) = \min_{\{\bar{p}_i,\ i \in I\}} \sum_{i \in I} \bar{p}_i \log\left(\frac{\bar{p}_i}{p_i}\right) \quad \text{s.t.}\ \ \bar{p}_j \le \bar{p}_i,\ (i,j) \in E. \qquad (25)$$

The algorithm finds the probabilities $\bar{\mathbf{p}}$ closest, according to the Kullback-Leibler divergence, to the probabilities $\mathbf{p}$ obtained from logistic regression, while obeying the constraint that probabilities cannot increase while descending the hierarchy underlying the ontology.

The extensive experiments performed in [24] show that, among the reconciliation methods, isotonic regression is the most generally useful: across a range of evaluation modes, term sizes, ontologies, and recall levels, isotonic regression yields consistently high precision. On the other hand, isotonic regression is not always the "best method", and a biologist with a particular goal in mind may apply other reconciliation methods. For instance, for small terms the Kullback-Leibler projections usually achieve the best results, while, considering average "per term" results, heuristic methods yield a precision at a given recall comparable with that of projection methods and better than that achieved with Bayes-net methods.

This ensemble approach achieved excellent results in the prediction of protein function in the mouse model organism, demonstrating that hierarchical multilabel methods can play a crucial role in the improvement of protein function prediction performance [24]. Nevertheless, the approach suffers from some drawbacks. Indeed, the paper focuses on the comparison of hierarchical multilabel methods, but it does not analyze the impact of the concurrent use of data integration and hierarchical multilabel methods on the overall classification performance. Moreover, potential improvements could

be introduced by applying cost-sensitive variants of hierarchical multilabel predictors, able to effectively calibrate the precision/recall trade-off at the different levels of the functional ontology.

Figure 11: The asymmetric flow of information suggested by the true path rule: positive decisions propagate from bottom to top and negative decisions from top to bottom of a FunCat subtree.

7. True Path Rule Hierarchical Ensembles

These ensemble methods exploit at the same time the downward and upward relationships between classes, thus considering both the parent-to-child and the child-to-parent functional links (Figure 2(b)).

The true path rule (TPR) ensemble method [31, 142] is directly inspired by the true path rule that governs both the GO and FunCat taxonomies. Citing the curators of the Gene Ontology [143]: "An annotation for a class in the hierarchy is automatically transferred to its ancestors, while genes unannotated for a class cannot be annotated for its descendants." Considering the parents of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$$\hat{y}_i = 1 \implies \hat{y}_{\mathrm{par}(i)} = 1, \qquad \hat{y}_i = 0 \;\not\Rightarrow\; \hat{y}_{\mathrm{par}(i)} = 0. \qquad (26)$$

On the other hand, considering the children of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$$\hat{y}_i = 1 \;\not\Rightarrow\; \hat{y}_{\mathrm{child}(i)} = 1, \qquad \hat{y}_i = 0 \implies \hat{y}_{\mathrm{child}(i)} = 0. \qquad (27)$$

From (26) and (27) we observe an asymmetry in the rules that govern the assignment of positive and negative labels. Indeed, we have a propagation of positive predictions from bottom to top of the hierarchy in (26), and a propagation of negative labels from top to bottom in (27). Conversely, negative labels cannot propagate from bottom to top, and positive predictions cannot propagate from top to bottom.

The "true path rule" thus suggests algorithms able to propagate "positive" decisions from bottom to top of the hierarchy, and negative decisions from top to bottom (Figure 11).

7.1. The True Path Rule Ensemble Algorithm. The TPR algorithm puts together the predictions made at each node by local "base" classifiers, to realize an ensemble that obeys the "true path rule".

The basic ideas behind the method can be summarized as follows:

(1) training of the base learners: for each node of the hierarchy, a suitable learning algorithm (e.g., a multilayer perceptron or a support vector machine) provides a classifier for the associated functional class;

(2) in the evaluation phase, the trained classifiers associated with each class/node of the graph provide a local decision about the assignment of a given example to a given node;

(3) positive decisions, that is, annotations to a specific functional class, may propagate from bottom to top across the graph: they influence the decisions of the parent nodes and of their ancestors in a recursive way, by traversing the graph towards the higher level nodes/classes; conversely, negative decisions do not affect the decisions of the parent node, that is, they do not propagate from bottom to top (26);

(4) negative predictions for a given node (taking into account the local decisions of its descendants) are propagated to the descendants, to preserve the consistency of the hierarchy according to the true path rule, while positive decisions do not influence the decisions of the child nodes (27).

The ensemble combines the local predictions of the base learners associated with each node with the positive decisions coming from the bottom of the hierarchy and with the negative decisions springing from the higher level nodes. More precisely, the base classifiers estimate local probabilities $\hat{p}_i(g)$ that a given example $g$ belongs to the class associated with node $i$, but the core of the algorithm is represented by the evaluation phase, where the ensemble provides an estimate of the "consensus" global probability $\bar{p}_i(g)$.

It is worth noting that, instead of a probability, $\hat{p}_i(g)$ may represent a score associated with the likelihood that a given gene/gene product belongs to the functional class $i$.

Let us consider the set $\phi_i(g)$ of the children of node $i$ for which we have a positive prediction for a given gene $g$:

$$\phi_i(g) = \{ j \in \mathrm{child}(i) : \hat{y}_j = 1 \}. \qquad (28)$$

The global consensus probability $\bar{p}_i(g)$ of the ensemble depends both on the local prediction $\hat{p}_i(g)$ and on the predictions of the nodes belonging to $\phi_i(g)$:

$$\bar{p}_i(g) = \frac{1}{1 + |\phi_i(g)|} \Big( \hat{p}_i(g) + \sum_{j \in \phi_i(g)} \bar{p}_j(g) \Big). \qquad (29)$$

The decision $\hat{y}_i(g)$ at node/class $i$ is set to 1 if $\bar{p}_i(g) > t$, and to 0 otherwise (a natural choice for $t$ is 0.5), and only the children nodes for which we have a positive prediction can influence their parent. In the leaf nodes the sum of (29) disappears, and (29) becomes $\bar{p}_i(g) = \hat{p}_i(g)$. In this way positive predictions propagate from bottom to top, and negative decisions are propagated to the descendants whenever, for a given node, $\hat{y}_i(g) = 0$.

The bottom-up per-level traversal of the tree assures that all the offspring of a given node $i$ are taken into account for the ensemble prediction. For the same reason, we can safely set the classes belonging to the subtree rooted at $i$ to negative when $\hat{y}_i$ is set to 0. It is worth noting that we have a two-way asymmetric flow of information across the tree: positive predictions for a node influence its ancestors, while negative predictions influence its offspring.

The algorithm provides both the multilabels $\hat{y}_i$ and an estimate of the probabilities $\bar{p}_i$ that a given example $g$ belongs to the class $i = 1, \dots, m$.
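The following minimal Python sketch summarizes the TPR evaluation phase, that is, the bottom-up consensus (28)-(29) followed by the top-down propagation of negative decisions; the dictionary-based data layout is an illustrative assumption.

```python
# The TPR evaluation phase on a tree: bottom-up consensus (28)-(29), then
# top-down propagation of negative decisions.
from typing import Dict, List, Tuple

def tpr_predict(root: int,
                children: Dict[int, List[int]],
                p_hat: Dict[int, float],
                t: float = 0.5) -> Tuple[Dict[int, int], Dict[int, float]]:
    """children: adjacency lists of the taxonomy; p_hat: local estimates."""
    p_bar: Dict[int, float] = {}
    y: Dict[int, int] = {}

    def bottom_up(i: int) -> None:
        for j in children.get(i, []):
            bottom_up(j)
        phi = [j for j in children.get(i, []) if y[j] == 1]  # eq. (28)
        p_bar[i] = (p_hat[i] + sum(p_bar[j] for j in phi)) / (1 + len(phi))
        y[i] = 1 if p_bar[i] > t else 0

    def top_down(i: int) -> None:
        for j in children.get(i, []):
            if y[i] == 0:
                y[j] = 0  # negative decisions propagate downward, eq. (27)
            top_down(j)

    bottom_up(root)
    top_down(root)
    return y, p_bar
```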

7.2. The Cost-Sensitive Variant. Note that in the TPR algorithm there is no way to explicitly balance the local prediction $\hat{p}_i(g)$ at node $i$ against the positive predictions coming from its offspring (29). By balancing the local predictions with the positive predictions coming from the ensemble, we can explicitly modulate the interplay between local and descendant predictors. To this end, a weight $w$, $0 \le w \le 1$, is introduced, such that if $w = 1$ the decision at node $i$ depends only on the local predictor; otherwise the prediction is shared, proportionally to $w$ and $1 - w$, between the local parent predictor and the set of its children, respectively:

$$\bar{p}_i = w \hat{p}_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \bar{p}_j. \qquad (30)$$

This variant of the TPR algorithm is the weighted true path rule (TPR-W) hierarchical ensemble algorithm. By tuning the $w$ parameter we can modulate the precision/recall characteristics of the resulting ensemble. More precisely, for $w \to 0$ the weight of the parent local predictor is small, and the ensemble decision depends mainly on the positive predictions of the offspring nodes (classifiers). Conversely, $w \to 1$ corresponds to a higher weight of the parent predictor: less weight is given to the possible positive predictions of the children, and the decision depends mainly on the local/parent base classifier. In case of a negative decision, all the subtree is set to 0, causing the precision to increase. Note that for $w \to 1$ the behaviour of TPR-W becomes similar to that of HTD (Section 4).

A specific advantage of TPR-W ensembles is the capability of tuning the precision and recall rates through the parameter $w$ (30). In [31] the author shows that the $w$ parameter highly influences the precision/recall characteristics of the ensemble: low $w$ values yield a higher recall, while high $w$ values improve the precision of the TPR-W ensemble.

Recently, Chen and Hu proposed a method that applies the TPR-W hierarchical strategy, but using composite-kernel SVMs as base classifiers and a supervised clustering with oversampling strategy to solve the imbalanced data set learning problem; they showed that the proper selection of base learners and unbalance-aware learning strategies can further improve the results in terms of hierarchical precision and recall [144].

The same authors also proposed an enhanced version of the TPR-W strategy, to overcome a limitation of this bottom-up hierarchical method for AFP. Indeed, for some classes at the lower levels of the hierarchy, the classifier performances are sometimes quite poor, due both to noisy data and to the relatively low number of available annotations. More precisely, in the basic TPR ensemble the probabilities $\bar{p}_j$ computed by the children of node $i$ (30) contribute in equal measure to the probability $\bar{p}_i$ computed by the ensemble at node $i$, independently of the accuracy of the predictions made by its children classifiers. This "unweighted" mechanism may propagate errors across the hierarchy: a poorly performing child classifier may, for instance, predict with high probability a negative example as positive, and this error may propagate to its parent node and, recursively, to its ancestor nodes. To try to alleviate this possible bottom-up error propagation, in [145] Chen and Hu proposed an improved TPR ensemble (TPR-W weighted), based on classifier performance. To this end, they weighted the contribution of each child classifier on the basis of its performance, evaluated on a validation data set, by adding to (30) another weight $\nu_j$:

$$\bar{p}_i = w \hat{p}_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \nu_j \cdot \bar{p}_j, \qquad (31)$$

where $\nu_j$ is computed on the basis of some accuracy metric $A_j$ (e.g., the F-score) estimated for the child classifier associated with node $j$, as follows:

$$\nu_j = \frac{A_j}{\sum_{k \in \phi_i} A_k}. \qquad (32)$$

In this way the contribution of "poor" classifiers is reduced, while "good" classifiers weigh more in the final computation of $\bar{p}_i$ (31). Experiments with the "Protein Fate" subtree of the FunCat taxonomy with the yeast model organism show that this approach improves the predictions with respect to the "vanilla" TPR-W hierarchical strategy [145].

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward and common to the hierarchical top-down algorithm, the propagation of positive decisions from bottom to top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this item, see Appendix C.

Experimental results show that TPR-W achieves equal or better results than the TPR and top-down hierarchical strategies, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per-level classification performances shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall by using a global strategy: large values of the $w$ parameter improve the precision, and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchical ensemble are influenced by the positive decisions of its descendants. More precisely, the analyses performed in [31] showed the following:

(i) the weights of the descendants decrease exponentially with respect to their depth; as a consequence, the influence of descendant nodes decays quickly with their depth;

(ii) the parameter $w$ plays a central role in balancing the weight of the parent classifier associated with a given node against the weights of its positive offspring: small values of $w$ increase the weight of the descendant nodes, and large values increase the weight of the local parent predictor associated with that node;

(iii) the effect on the overall probability predicted by the ensemble is the result of the choice of the $w$ parameter, the strength of the predictions of the local learner, and of those of its descendants.

These characteristics of TPR-W ensembles are well suited for the hierarchical classification of protein functions, considering that annotations of deeper nodes are likely to have less experimental evidence than those of higher nodes. Moreover, by enforcing the strength of the descendant nodes through low $w$ values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured as a forest of trees, while no applications have been performed using the GO, structured as a directed acyclic graph (see Appendix A).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147], or investigated a modified decision tree model in which the assignment to a node is propagated toward the parent nodes [9], by extending the classical C4.5 decision tree algorithm for multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in yeast [149].

More recent approaches, always based on modified decision trees, used distance measures derived from the hierarchy and significantly improved previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all classes at once, even when they are built taking the hierarchy into account.

Nevertheless, the previously proposed decision tree-based methods often achieve results not comparable with those of state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for DAG-structured prediction of protein function in the S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles, which learn ensembles of predictive models each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles, which train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151], methods related to decision trees, in the sense that interpretable classification rules are used to predict all functions at all levels of the GO hierarchy, have been proposed, using an ant colony optimization classification algorithm to discover the classification rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues [1, 16, 18]. In particular, in [80] the authors showed that, even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results if, at the same time, we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy between hierarchical classification, data integration methods, and cost-sensitive techniques has been shown, highlighting that hierarchical ensemble methods should be designed taking into account the different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the predictions of protein functions [16, 25, 154–156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into four groups:

(1) vector space integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed in [30], but each data source is separately normalized, in order to take into account the data distribution in each individual vector space.

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network, or probabilistic evidence integration schemes [45].

Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive-Bayes integration [157] and constrained linear regression [14], or merge the data taking into account the GO hierarchy [158], or apply XML-based techniques [159].

Kernel Fusion. These techniques at first construct a separate Gram matrix for each available data source, using appropriate kernels representing the similarities between genes/gene products. Then, by exploiting the closure property of kernels with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix.

Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have been recently proposed [160, 161]. Kernel fusion methods, both with and without weighting of the data sources, have been successfully applied to the classification of protein functions [162–165]. Recently, an enhanced kernel integration approach has been proposed, by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
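A minimal sketch of the simplest (unweighted or fixed-weight) scheme follows: per-source Gram matrices are combined by a nonnegative weighted sum.

```python
# Unweighted or fixed-weight kernel fusion: the consensus Gram matrix is a
# nonnegative weighted sum of the per-source Gram matrices.
import numpy as np

def fuse_kernels(gram_matrices, weights=None):
    """Closure of kernels under nonnegative sums guarantees the result is
    again a valid kernel over the same set of genes."""
    if weights is None:
        weights = np.ones(len(gram_matrices))
    return sum(w * K for w, K in zip(weights, gram_matrices))
```

The fused matrix can then be passed, for instance, to an SVM accepting precomputed kernels (e.g., sklearn.svm.SVC(kernel="precomputed")).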

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different "views" of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include the "late integration" of kernels trained on different sources [22], the naive-Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the outputs of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to those of state-of-the-art data integration methods, while exploiting the modularity and scalability that characterize most ensemble algorithms (a sketch of weighted voting is given below). Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171–173].
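As a hedged illustration of decision-level integration, weighted voting can be condensed as follows; the per-source weights (e.g., cross-validated accuracies) and the data layout are assumptions.

```python
# Decision-level integration by weighted voting across data sources.
import numpy as np

def weighted_vote(probs, weights):
    """probs: array of shape (n_sources, n_genes) with per-source
    probabilities for the same functional class;
    weights: per-source reliabilities."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ np.asarray(probs, dtype=float)
```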

As an example of the effectiveness of the integration ofhierarchical ensemble methods with data fusion techniques

in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ. SIM.). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].

Table 2: Comparison of the results (average per-class F-scores) achieved with single-source and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the numbers in parentheses refer to the percentage relative increment in F-score achieved with data fusion with respect to the best single source of evidence (BIOGRID).

Methods                    FLAT     HTD      HTD-CS   HB       HB-CS    TPR      TPR-W
Single-source
  BIOGRID                  0.2643   0.3759   0.4160   0.3385   0.4183   0.3902   0.4367
  STRING                   0.2203   0.2677   0.3135   0.2138   0.3007   0.2801   0.3048
  PFAM BINARY              0.1756   0.2003   0.2482   0.1468   0.2407   0.2532   0.2738
  PFAM LOGE                0.2044   0.1567   0.2541   0.0997   0.2847   0.3005   0.3160
  EXPR                     0.1884   0.2506   0.2889   0.2006   0.2781   0.2723   0.3053
  SEQ. SIM.                0.1870   0.2532   0.2899   0.2017   0.2825   0.2742   0.3088
Multisource (data fusion)
  Kernel fusion            0.3220   0.5401   0.5492   0.5181   0.5505   0.5034   0.5592
  (relative increment)     (+22%)   (+44%)   (+32%)   (+53%)   (+32%)   (+29%)   (+28%)

Table 2 summarizes the results of the comparison across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per-class F-score in flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) for which kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger for TPR-W (Figure 12(b)) than for FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical, data fusion (in particular kernel fusion), and cost-sensitive approaches (Figure 13) [123].

The per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of the performance with respect to the depth of the nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion, the average F-score at levels 2, 3, and 4 is comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergic effects of hierarchical multilabel ensembles, cost-sensitive, and data fusion techniques significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insights into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both the GO and FunCat, only positive annotations are usually available, while negative annotations are much fewer. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy of choosing negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123], two strategies for choosing negative examples have been compared: the basic (B) and the parent only (PO) strategy.

According to the B strategy, the set of negative examples is simply the set of genes $g$ that are not annotated for class $c_i$, that is,

$$N_B = \{ g : g \notin c_i \}. \qquad (33)$$

The PO selection strategy chooses as negatives for the class $c_i$ only those examples that are not annotated for $c_i$ but are annotated for a parent class. More precisely, for a given class $c_i$, corresponding to node $i$ in the taxonomy, the set of negative examples is

$$N_{PO} = \{ g : g \notin c_i,\ g \in \mathrm{par}(i) \}. \qquad (34)$$
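The two strategies can be expressed in a few lines; the following sketch assumes the annotations are stored as a dictionary mapping each class to the set of genes annotated for it (an illustrative layout).

```python
# The B and PO negative-selection strategies (33)-(34).
from typing import Dict, Iterable, Set

def negatives_basic(genes: Iterable[str],
                    ann: Dict[str, Set[str]], c_i: str) -> Set[str]:
    """B strategy: every gene not annotated for class c_i."""
    return set(genes) - ann[c_i]

def negatives_parent_only(ann: Dict[str, Set[str]],
                          c_i: str, parent: str) -> Set[str]:
    """PO strategy: genes annotated for the parent of c_i but not for c_i."""
    return ann[parent] - ann[c_i]
```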

Figure 12: FunCat trees comparing the F-scores achieved with data integration (KF) to those of the best single-source classifiers trained on BIOGRID data. Black nodes depict the functional classes for which KF achieves better F-scores: (a) FLAT and (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical precision, recall, and F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, this strategy selects negative examples for training that are, in a certain sense, "close" to the positives. It is easy to see that $N_{PO} \subseteq N_B$; hence the B strategy selects for training a larger set of generic negative examples, possibly annotated with classes associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a given method with the B (abscissa) and the PO (ordinate) strategy for the selection of negative examples. In Figure 15(a) most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods gain in performance when using the PO strategy as opposed to the B strategy ($P$-value $= 2.2 \times 10^{-16}$, according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

Figure 14: Comparison of per-level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion (KF). The performance of "single" is computed by averaging across all the single data sources.

These results can be explained by considering that the PO strategy takes the hierarchy into account to select the negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of the classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate; hierarchical methods, which know the taxonomy, can instead use the information coming from the other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we here delineate some of the open problems and research trends in this research area.

Figure 15: Comparison of average per-class F-score between the basic (B) and PO strategies: (a) FLAT ensembles; (b) hierarchical cost-sensitive strategies: HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles). Abscissa: per-class F-score with base learners trained according to the basic strategy; ordinate: per-class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of the different strategies should be performed in a systematic way, to assess the impact of the different strategies on the different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works showed also that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies should be explored to improve the accuracy and the reliability of the base learners, and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied: the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly takes away some unannotated examples [182]. Other approaches could also be considered, such as heuristic resampling methods [183], the embedding of resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185–187].

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems and often drives the "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta_i^- / \theta_i^+$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$ we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed, in order to design ensemble methods with "tunable" precision/recall characteristics.

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; unfortunately, the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and that, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are partially annotated. More precisely, two new algorithms have been proposed: ProWL, protein function prediction with weak-label learning, which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein; and ProWL-IF, protein function prediction with weak-label learning and knowledge of irrelevant functions, by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works for the hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually, only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation between protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues concern the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy to the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Section 7.3), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of the deep nodes of the hierarchy. Other enhancements could be introduced in the TPR-W algorithm (Section 7.2): indeed, we can note that the positive children of a node at level $i$ of the hierarchy have the same weight, independently of the size of their hanging subtrees. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree; indeed, this witnesses for a positive annotation of the node at level $i$.

More in general, in the spirit of a recently proposed work [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194], able to exploit the relationships between functional classes directly during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, already during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performance.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the prediction for a particular species by using predictions or data available for other species. This is a common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary-memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by the cost-sensitive techniques needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work.


Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and the other related issues is the key to improve AFP methods and to drive experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes, and it is set out in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., mitochondrion or rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components,


Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

and molecular functions in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function, for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; an example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles, or physical places or compartments in which a specific gene product is located; an example is the endoplasmic reticulum or the ribosome.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a term of the GO}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are also categorized in the following three main groups:

(i) is-a (subtype relations): if we say the term/node $A$ is a $B$, we mean that node $A$ is a subtype of node $B$. For example, mitotic cell cycle is a cell cycle, or lyase activity is a catalytic activity.

(ii) part-of (part-whole relations): $A$ is part of $B$ means that whenever $A$ exists, it is part of $B$, and the presence of $A$ implies the presence of $B$. For instance, mitochondrion is part of cytoplasm.

(iii) regulates (control relations): if we say that $A$ regulates $B$, we mean that $A$ directly affects the manifestation of $B$; that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting function similarity. For these reasons, in AFP the regulates relations (which, however, are a minority of the existing relations) are usually not used.
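Since is-a and part-of are transitive, an annotation to a term implies annotations to all of its ancestors (the true path rule). The following minimal Python sketch, built on a made-up toy ontology rather than real GO terms, shows how this upward propagation can be computed on a DAG of is-a/part-of edges:

# Minimal sketch of true-path-rule propagation on a toy GO-like DAG.
# Each entry maps a term to its parents (is-a or part-of relations).
# The ontology below is an invented example, not real GO identifiers.

toy_dag = {
    "mitotic cell cycle": ["cell cycle"],
    "cell cycle": ["biological process"],
    "lyase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
}

def ancestors(term, dag):
    """Return the set of all ancestors of a term (term excluded)."""
    found = set()
    stack = list(dag.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(dag.get(parent, []))
    return found

def propagate(annotations, dag):
    """Close a set of annotated terms under the true path rule."""
    closed = set(annotations)
    for term in annotations:
        closed |= ancestors(term, dag)
    return closed

# Prints the closed set: the term itself plus 'cell cycle' and
# 'biological process' (set printing order may vary).
print(propagate({"mitotic cell cycle"}, toy_dag))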

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided in several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI), to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference, to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de). At the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content has been extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and finally animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and in part on the molecular function, while GO aims at representing a fine granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge amount of terms is very difficult to handle for annotators. Moreover, we may have a very large number of possible assignments, which may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads to both simpler procedures for annotation and less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, without being too granular and specific.

It is worth noting that both FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subjected to changes, since they represent the results of the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more in general, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess the classification performance. For this reason, the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the positive examples correctly predicted as positive, FN the positive examples incorrectly predicted as negative, and FP the negatives incorrectly predicted as positives, then the precision $P$ and the recall $R$ are

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{B.1}$$

The F-score $F$ is the harmonic mean between precision and recall:

$$F = \frac{2 \cdot P \cdot R}{P + R}. \tag{B.2}$$
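As a quick worked example, the following Python snippet (a minimal sketch; the counts are invented) computes precision, recall, and F-score from TP, FP, and FN counts according to (B.1) and (B.2):

def precision_recall_f(tp, fp, fn):
    # Precision P = TP / (TP + FP); recall R = TP / (TP + FN);
    # F-score = harmonic mean of P and R (cf. (B.1) and (B.2)).
    p = tp / (tp + fp) if tp + fp > 0 else 0.0
    r = tp / (tp + fn) if tp + fn > 0 else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f

# Invented counts: 40 true positives, 10 false positives, 20 false negatives.
print(precision_recall_f(40, 10, 20))  # -> (0.8, 0.666..., 0.727...)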

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to a near optimal ranking; that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures to accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be "partially correct", in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$, consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $l(P)$ be the set of the leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let ${\uparrow}p$ be the set of ancestors of the node $p$ that belong to $P(x)$, and, given a leaf $c \in C(x)$, let ${\uparrow}c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}p|},$$
$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}c|},$$
$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.3}$$

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow}p|}{|{\uparrow}p|},$$
$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|{\uparrow}c \cap P(x)|}{|{\uparrow}c|},$$
$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.4}$$
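As an illustration, the following Python sketch is our own toy rendering of (B.4), under the assumptions that the taxonomy is a tree given as a child-to-parent map and that, as is common, a node is included in its own ancestor set ${\uparrow}$; the tree and the annotations below are invented:

# Toy implementation of the tree-structured hierarchical measures (B.4).
# `parent` maps each class to its parent; the root has no entry.

def up(node, parent):
    """Ancestors of `node`, including the node itself, up to the root."""
    path = set()
    while node is not None:
        path.add(node)
        node = parent.get(node)
    return path

def leaves(nodes, parent):
    """Nodes of the set that are not the parent of any node in the set."""
    parents_in_set = {parent.get(n) for n in nodes}
    return [n for n in nodes if n not in parents_in_set]

def hierarchical_prf(predicted, true, parent):
    lp, lc = leaves(predicted, parent), leaves(true, parent)
    hp = sum(len(true & up(p, parent)) / len(up(p, parent)) for p in lp) / len(lp)
    hr = sum(len(up(c, parent) & predicted) / len(up(c, parent)) for c in lc) / len(lc)
    hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
    return hp, hr, hf

parent = {"1.1": "1", "1.1.3": "1.1", "1.2": "1"}   # invented FunCat-like tree
predicted = {"1", "1.1"}                            # predicted classes
true = {"1", "1.1", "1.1.3"}                        # true annotation
print(hierarchical_prf(predicted, true, parent))    # -> (1.0, 0.666..., 0.8)

In this toy case the prediction covers the true path only down to an intermediate level, so HP is perfect while HR is penalized for the missed most specific class, exactly the "near miss" behaviour the measures are designed to capture.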

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

$$\mathrm{HP}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|P(x)|}, \qquad \mathrm{HR}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|C(x)|}. \tag{B.5}$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true classes ($\mathrm{HR}_K$). The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \tag{B.6}$$
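A minimal Python sketch of (B.5) and (B.6), following the set-based formulation above with the sum taken over the genes of an invented two-gene dataset:

# Toy computation of the Kiritchenko-style hierarchical measures (B.5)-(B.6).
# predicted[x] and true[x] are the sets of classes for each gene x (invented).

predicted = {"g1": {"1", "1.1"}, "g2": {"1", "1.2"}}
true = {"g1": {"1", "1.1", "1.1.3"}, "g2": {"1", "1.1"}}

hp_k = sum(len(predicted[x] & true[x]) / len(predicted[x]) for x in predicted)
hr_k = sum(len(predicted[x] & true[x]) / len(true[x]) for x in predicted)
hf_k = 2 * hp_k * hr_k / (hp_k + hr_k)
print(hp_k, hr_k, hf_k)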

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $\bar{q}_k$. More precisely, $\bar{q}_k$ denotes the probability computed by the ensemble, and $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $\bar{q}^{\,j}_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \ge 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level $k$ of the hierarchy [31]:

$$\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g). \tag{C.1}$$
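To make the bottom-up combination of (C.1) concrete, here is a minimal recursive Python sketch, our own illustrative rendering rather than the reference implementation of [31]: each node holds the local base-learner probability $\hat{q}$, and the ensemble probability $\bar{q}$ of a node is the $w$-weighted combination of $\hat{q}$ with the average ensemble probability of its positive children:

# Minimal sketch of the TPR-W bottom-up combination (C.1).
# Each node has a local base-learner probability q_hat; the ensemble
# probability q_bar of a node combines q_hat (weight w) with the average
# q_bar of its *positive* children (weight 1 - w), i.e., the children
# whose ensemble probability exceeds a decision threshold t.
# Tree, probabilities, w, and t below are invented for illustration.

children = {"root": ["a", "b"], "a": ["a1", "a2"], "b": [], "a1": [], "a2": []}
q_hat = {"root": 0.6, "a": 0.55, "b": 0.2, "a1": 0.9, "a2": 0.7}

def q_bar(node, w=0.5, t=0.5):
    pos = [q_bar(c, w, t) for c in children[node]]
    pos = [q for q in pos if q > t]      # phi: positive children only
    if not pos:
        return q_hat[node]               # leaves, or no positive child
    return w * q_hat[node] + (1 - w) * sum(pos) / len(pos)

print(q_bar("root"))  # ensemble probability for the root term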

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children nodes of a generic node at level $k$:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1}, \tag{C.2}$$

and we can introduce similar notations for $a_{k+2}$ (average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (descendants at level $j$ below a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \le w \le 1$, balancing the weight between parent and children predictors, and having a variable number larger than or equal to 1 of positive descendants for each of the $m$ lower levels below, the following equality holds for each $m \ge 1$:

$$\bar{q}_k = w\hat{q}_k + \sum_{j=1}^{m-1} w(1-w)^{j} a_{k+j} + (1-w)^{m} a_{k+m}. \tag{C.3}$$

For the full proof, see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter.



Figure 18: (a) Plot of $f(w) = (1-w)^m$ while varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1-w)^j$ while varying $j$ from 1 to 10. The integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.

To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$ (Theorem 1), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of nodes above the leaves.
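As a quick numerical illustration of these two functions (a minimal sketch with arbitrarily chosen values of $w$):

# Numerical illustration of the decay functions of Theorem 2:
# f(w) = (1 - w)^m governs the influence of leaves at depth m;
# g(w) = w * (1 - w)^j governs the influence of internal nodes at distance j.

def f(w, m):
    return (1 - w) ** m

def g(w, j):
    return w * (1 - w) ** j

for w in (0.1, 0.5, 0.8):   # arbitrary example values of w
    print(w, [round(f(w, m), 3) for m in (1, 2, 5)],
             [round(g(w, j), 3) for j in (1, 2, 5)])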

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction—the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.

[2] L. Pena-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.

[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.

[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.

[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.

[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene Ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.

[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.

[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.

[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.

[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.

[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.

[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.

[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.

[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.

[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.

[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.

[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.

[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.

[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.

[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.

[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.

[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.

[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.

[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.

[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.

[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.

[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multilabel decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.

[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.

[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.

[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.

[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.

[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.

[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.

[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.

[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.

[38] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.

[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.

[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.

[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.

[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.

[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.

[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.

[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.

[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.

[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.


[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.

[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.

[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.

[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.

[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.

[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.

[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.

[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.

[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.

[58] A. Conesa, S. Gotz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.

[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.

[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.

[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.

[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.

[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.

[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.

[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.

[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.

[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.

[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.

[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.

[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.

[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.

[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.

[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.

[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.

[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.

[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.

[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.

[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.

[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.

[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.

[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.

[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.

[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.

[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.

[88] G. Bakir, T. Hoffman, B. Scholkopf, B. Smola, A. J. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.

[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.

[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.

[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.

[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.

[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.

[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.

[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.

[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.

[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.

[100] M. Dettling and P. Buhlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.

[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.

[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.

[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.

[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.

[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.

[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.

[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.

[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.

[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.

[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.

[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.

[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.

[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.

[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.

[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.

[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.

[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using top-down label combination and artificial neural networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastion, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems: Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An $O(n^2)$ algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.


[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics—From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, no. 12, article S7, 2009.

[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7-9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. Von Mering, R. Krause, B. Snel et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.


[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the Databoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013, ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bahr et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


Theorem 1. For any tree $T$ and gene $g$, the multilabel generated according to the HBAYES prediction rule is the Bayes-optimal classification of $g$ for the H-loss.

In the evaluation phase, the uniform cost coefficients $\theta_i = 1$, for $i = 1, \ldots, m$, are used. However, since with uniform coefficients the H-loss can be made small simply by predicting sparse multilabels (i.e., multilabels $\mathbf{y}$ such that $\sum_i y_i$ is small), in the training phase the cost coefficients are set to $\theta_i = 1/|\mathrm{root}(G)|$ if $i \in \mathrm{root}(G)$, and to $\theta_i = \theta_j/|\mathrm{child}(j)|$, with $j = \mathrm{par}(i)$, otherwise. This normalizes the H-loss, in the sense that the maximal H-loss contribution of all the nodes in a subtree, excluding its root, equals that of its root.

Let $\{E\}$ be the indicator function of event $E$. Given $g$ and the estimates $\hat{p}_i = \hat{p}_i(g)$ for $i = 1, \ldots, m$, the HBAYES prediction rule can be formulated as follows.

HBAYES Prediction Rule. Initially, set the label of each node $i$ to

$$\hat{y}_i = \operatorname*{argmin}_{y \in \{0,1\}} \Bigg( \theta_i \hat{p}_i (1-y) + \theta_i (1-\hat{p}_i)\, y + \hat{p}_i \{y = 1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{\mathbf{y}}) \Bigg), \tag{12}$$

where

$$H_j(\hat{\mathbf{y}}) = \theta_j \hat{p}_j (1 - \hat{y}_j) + \theta_j (1 - \hat{p}_j)\, \hat{y}_j + \hat{p}_j \{\hat{y}_j = 1\} \sum_{k \in \mathrm{child}(j)} H_k(\hat{\mathbf{y}}) \tag{13}$$

is recursively defined over the nodes $j$ in the subtree rooted at $i$, with each $\hat{y}_j$ set according to (12). Then, if $\hat{y}_i$ is set to zero, set all the nodes in the subtree rooted at $i$ to zero as well.

It is worth noting that $\hat{\mathbf{y}}$ can be computed for a given $g$ via a simple bottom-up message-passing procedure. It can be shown that, if all the child nodes $k$ of $i$ have $\hat{p}_k$ close to one half, then the Bayes-optimal label of $i$ tends to be 0, irrespective of the value of $\hat{p}_i$. On the contrary, if $i$'s children all have $\hat{p}_k$ close to either 0 or 1, then the Bayes-optimal label of $i$ is based on $\hat{p}_i$ only, ignoring the children. This behaviour can be intuitively explained in the following way: the estimate $\hat{p}_k$ is built based only on the examples on which the parent $i$ of $k$ is positive; hence, a "neutral" estimate $\hat{p}_k = 1/2$ signals that the current instance is a negative example for the parent $i$. Experimental results show that this approach achieves results comparable with those of the TPR method (Section 7), an ensemble approach based on the "true path rule" [136].
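To make the bottom-up message passing concrete, the following is a minimal sketch of the HBAYES evaluation phase, assuming a tree encoded as a dictionary mapping each node to its children, the cost coefficients theta, and the calibrated estimates p produced by the base learners (all names are illustrative):

```python
# A sketch of the HBAYES prediction rule (eqs. (12)-(13)): a bottom-up pass
# computes, for each node, the expected H-loss of labeling it 0 or 1; a
# final top-down pass zeroes the subtrees of negatively labeled nodes.
def hbayes_predict(root, children, theta, p):
    y_hat = {}

    def visit(i):
        # Sum of the children's messages H_j, each computed recursively (eq. (13)).
        h_children = sum(visit(j) for j in children.get(i, []))
        cost0 = theta[i] * p[i]                            # expected loss of y_i = 0
        cost1 = theta[i] * (1 - p[i]) + p[i] * h_children  # expected loss of y_i = 1
        y_hat[i] = 1 if cost1 < cost0 else 0               # ties broken toward 0
        return min(cost0, cost1)                           # message H_i passed upward

    visit(root)

    def zero_subtree(i):
        for j in children.get(i, []):
            y_hat[j] = 0
            zero_subtree(j)

    # If a node is labeled 0, its whole subtree is set to 0 as well.
    for i in list(y_hat):
        if y_hat[i] == 0:
            zero_subtree(i)
    return y_hat
```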

5.3.2. HBAYES-CS: The Cost-Sensitive Version. HBAYES-CS is the cost-sensitive version of HBAYES proposed in [27]. In this approach, the misclassification cost coefficient $\theta_i$ for node $i$ is split into two terms, $\theta_i^+$ and $\theta_i^-$, taking into account misclassifications of positive and negative examples, respectively. By considering these two terms separately, (12) can be rewritten as

$$\hat{y}_i = \operatorname*{argmin}_{y \in \{0,1\}} \Bigg( \theta_i^- \hat{p}_i (1-y) + \theta_i^+ (1-\hat{p}_i)\, y + \hat{p}_i \{y=1\} \sum_{j \in \mathrm{child}(i)} H_j(\hat{\mathbf{y}}) \Bigg), \tag{14}$$

where the expression of $H_j(\hat{\mathbf{y}})$ changes correspondingly. By introducing a factor $\alpha \ge 0$ such that $\theta_i^- = \alpha \theta_i^+$, while keeping $\theta_i^+ + \theta_i^- = 2\theta_i$, the relative costs of false positives and false negatives can be parameterized, thus allowing us to further rewrite the hierarchical Bayesian rule (Section 5.3.1) as follows:

$$\hat{y}_i = 1 \Longleftrightarrow \hat{p}_i \Bigg( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \Bigg) \ge \frac{2\theta_i}{1+\alpha}. \tag{15}$$

By setting $\alpha = 1$, we obtain the original version of the hierarchical Bayesian ensemble; by incrementing $\alpha$, we introduce progressively lower costs for positive predictions. In this way the recall of the ensemble tends to increase, eventually at the expense of the precision, and by tuning the $\alpha$ parameter we can obtain different combinations of precision/recall values.

In principle, a cost factor $\alpha_i$ can be set for each node $i$ to explicitly take into account the imbalance between the number of positive ($n_i^+$) and negative ($n_i^-$) examples, estimated from the training data:

$$\alpha_i = \frac{n_i^-}{n_i^+} \Longrightarrow \theta_i^+ = \frac{2}{n_i^-/n_i^+ + 1}\,\theta_i = \frac{2 n_i^+}{n_i^- + n_i^+}\,\theta_i. \tag{16}$$

The decision rule (15) at each node then becomes

$$\hat{y}_i = 1 \Longleftrightarrow \hat{p}_i \Bigg( 2\theta_i - \sum_{j \in \mathrm{child}(i)} H_j \Bigg) \ge \frac{2\theta_i}{1+\alpha_i} = \frac{2\theta_i\, n_i^+}{n_i^- + n_i^+}. \tag{17}$$

Results obtained with the yeast model organism showed that HBAYES-CS significantly outperforms HTD methods [27, 136].
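As an illustration, the per-node cost-sensitive decision (17) reduces to a simple threshold test; a sketch follows, assuming hypothetical dictionaries n_pos and n_neg with per-class training counts and h_children holding the sum of the children's messages from the bottom-up pass:

```python
# A sketch of the HBAYES-CS decision rule (eqs. (16)-(17)); names are illustrative.
def hbayes_cs_decision(i, p, theta, h_children, n_pos, n_neg):
    alpha_i = n_neg[i] / n_pos[i]               # per-node cost factor (eq. (16))
    lhs = p[i] * (2 * theta[i] - h_children[i])
    rhs = 2 * theta[i] / (1 + alpha_i)          # = 2*theta_i*n_i^+ / (n_i^- + n_i^+)
    return 1 if lhs >= rhs else 0
```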

6. Reconciliation Methods

Hierarchical ensemble methods are basically two-step methods, since they first provide predictions for the single classes and then arrange these predictions to take into account the functional relationships between GO terms. Noble and colleagues call this general approach reconciliation methods [24]: they proposed methods for calibrating and combining independent predictions to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. They applied their ensemble methods to genome-wide and ontology-wide function prediction in M. musculus, involving about 3000 GO terms.


Figure 10: The overall scheme of reconciliation methods (adapted from [24]): (1) kernels computed for each data source (e.g., MGI, PPI, Pfam: K1, K2, K3, ..., K30); (2) SVMs trained for each term; (3) calibration of the probability scores; (4) reconciliation according to the ontology.

Their goal consists in providing consistent predictions, that is, predictions whose confidence (e.g., posterior probability) increases as we ascend from more specific to more general terms in the GO. Another important feature of these methods is the availability of confidence values associated with the predictions, which can be interpreted as probabilities that a protein has a certain function, given the information provided by the data.

The overall reconciliation approach can be summarized in the following four basic steps (Figure 10).

(1) Kernel computation: at first, a set of kernels is computed from the available data. We may choose kernels specific for each source of data (e.g., diffusion kernels for protein-protein interaction data [137], linear or Gaussian kernels for expression data, and string kernels for sequence data [138]). Multiple kernels for the same type of data can also be constructed [24].

(2) SVM learning: SVMs are used as base learners with the kernels selected at the previous step; the training is performed by internal cross-validation to avoid overfitting, and a local cost-sensitive strategy is applied by tuning separately the $C$ regularization factor for positive and negative examples. Note that the authors used SVMs as base learners in their experiments, but any meaningful classifier could be used at this step.

(3) Calibration: to produce individual probabilistic outputs from the set of SVM outputs corresponding to one GO term, a logistic regression approach is applied. In this way, a calibration of the individual SVM outputs is obtained, resulting in a probabilistic prediction of the random variable $Y_i$ for each node/term $i$ of the hierarchy, given the outputs $y_i$ of the SVM classifiers.

(4) Reconciliation: the first three steps generate unreconciled outputs; that is, in practice, a "flat" ensemble is applied, which may generate predictions that are inconsistent with the given taxonomy. In this step, the outputs of step three are processed by a "reconciliation method". The goal of this stage is to combine the predictions for each term so as to produce predictions that are consistent with the ontology, meaning that all the probabilities assigned to the ancestors of a GO term are larger than the probability assigned to that term.

The first three steps are basically the same for (or very similar in) each reconciliation ensemble method. The crucial step is the fourth, that is, the reconciliation step, and different ensemble algorithms can be designed to implement it. The authors proposed 11 different ensemble methods for the reconciliation of the base classifier outputs. Schematically, they can be subdivided into the following four main classes of ensembles:

(1) heuristic methods;
(2) Bayesian network-based methods;
(3) cascaded logistic regression;
(4) projection-based methods.

6.1. Heuristic Methods. These approaches preserve the "reconciliation property"

$$\forall i, j \in G, \; (i, j) \in G \Longrightarrow \bar{p}_i \ge \bar{p}_j, \tag{18}$$

through simple heuristic modifications of the probabilities computed at step 3 of the overall reconciliation scheme.

(i) The MAX method simply chooses the largest logistic regression value for node $i$ and all its descendants $\mathrm{desc}(i)$:

$$\bar{p}_i = \max_{j \in \mathrm{desc}(i)} \hat{p}_j. \tag{19}$$

(ii) The AND method implements the notion that the probability of all ancestral GO terms $\mathrm{anc}(i)$ of a given term/node $i$ is large, assuming that, conditional on the data, all predictions are independent:

$$\bar{p}_i = \prod_{j \in \mathrm{anc}(i)} \hat{p}_j. \tag{20}$$

(iii) The OR method estimates the probability that node $i$ is annotated with at least one of the descendant GO terms, assuming again that, conditional on the data, all predictions are independent:

$$1 - \bar{p}_i = \prod_{j \in \mathrm{desc}(i)} (1 - \hat{p}_j). \tag{21}$$
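The three heuristics admit a direct implementation; the following sketch assumes dictionaries desc and anc mapping each term to its descendant and ancestor sets and p_hat holding the calibrated probabilities (whether the node itself belongs to desc(i) and anc(i) is a convention; here it is added explicitly, and all names are illustrative):

```python
import math

# Sketches of the heuristic reconciliation rules (eqs. (19)-(21)).
def reconcile_max(i, p_hat, desc):
    """MAX: the largest calibrated value over i and its descendants (eq. (19))."""
    return max(p_hat[j] for j in desc[i] | {i})

def reconcile_and(i, p_hat, anc):
    """AND: product over i and its ancestors, assuming independence (eq. (20))."""
    return math.prod(p_hat[j] for j in anc[i] | {i})

def reconcile_or(i, p_hat, desc):
    """OR: probability that at least one descendant is annotated (eq. (21))."""
    return 1.0 - math.prod(1.0 - p_hat[j] for j in desc[i] | {i})
```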

6.2. Cascaded Logistic Regression. Instead of modeling class-conditional probabilities, as required by the Bayesian approach, logistic regression can be used to directly model the posterior probabilities. Considering that modeling conditional densities is in most cases difficult (even using strong independence assumptions, as shown in Section 5.1), the choice of logistic regression can be a reasonable one. In [24], the authors embedded the hierarchical dependencies between terms in the logistic regression setting. Assuming that a random variable $X$, whose values represent the features of the gene $g$ of interest, is associated with a given gene $g$, and assuming that $P(\mathbf{Y} = \mathbf{y} \mid X = x)$ factorizes according to the GO graph, it follows that

$$P(\mathbf{Y} = \mathbf{y} \mid X = x) = \prod_i P(Y_i = y_i \mid \forall j \in \mathrm{par}(i),\, Y_j = y_j,\, X_i = x_i), \tag{22}$$

with $P(Y_i = 1 \mid \forall j \in \mathrm{par}(i),\, Y_j = 0,\, X_i = x_i) = 0$. The authors estimated $P(Y_i = 1 \mid \forall j \in \mathrm{par}(i),\, Y_j = 1,\, X_i = x_i)$ with logistic regression. This approach is quite similar to fitting independent logistic regressions, but note that in this case only the examples of proteins annotated with all the parent GO terms are used to fit the model, thus implicitly taking into account the hierarchical relationships between GO terms.
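For a tree-structured taxonomy (a single parent per term), the cascaded model can be sketched as follows, assuming scikit-learn and a binary annotation matrix Y with one column per term (all names are illustrative; for a DAG, chaining the conditionals this way is only an approximation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_cascaded(X, Y, parent, terms):
    """Fit P(Y_i = 1 | Y_par(i) = 1, x) using only the examples annotated
    with the parent term (all examples are used at the roots)."""
    models = {}
    for i in terms:
        mask = (Y[:, parent[i]] == 1) if parent[i] is not None else np.ones(len(X), bool)
        models[i] = LogisticRegression().fit(X[mask], Y[mask, i])
    return models

def predict_cascaded(x, models, parent, topo_order):
    """Chain the conditionals top-down: p_i = p_par(i) * P(Y_i = 1 | Y_par(i) = 1, x)."""
    p = {}
    for i in topo_order:  # parents are visited before their children
        cond = models[i].predict_proba(x.reshape(1, -1))[0, 1]
        p[i] = cond * (p[parent[i]] if parent[i] is not None else 1.0)
    return p
```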

6.3. Bayesian Network-Based Methods. These methods are variants of the Bayesian network approach proposed in [20] (Section 5.1): the GO is viewed as a graphical model where a joint Bayesian prior is put on the binary GO term variables $y_i$. The authors proposed four variants, which can be summarized as follows.

(i) BPAL is a belief propagation approach with asymmetric Laplace likelihoods. The graphical model has edges directed from more general terms to more specific terms. Differently from [20], the distribution of each SVM output is modeled as an asymmetric Laplace distribution, and a variational inference algorithm, which solves an optimization problem whose minimizer is the set of marginal probabilities of the distribution, is used to estimate the posterior probabilities of the ensemble [139].

(ii) BPALF is similar to BPAL, but with the edges inverted and directed from more specific terms to more general terms.

(iii) BPLR is a heuristic variant of BPAL where, in the inference algorithm, the Bayesian log posterior ratio for $Y_i$ is replaced by the marginal log posterior ratio obtained from the logistic regression (LR).

(iv) BPLRF is equal to BPLR, but with reversed edges.

6.4. Projection-Based Methods. A different approach is represented by methods that directly use the calibrated values obtained from logistic regression (step 3 of the overall scheme of the reconciliation methods) to find the closest set of values that are consistent with the ontology. This approach leads to a constrained optimization problem. The main contribution of the work of Obozinski et al. [24] is the introduction of projection reconciliation techniques based on isotonic regression [140] and the Kullback-Leibler divergence.

The isotonic regression method tries to find a set of marginal probabilities $\bar{p}_i$ that are close to the set of calibrated values $\hat{p}_i$ obtained from the logistic regression; the Euclidean distance is used as a measure of closeness. Hence, considering that the "reconciliation property" requires $\bar{p}_i \ge \bar{p}_j$ when $(i,j) \in E$, this approach yields the following quadratic program:

$$\min_{\{\bar{p}_i\}_{i \in I}} \sum_{i \in I} (\bar{p}_i - \hat{p}_i)^2 \quad \text{s.t. } \bar{p}_j \le \bar{p}_i, \; (i,j) \in E. \tag{23}$$

This is the classical isotonic regression problem, which can be solved using an interior point solver, or with approximate algorithms when the number of edges of the graph is too large [141].
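The quadratic program (23) can be written down almost verbatim with a generic convex solver; a minimal sketch, assuming the cvxpy package and an edge list of (parent, child) index pairs:

```python
import cvxpy as cp
import numpy as np

# Isotonic-regression reconciliation (eq. (23)): Euclidean projection of the
# calibrated values onto the order constraints induced by the ontology.
def reconcile_isotonic(p_hat, edges):
    p_bar = cp.Variable(len(p_hat))
    constraints = [p_bar[j] <= p_bar[i] for (i, j) in edges]
    cp.Problem(cp.Minimize(cp.sum_squares(p_bar - p_hat)), constraints).solve()
    return p_bar.value

# Example: a chain 0 -> 1 -> 2 with inconsistent calibrated scores; the
# violating pair is averaged, giving roughly [0.55, 0.55, 0.2].
print(reconcile_isotonic(np.array([0.4, 0.7, 0.2]), [(0, 1), (1, 2)]))
```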

Considering that we deal with probabilities, a natural measure of distance between probability density functions $f(\mathbf{x})$ and $g(\mathbf{x})$, defined with respect to a random variable $\mathbf{x}$, is the Kullback-Leibler divergence $D_{f,g}$:

$$D_{f,g} = \int_{-\infty}^{\infty} f(\mathbf{x}) \log \left( \frac{f(\mathbf{x})}{g(\mathbf{x})} \right) d\mathbf{x}. \tag{24}$$

In the context of reconciliation methods, we need to consider a discrete version of the Kullback-Leibler divergence, yielding the following optimization problem:

$$\min_{\bar{\mathbf{p}}} D_{\bar{\mathbf{p}}, \hat{\mathbf{p}}} = \min_{\{\bar{p}_i\}_{i \in I}} \sum_{i \in I} \bar{p}_i \log \left( \frac{\bar{p}_i}{\hat{p}_i} \right) \quad \text{s.t. } \bar{p}_j \le \bar{p}_i, \; (i,j) \in E. \tag{25}$$

The algorithm finds the probabilities $\bar{\mathbf{p}}$ that are closest, according to the Kullback-Leibler divergence, to the probabilities $\hat{\mathbf{p}}$ obtained from logistic regression, while obeying the constraint that probabilities cannot increase while descending the hierarchy underlying the ontology.
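The Kullback-Leibler projection admits an analogous convex formulation; a sketch under the same assumptions as above, using the identity $\bar{p}\log(\bar{p}/\hat{p}) = -\mathrm{entr}(\bar{p}) - \bar{p}\log\hat{p}$ to express the objective of (25) in a form the solver accepts (requires $\hat{p} > 0$):

```python
import cvxpy as cp
import numpy as np

# Kullback-Leibler reconciliation (eq. (25)) under the same order constraints.
def reconcile_kl(p_hat, edges):
    p_bar = cp.Variable(len(p_hat), nonneg=True)
    # sum_i p_bar_i * log(p_bar_i / p_hat_i), written with the entropy atom.
    objective = cp.sum(-cp.entr(p_bar) - cp.multiply(p_bar, np.log(p_hat)))
    constraints = [p_bar[j] <= p_bar[i] for (i, j) in edges] + [p_bar <= 1]
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return p_bar.value
```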

The extensive experiments reported in [24] show that, among the reconciliation methods, isotonic regression is the most generally useful: across a range of evaluation modes, term sizes, ontologies, and recall levels, isotonic regression yields consistently high precision. On the other hand, isotonic regression is not always the "best method", and a biologist with a particular goal in mind may apply other reconciliation methods. For instance, with small terms Kullback-Leibler projections usually achieve the best results; but, considering average "per term" results, heuristic methods yield precision at a given recall comparable with projection methods and better than that achieved with Bayes-net methods.

This ensemble approach achieved excellent results in the prediction of protein function in the mouse model organism, demonstrating that hierarchical multilabel methods can play a crucial role in improving protein function prediction performance [24]. Nevertheless, the approach suffers from some drawbacks. Indeed, the paper focuses on the comparison of hierarchical multilabel methods, but it does not analyze the impact of the concurrent use of data integration and hierarchical multilabel methods on the overall classification performance. Moreover, potential improvements could be introduced by applying cost-sensitive variants of hierarchical multilabel predictors, able to effectively calibrate the precision/recall trade-off at different levels of the functional ontology.

Figure 11: The asymmetric flow of information suggested by the true path rule.

7. True Path Rule Hierarchical Ensembles

These ensemble methods exploit at the same time the downward and upward relationships between classes, thus considering both the parent-to-child and child-to-parent functional links (Figure 2(b)).

The true path rule (TPR) ensemble method [31, 142] is directly inspired by the true path rule that governs both the GO and FunCat taxonomies. Citing the curators of the Gene Ontology [143]: "An annotation for a class in the hierarchy is automatically transferred to its ancestors, while genes unannotated for a class cannot be annotated for its descendants." Considering the parents of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$$\hat{y}_i = 1 \Longrightarrow \hat{y}_{\mathrm{par}(i)} = 1, \qquad \hat{y}_i = 0 \not\Longrightarrow \hat{y}_{\mathrm{par}(i)} = 0. \tag{26}$$

On the other hand, considering the children of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$$\hat{y}_i = 1 \not\Longrightarrow \hat{y}_{\mathrm{child}(i)} = 1, \qquad \hat{y}_i = 0 \Longrightarrow \hat{y}_{\mathrm{child}(i)} = 0. \tag{27}$$

From (26) and (27) we observe an asymmetry in the rules that govern the assignment of positive and negative labels. Indeed, we have a propagation of positive predictions from bottom to top of the hierarchy in (26) and a propagation of negative labels from top to bottom in (27). Conversely, negative labels cannot propagate from bottom to top, and positive predictions cannot propagate from top to bottom.

The "true path rule" suggests algorithms able to propagate "positive" decisions from bottom to top of the hierarchy and negative decisions from top to bottom (Figure 11).

7.1. The True Path Rule Ensemble Algorithm. The TPR algorithm puts together the predictions made at each node by local "base" classifiers to realize an ensemble that obeys the "true path rule". The basic ideas behind the method can be summarized as follows:

(1) training of the base learners: for each node of the hierarchy, a suitable learning algorithm (e.g., a multilayer perceptron or a support vector machine) provides a classifier for the associated functional class;

(2) in the evaluation phase, the trained classifiers associated with each class/node of the graph provide a local decision about the assignment of a given example to a given node;

(3) positive decisions, that is, annotations to a specific functional class, may propagate from bottom to top across the graph: they influence the decisions of the parent nodes and of their ancestors in a recursive way, by traversing the graph towards higher level nodes/classes; conversely, negative decisions do not affect the decisions of the parent node, that is, they do not propagate from bottom to top (26);

(4) negative predictions for a given node (taking into account the local decisions of its descendants) are propagated to the descendants, to preserve the consistency of the hierarchy according to the true path rule, while positive decisions do not influence the decisions of the child nodes (27).

The ensemble combines the local predictions of the base learners associated with each node with the positive decisions that come from the bottom of the hierarchy and with the negative decisions that spring from the higher level nodes. More precisely, base classifiers estimate local probabilities $\hat{p}_i(g)$ that a given example $g$ belongs to class $\theta_i$, but the core of the algorithm is represented by the evaluation phase, where the ensemble provides an estimate of the "consensus" global probability $\bar{p}_i(g)$. It is worth noting that, instead of a probability, $\hat{p}_i(g)$ may represent a score associated with the likelihood that a given gene/gene product belongs to the functional class $i$.

Let us consider the set $\phi_i(g)$ of the children of node $i$ for which we have a positive prediction for a given gene $g$:

$$\phi_i(g) = \{ j \in \mathrm{child}(i) : \hat{y}_j = 1 \}. \tag{28}$$

The global consensus probability $\bar{p}_i(g)$ of the ensemble depends both on the local prediction $\hat{p}_i(g)$ and on the predictions of the nodes belonging to $\phi_i(g)$:

$$\bar{p}_i(g) = \frac{1}{1 + |\phi_i(g)|} \Bigg( \hat{p}_i(g) + \sum_{j \in \phi_i(g)} \bar{p}_j(g) \Bigg). \tag{29}$$

The decision $\hat{y}_i(g)$ at node/class $i$ is set to 1 if $\bar{p}_i(g) > t$ and to 0 otherwise (a natural choice for $t$ is 0.5), and only children nodes for which we have a positive prediction can influence their parent. In the leaf nodes the sum of (29) disappears, and (29) becomes $\bar{p}_i(g) = \hat{p}_i(g)$. In this way positive predictions propagate from bottom to top, and negative decisions are propagated to the descendants when, for a given node, $\hat{y}_i(g) = 0$.

The bottom-up, per-level traversal of the tree assures that all the offspring of a given node $i$ are taken into account in the ensemble prediction. For the same reason, we can safely set the classes belonging to the subtree rooted at $i$ to negative when $\hat{y}_i$ is set to 0. It is worth noting that we have a two-way asymmetric flow of information across the tree: positive predictions for a node influence its ancestors, while negative predictions influence its offspring.

The algorithm provides both the multilabels $\hat{y}_i$ and an estimate of the probabilities $\bar{p}_i$ that a given example $g$ belongs to the class $i$, for $i = 1, \ldots, m$.
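The evaluation phase of TPR can be sketched as a single bottom-up recursion implementing (28) and (29), with the top-down propagation of negative decisions applied on the way (names are illustrative):

```python
# A sketch of the TPR evaluation phase (eqs. (28)-(29)) on a tree `children`,
# with local estimates `p_hat` and decision threshold t.
def tpr_predict(root, children, p_hat, t=0.5):
    p_bar, y_hat = {}, {}

    def visit(i):
        for j in children.get(i, []):   # bottom-up: children first
            visit(j)
        # phi_i: children with a positive ensemble prediction (eq. (28)).
        phi = [j for j in children.get(i, []) if y_hat[j] == 1]
        # Consensus probability (eq. (29)); at the leaves p_bar = p_hat.
        p_bar[i] = (p_hat[i] + sum(p_bar[j] for j in phi)) / (1 + len(phi))
        y_hat[i] = 1 if p_bar[i] > t else 0
        if y_hat[i] == 0:               # negative decisions propagate downward
            stack = list(children.get(i, []))
            while stack:
                j = stack.pop()
                y_hat[j] = 0
                stack.extend(children.get(j, []))

    visit(root)
    return y_hat, p_bar
```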

7.2. The Cost-Sensitive Variant. Note that in the TPR algorithm there is no way to explicitly balance the local prediction $\hat{p}_i(g)$ at node $i$ with the positive predictions coming from its offspring (29). By balancing the local predictions with the positive predictions coming from the ensemble, we can explicitly modulate the interplay between local and descendant predictors. To this end, a weight $w$, $0 \le w \le 1$, is introduced, such that if $w = 1$ the decision at node $i$ depends only on the local predictor; otherwise, the prediction is shared proportionally to $w$ and $1-w$ between, respectively, the local parent predictor and the set of its children:

$$\bar{p}_i = w \hat{p}_i + \frac{1-w}{|\phi_i|} \sum_{j \in \phi_i} \bar{p}_j. \tag{30}$$

This variant of the TPR algorithm is the weighted true path rule (TPR-W) hierarchical ensemble algorithm. By tuning the $w$ parameter, we can modulate the precision/recall characteristics of the resulting ensemble. More precisely, for $w \to 0$ the weight of the parent local predictor is small, and the ensemble decision depends mainly on the positive predictions of the offspring nodes (classifiers); conversely, $w \to 1$ corresponds to a higher weight of the parent predictor: less weight is given to possible positive predictions of the children, and the decision depends mainly on the local/parent base classifier. In case of a negative decision, all the subtree is set to 0, causing the precision to increase. Note that for $w \to 1$ the behaviour of TPR-W becomes similar to that of HTD (Section 4).
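The TPR-W update differs from (29) only in the combination step; a one-function sketch follows, with the natural fallback p̄_i = p̂_i when φ_i is empty (which also covers the leaves):

```python
# A sketch of the TPR-W consensus update (eq. (30)).
def tpr_w_combine(p_hat_i, p_bar_phi, w=0.5):
    """p_bar_phi: ensemble probabilities of the positively predicted children."""
    if not p_bar_phi:                    # leaf, or no positive children
        return p_hat_i
    return w * p_hat_i + (1 - w) * sum(p_bar_phi) / len(p_bar_phi)
```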

A specific advantage of TPR-W ensembles is the capability of tuning precision and recall rates through the parameter $w$ (30). For small values of $w$, the weight of the decision of the parent local predictor is small, and the ensemble decision depends mainly on the positive predictions of the offspring nodes (classifiers); higher values of $w$ correspond to a higher weight of the "parent" local predictor, with a resulting higher precision. In [31] the author shows that the $w$ parameter highly influences the precision/recall characteristics of the ensemble: low $w$ values yield a higher recall, while high values improve the precision of the TPR-W ensemble.

Recently, Chen and Hu applied the TPR-W hierarchical strategy using composite kernel SVMs as base classifiers and a supervised clustering with oversampling strategy to address the imbalanced data set learning problem; they showed that the proper selection of base learners and imbalance-aware learning strategies can further improve the results in terms of hierarchical precision and recall [144].

The same authors also proposed an enhanced version of the TPR-W strategy to overcome a limitation of this bottom-up hierarchical method for AFP. Indeed, for some classes at the lower levels of the hierarchy, classifier performance is sometimes quite poor, due both to noisy data and to the relatively low number of available annotations. More precisely, in the basic TPR ensemble the probabilities $\bar{p}_j$ computed by the children of node $i$ (30) contribute equally to the probability $\bar{p}_i$ computed by the ensemble at node $i$, independently of the accuracy of the predictions made by the children classifiers. This "unweighted" mechanism may propagate errors across the hierarchy: a poorly performing child classifier may, for instance, with high probability predict a negative example as positive, and this error may propagate to its parent node and, recursively, to its ancestor nodes. To try to alleviate this possible bottom-up error propagation, in [145] Chen and Hu proposed an improved TPR ensemble (TPR-W weighted) based on classifier performance. To this end, they weighted the contribution of each child classifier on the basis of its performance, evaluated on a validation data set, by adding to (30) another weight $\nu_j$:

$$\bar{p}_i = w \hat{p}_i + \frac{1-w}{|\phi_i|} \sum_{j \in \phi_i} \nu_j \cdot \bar{p}_j, \tag{31}$$

where $\nu_j$ is computed on the basis of some accuracy metric $A_j$ (e.g., the F-score) estimated for the child classifier associated with node $j$, as follows:

$$\nu_j = \frac{A_j}{\sum_{k \in \phi_i} A_k}. \tag{32}$$

In this way the contribution of "poor" classifiers is reduced, while "good" classifiers weigh more in the final computation of $\bar{p}_i$ (31). Experiments with the "Protein Fate" subtree of the FunCat taxonomy on the yeast model organism show that this approach improves predictions with respect to the "vanilla" TPR-W hierarchical strategy [145].
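The performance-weighted variant only changes the inner sum; a sketch, with A a hypothetical dictionary of per-node validation accuracies (e.g., F-scores):

```python
# A sketch of the performance-weighted TPR-W update (eqs. (31)-(32)).
def tpr_w_weighted_combine(p_hat_i, phi, p_bar, A, w=0.5):
    if not phi:
        return p_hat_i
    total = sum(A[k] for k in phi)
    nu = {j: A[j] / total for j in phi}                  # eq. (32)
    weighted = sum(nu[j] * p_bar[j] for j in phi)
    return w * p_hat_i + (1 - w) * weighted / len(phi)   # eq. (31)
```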

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward and common to the hierarchical top-down algorithm, the propagation of positive decisions from bottom to top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this item, see Appendix C.

Experimental results show that TPR-W achieves equal or better results than the TPR and top-down hierarchical strategies, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per-level classification performance shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall by using a global strategy: large values of the $w$ parameter improve the precision, and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchical ensemble are influenced by the positive decisions of its descendants. More precisely, the analyses performed in [31] showed the following.

(i) The weights of descendants decrease exponentially with respect to their depth; as a consequence, the influence of descendant nodes decays quickly with their depth.

(ii) The parameter $w$ plays a central role in balancing the weight of the parent classifier associated with a given node with the weights of its positive offspring: small values of $w$ increase the weight of descendant nodes, and large values increase the weight of the local parent predictor associated with that node.

(iii) The effect on the overall probability predicted by the ensemble is the result of the choice of the $w$ parameter, the strength of the predictions of the local learners, and those of its descendants.

These characteristics of TPR-W ensembles are well suited for the hierarchical classification of protein functions, considering that annotations of deeper nodes are likely to have less experimental evidence than those of higher nodes. Moreover, by enforcing the strength of the descendant nodes through low $w$ values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured as a forest of trees (Section 1.1), while no applications have been performed using the GO, structured as a directed acyclic graph (Section 1.1).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147] or investigated a modified decision tree model in which the assignment to a node is propagated toward the parent nodes [9], by extending the classical C4.5 decision tree algorithm for multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in yeast [149].

More recent approaches, always based on modified decision trees, used distance measures derived from the hierarchy and significantly improved on previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all classes at once, even when they are built taking into account the hierarchy.

Nevertheless, the previously proposed decision-tree-based methods often achieve results not comparable with those of state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for DAG-structured prediction of protein function in the S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles, which learn ensembles of predictive models, each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles, which train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151], methods related to decision trees, in the sense that they produce interpretable classification rules to predict all functions at all levels of the GO hierarchy, have been proposed, using an ant colony optimization algorithm to discover the classification rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that, in protein function prediction problems, we need to consider several learning issues [1, 16, 18]. In particular, in [80] the authors showed that, even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results if, at the same time, we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy between hierarchical classification, data integration methods, and cost-sensitive techniques has been shown, highlighting that hierarchical ensemble methods should be designed taking into account the different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the predictions of protein functions [16, 25, 154–156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into four groups:

(1) vector subspace integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed in [30], but each data source is separately normalized in order to take into account the data distribution in each individual vector space.

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network, or probabilistic evidence integration schemes [45].

Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive-Bayes integration [157] and constrained linear regression [14], or by merging data taking into account the GO hierarchy [158], or by applying XML-based techniques [159].

Kernel Fusion. These techniques first construct a separate Gram matrix for each available data source, using appropriate kernels representing similarities between genes/gene products. Then, by exploiting the closure property with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix.

Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have recently been proposed [160, 161]. Kernel fusion methods, both with and without weighting of the data sources, have been successfully applied to the classification of protein functions [162–165]. Recently, a novel method proposed an enhanced kernel integration approach, by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
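In its simplest unweighted form, kernel fusion amounts to summing the per-source Gram matrices; a minimal sketch follows (the optional weights stand in for coefficients that could be learned, e.g., by multiple kernel learning):

```python
import numpy as np

# A sketch of kernel fusion: the "consensus" Gram matrix is the (optionally
# weighted) sum of the Gram matrices computed on the single data sources.
def kernel_fusion(gram_matrices, weights=None):
    weights = weights if weights is not None else [1.0] * len(gram_matrices)
    return sum(w * K for w, K in zip(weights, gram_matrices))
```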

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different "views" of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include the "late integration" of kernels trained on different sources [22], the naive-Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the outputs of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to those of state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171–173].

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques, in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177] to pairwise sequence similarity data (SEQ. SIM.). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].

Table 2: Comparison of the results (average per-class F-scores) achieved with single-source and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the numbers in parentheses refer to the percentage relative increment in F-score achieved with data fusion techniques with respect to the best single source of evidence (BIOGRID).

Methods        FLAT    HTD     HTD-CS  HB      HB-CS   TPR     TPR-W
Single-source:
BIOGRID        0.2643  0.3759  0.4160  0.3385  0.4183  0.3902  0.4367
STRING         0.2203  0.2677  0.3135  0.2138  0.3007  0.2801  0.3048
PFAM BINARY    0.1756  0.2003  0.2482  0.1468  0.2407  0.2532  0.2738
PFAM LOGE      0.2044  0.1567  0.2541  0.0997  0.2847  0.3005  0.3160
EXPR           0.1884  0.2506  0.2889  0.2006  0.2781  0.2723  0.3053
SEQ. SIM.      0.1870  0.2532  0.2899  0.2017  0.2825  0.2742  0.3088
Multisource (data fusion):
Kernel fusion  0.3220 (22%)  0.5401 (44%)  0.5492 (32%)  0.5181 (53%)  0.5505 (32%)  0.5034 (29%)  0.5592 (28%)

Table 2 summarizes the results of the comparison, across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per-class F-score in flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) with respect to FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical methods, data fusion (in particular kernel fusion), and cost-sensitive approaches (Figure 13) [123].

Per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion, the average F-scores at levels 2, 3, and 4 are comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergic effects of hierarchical multilabel ensembles, cost-sensitive, and data fusion techniques significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insights into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both GO and FunCat, only positive annotations are usually available, while negative annotations are much more scarce. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy of choosing negative training examples does affect classifier performance [8, 163, 179, 180].

In [123], two strategies for choosing negative examples have been compared: the basic (B) and the parent only (PO) strategy.

According to the B strategy, the set of negative examples consists simply of those genes $g$ that are not annotated for class $c_i$, that is,

$$N_B = \{ g : g \notin c_i \}. \tag{33}$$

The PO selection strategy chooses as negatives for class $c_i$ only those examples that are not annotated to $c_i$ but are annotated for a parent class. More precisely, for a given class $c_i$, corresponding to node $i$ in the taxonomy, the set of negative examples is

$$N_{PO} = \{ g : g \notin c_i, \; g \in \mathrm{par}(i) \}. \tag{34}$$
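The two selection strategies can be sketched directly from (33) and (34), assuming ann maps each gene to its set of annotated classes and par_class gives the parent of class c_i (illustrative names):

```python
# Sketches of the two negative-selection strategies (eqs. (33)-(34)).
def negatives_basic(genes, ann, c_i):
    """B strategy: every gene not annotated with c_i."""
    return {g for g in genes if c_i not in ann[g]}

def negatives_parent_only(genes, ann, c_i, par_class):
    """PO strategy: genes not annotated with c_i but annotated with its parent."""
    return {g for g in genes if c_i not in ann[g] and par_class[c_i] in ann[g]}
```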

Figure 12: FunCat trees comparing the F-scores achieved with data integration (KF) to those of the best single-source classifiers trained on BIOGRID data. Black nodes depict the functional classes for which KF achieves better F-scores: (a) FLAT and (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical (a) precision, (b) recall, and (c) F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, this strategy selects negative examples for training that are, in a certain sense, "close" to positives. It is easy to see that $N_{PO} \subseteq N_B$; hence, the B strategy selects for training a larger set of generic negative examples, possibly annotated with classes associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the B (abscissa) and PO (ordinate) strategy for the selection of negative examples. In Figure 15(a), most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods gain in performance when using the PO strategy as opposed to the B strategy ($P$-value $= 2.2 \times 10^{-16}$, according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

Figure 14: Comparison of per-level average (a) precision, (b) recall, and (c) F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF). The performance of "single" is computed by averaging across all the single data sources.

These results can be explained by considering that the PO strategy takes into account the hierarchy to select negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from the other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section, we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in this research area.


Figure 15: Comparison of average per-class F-score between the basic (B) and PO strategies: (a) FLAT ensembles; (b) hierarchical cost-sensitive strategies, HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles). Abscissa: per-class F-score with base learners trained according to the basic strategy; ordinate: per-class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way, to assess the impact of the different strategies on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works also showed that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies should be explored to improve the accuracy and the reliability of the base learners and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied: the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly removes some unannotated examples [182]. Other approaches could be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185–187].

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems and often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta^-_i / \theta^+_i$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$ we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed in order to design ensembles with "tunable" precision/recall characteristics; a toy illustration of the underlying cost-sensitive mechanism is sketched below.
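A tunable precision/recall behaviour can be emulated with a single cost-sensitive decision threshold. The sketch below only illustrates the idea (it is not the actual HBAYES-CS decision rule), assuming each gene has a posterior probability for the class under study; all names and data are hypothetical.

```python
def cost_sensitive_predictions(posteriors, alpha):
    """Predict positive when p >= 1 / (1 + alpha), where alpha acts as a
    global cost factor: increasing alpha lowers the cost of positive
    predictions, so the threshold drops and recall rises at the expense
    of precision.  `posteriors` maps gene ids to class probabilities."""
    threshold = 1.0 / (1.0 + alpha)
    return {g: p >= threshold for g, p in posteriors.items()}

# With alpha = 1 the threshold is 0.5; with alpha = 3 it drops to 0.25,
# turning more borderline genes into positive predictions.
print(cost_sensitive_predictions({"g1": 0.3, "g2": 0.7}, alpha=3))
```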

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; unfortunately, the available annotations are incomplete and undergo frequent updates, and the GO itself is frequently revised. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these issues by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are partially annotated. More precisely, two new algorithms have been proposed: ProWL (protein function prediction with weak-label learning), which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein, and ProWL-IF (protein function prediction with weak-label learning and knowledge of irrelevant function), by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. These results show that such aspects should be considered in future work on hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected issue is the interpretability of the hierarchical models. Yet the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues concern the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy with the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (see Appendix C), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of deep nodes of the hierarchy, as sketched below. Other enhancements could be introduced in the TPR-W algorithm (Section 7.2): indeed, we can note that the positive children of a node at level $i$ of the hierarchy receive the same weight independently of the size of their hanging subtrees. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree, since this witnesses for a positive annotation of the node at level $i$.
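The alternative decay rules mentioned above can be made concrete as follows; the exponential rule mirrors the $w(1-w)^j$ factor of TPR-W (see (C.3) in Appendix C), while the linear and polynomial variants are hypothetical alternatives that weight deep nodes more heavily.

```python
def descendant_weight(j, rule="exp", w=0.8):
    """Weight given to a positive prediction coming from a descendant j
    levels below the current node, under three candidate decay rules."""
    if rule == "exp":                      # TPR-W-like exponential decay
        return w * (1.0 - w) ** j
    if rule == "lin":                      # hypothetical linear decay
        return max(0.0, 1.0 - 0.1 * j)
    if rule == "poly":                     # hypothetical polynomial decay
        return (1.0 + j) ** -2
    raise ValueError(f"unknown rule: {rule}")
```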

More in general, in the spirit of a recently proposed work [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

First, an important issue is represented by the design and development of multitask learning strategies [194] able to exploit the relationships between functional classes during the learning phase itself, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, already during the training of the base learners, the learning processes become dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge unexplored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performance.

A last issue is represented by multispecies predictions in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the prediction for a particular species by using predictions or data available for other species. This is common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work.


Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and the other related issues discussed above is the key to improve AFP methods and to drive experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., the mitochondrion or the rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that carry out a specific biological function, for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; an example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents parts or components of a cell, such as organelles, or physical places or compartments in which a specific gene product is located; an example is the endoplasmic reticulum or the ribosome.

Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a term of the GO}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are categorized in the following three main groups:

(i) is-a (subtype relations): if we say the term/node A is-a B, we mean that node A is a subtype of node B. For example, mitotic cell cycle is-a cell cycle, or lyase activity is-a catalytic activity.

(ii) part-of (part-whole relations): A is part-of B means that whenever A exists, it is part of B, and the presence of A implies the presence of B. For instance, mitochondrion is part-of cytoplasm.

(iii) regulates (control relations): if we say that A regulates B, we mean that A directly affects the manifestation of B, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting functional similarity. For these reasons, regulates relations (which, however, constitute a minority of the existing relations) are usually not used in AFP. The transitivity of is-a and part-of is also what allows annotations to be propagated toward the root of the ontology, as sketched below.
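Because is-a and part-of are transitive, an annotation to a term implies annotations to all its ancestors. Here is a minimal sketch of this propagation, assuming the ontology is given as a child-to-parents adjacency; names and toy data are illustrative.

```python
def propagate_to_root(terms, parents):
    """Transitively close a set of annotated GO terms over is-a/part-of
    edges: if a gene is annotated to t, it is implicitly annotated to
    every ancestor of t.  `parents` maps a term to its direct parents
    (a term may have several parents, since the GO is a DAG)."""
    closed, stack = set(terms), list(terms)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in closed:
                closed.add(p)
                stack.append(p)
    return closed

# Toy DAG: "mitotic cell cycle" is-a "cell cycle" is-a "biological process".
parents = {"mitotic cell cycle": ["cell cycle"], "cell cycle": ["biological process"]}
assert propagate_to_root({"mitotic cell cycle"}, parents) == {
    "mitotic cell cycle", "cell cycle", "biological process"}
```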

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays support the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI); to author statement codes, such as traceable author statement (TAS), indicating that the annotation was made on the basis of a statement made by the author(s) in the cited reference; to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de). At the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content was extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and, finally, animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge number of terms is very difficult for annotators to handle. Moreover, we may have a very large number of possible assignments, which may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads to both simpler annotation procedures and less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, but without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to changes, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more in general, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess classification performance. For this reason, the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the positive examples correctly predicted as positive, FN the positive examples incorrectly predicted as negative, and FP the negatives incorrectly predicted as positive, then the precision $P$ and the recall $R$ are

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{B.1}$$

The F-score $F$ is the harmonic mean of precision and recall:

$$F = \frac{2 \cdot P \cdot R}{P + R}. \tag{B.2}$$

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to a near-optimal ranking; that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.
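The per-class measures of (B.1) and (B.2) translate directly into code; a minimal sketch over 0/1 label vectors (names are illustrative):

```python
def precision_recall_fscore(y_true, y_pred):
    """Precision, recall and F-score of (B.1)-(B.2) from two equal-length
    0/1 sequences; degenerate cases are mapped to 0.0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```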

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be considered "partially correct", in the sense that we gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$ consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $l(P)$ be the set of the leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let $\uparrow p$ be the set of ancestors of the node $p$ that belong to $P(x)$, and, given a leaf $c \in C(x)$, let $\uparrow c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow c} \cap {\uparrow p}|}{|{\uparrow p}|},$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow c} \cap {\uparrow p}|}{|{\uparrow c}|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.3}$$

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow p}|}{|{\uparrow p}|},$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|{\uparrow c} \cap P(x)|}{|{\uparrow c}|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.4}$$

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.
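For the tree-structured case, the definitions of (B.4) can be implemented directly. The sketch below assumes a FunCat-like tree given as a child-to-parent map, prediction and annotation sets that are ancestor-closed and nonempty, and it follows the common convention that $\uparrow p$ includes $p$ itself; all names are illustrative.

```python
def up(node, parent):
    """Ancestor set of a node in a tree, here taken to include the node."""
    anc = set()
    while node is not None:
        anc.add(node)
        node = parent.get(node)
    return anc

def leaves(nodes, parent):
    """Nodes of the set having no child inside the set."""
    return {n for n in nodes if not any(parent.get(m) == n for m in nodes)}

def hierarchical_prf(pred, true, parent):
    """Hierarchical precision, recall and F-score of (B.4) for a tree."""
    lp, lc = leaves(pred, parent), leaves(true, parent)
    hp = sum(len(true & up(p, parent)) / len(up(p, parent)) for p in lp) / len(lp)
    hr = sum(len(up(c, parent) & pred) / len(up(c, parent)) for c in lc) / len(lc)
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf

# Toy FunCat-like tree: predicting the sibling "01.01" of the true leaf
# "01.02" is only partially correct (the shared ancestor "01" is matched).
parent = {"01.01": "01", "01.02": "01"}
print(hierarchical_prf({"01", "01.01"}, {"01", "01.02"}, parent))  # (0.5, 0.5, 0.5)
```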

Another variant of hierarchical classification measure is the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

$$\mathrm{HP}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|P(x)|}, \qquad \mathrm{HR}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|C(x)|}. \tag{B.5}$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true classes ($\mathrm{HR}_K$). The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean of $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \tag{B.6}$$
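The set-based measures of (B.5) and (B.6) are even simpler to compute; a sketch assuming both dictionaries share the same genes and all sets are nonempty and transitively closed (names are illustrative):

```python
def kiritchenko_hf(pred_by_gene, true_by_gene):
    """Hierarchical precision/recall/F of (B.5)-(B.6): per-gene ratios of
    shared classes, summed over genes."""
    hp = sum(len(P & true_by_gene[g]) / len(P) for g, P in pred_by_gene.items())
    hr = sum(len(C & pred_by_gene[g]) / len(C) for g, C in true_by_gene.items())
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf
```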

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $\bar{q}_k$; more precisely, $\bar{q}_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $\bar{q}^{\,j}_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \ge 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level $k$ of the hierarchy [31]:

$$\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g). \tag{C.1}$$
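Equation (C.1) can be read as a bottom-up recursion. The following sketch applies it on a toy hierarchy, assuming `q_hat` holds the local (base learner) probabilities, `children` the tree structure, and treating as positive children those whose combined probability exceeds a threshold (a simplified reading of $\phi_k(g)$, not the full algorithm of [31]).

```python
def tpr_w(node, children, q_hat, w, thresh=0.5):
    """Consensus probability of (C.1): mix the local prediction with the
    average consensus probability of the positive children; a node with
    no positive children keeps its local prediction."""
    pos = [q for q in (tpr_w(c, children, q_hat, w, thresh)
                       for c in children.get(node, ()))
           if q > thresh]
    if not pos:
        return q_hat[node]
    return w * q_hat[node] + (1 - w) * sum(pos) / len(pos)

# Toy hierarchy: the confident child "a" (0.9) raises the root's 0.4.
children = {"root": ["a", "b"]}
q_hat = {"root": 0.4, "a": 0.9, "b": 0.2}
print(tpr_w("root", children, q_hat, w=0.5))  # 0.65
```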

To simplify the notation, we can introduce the following expression for the average of the probabilities computed by the positive children of a generic node at level $k$:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1}, \tag{C.2}$$

and we can introduce similar notations for $a_{k+2}$ (the average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (the descendants at level $j$ below a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \le w \le 1$, balancing the weight between parent and children predictors, and having a variable number (larger than or equal to 1) of positive descendants at each of the $m$ lower levels below, the following equality holds for each $m \ge 1$:

$$\bar{q}_k = w \hat{q}_k + \sum_{j=1}^{m-1} w (1 - w)^j a_{k+j} + (1 - w)^m a_{k+m}. \tag{C.3}$$

For the full proof, see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter. Figure 18 gives more insight into the relationship between $w$ and the influence of positive decisions on a generic node at level $k$ (Theorem 2).


Figure 18: (a) Plot of $f(w) = (1 - w)^m$ while varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1 - w)^j$ while varying $j$ from 1 to 10; the integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.

Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of the nodes above the leaves.
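As a quick numerical sanity check of (C.3) (a sketch, with hypothetical data), consider the simplest configuration: a chain in which every node has exactly one positive child, so that each $a_{k+j}$ reduces to the single descendant's prediction.

```python
def chain_recursion(q_hat, w):
    """Apply (C.1) bottom-up along a chain of local predictions q_hat,
    ordered from the reference node (index 0) down to the leaf."""
    q = q_hat[-1]                       # the leaf has no positive children
    for qh in reversed(q_hat[:-1]):
        q = w * qh + (1 - w) * q        # single positive child
    return q

def chain_closed_form(q_hat, w):
    """Closed form (C.3) with m = len(q_hat) - 1 levels below the node."""
    m = len(q_hat) - 1
    return (w * q_hat[0]
            + sum(w * (1 - w) ** j * q_hat[j] for j in range(1, m))
            + (1 - w) ** m * q_hat[m])

q_hat = [0.6, 0.7, 0.9, 0.8]
assert abs(chain_recursion(q_hat, 0.8) - chain_closed_form(q_hat, 0.8)) < 1e-12
```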

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction—the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.
[2] L. Pena-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.
[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.
[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.
[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.
[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.
[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.
[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.
[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.
[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.
[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.
[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.
[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.
[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.
[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.
[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.
[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.
[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.
[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.
[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.
[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.
[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.
[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.
[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.
[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.
[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.
[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.
[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.
[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.
[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.
[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.
[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.
[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.
[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.
[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.
[38] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.
[39] A. Binder, M. Kawanabe, and U. Brefeld, "Efficient classification of images with taxonomies," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.
[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.
[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.
[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.
[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.
[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.
[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.
[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.
[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.


[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.
[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.
[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.
[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.
[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.
[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.
[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.
[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.
[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
[58] A. Conesa, S. Gotz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.
[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.
[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.
[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.
[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.
[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.
[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.
[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.
[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.
[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.
[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.
[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.
[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.
[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.
[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.
[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.
[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.
[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.
[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.
[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.
[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.
[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.
[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.
[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.
[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.
[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.
[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.
[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.
[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.
[88] G. Bakir, T. Hofmann, B. Scholkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.
[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.
[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.
[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.
[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.
[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.
[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.
[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.
[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.
[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.
[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.
[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.
[100] M. Dettling and P. Buhlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.
[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.
[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.
[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.
[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.
[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.
[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.
[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.
[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.
[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.
[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.
[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.
[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.
[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.
[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.
[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.
[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.
[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.
[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.
[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.
[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.
[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.
[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.
[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using top-down label combination and artificial neural networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.
[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.
[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.
[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.
[128] B. Paes, A. Plastino, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.
[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.
[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.
[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.
[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.
[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.
[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.
[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems: Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.
[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.
[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.
[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.
[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.
[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n²) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.
[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.
[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.
[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.
[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.
[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.
[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.


[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.
[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.
[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.
[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.
[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics—From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.
[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.
[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.
[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.
[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.
[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.
[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.
[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.
[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.
[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.
[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.
[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.
[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.
[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J Kittler M Hatef R P W Duin and J Matas ldquoOn combiningclassifiersrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 20 no 3 pp 226ndash239 1998

[169] L I Kuncheva J C Bezdek and R P W Duin ldquoDecisiontemplates for multiple classifier fusion an experimental com-parisonrdquo Pattern Recognition vol 34 no 2 pp 299ndash314 2001

[170] M Re and G Valentini ldquoNoise tolerance of multiple classifiersystems in data integration-based gene function predictionrdquoJournal of Integrative Bioinformatics vol 7 no 3 article 1392010

[171] M Re and G Valentini ldquoEnsemble based data fusion forgene function predictionrdquo inMultiple Classifier Systems EighthInternationalWorkshopMCS 2009 Reykjavik Iceland J KittlerJ Benediktsson and F Roli Eds vol 5519 of Lecture Notes inComputer Science pp 448ndash457 Springer 2009

[172] M Re and G Valentini ldquoPrediction of gene function usingensembles of SVMs and heterogeneous data sourcesrdquo in Appli-cations of Supervised and Unsupervised Ensemble Methods vol245 of Computational Intelligence Series pp 79ndash91 Springer2009

[173] M Re and G Valentini ldquoIntegration of heterogeneous datasources for gene function prediction using decision templatesand ensembles of learning machinesrdquo Neurocomputing vol 73no 7-9 pp 1533ndash1537 2010

[174] R D Finn J Tate J Mistry et al ldquoThe Pfam protein familiesdatabaserdquoNucleic Acids Research vol 36 no 1 pp D281ndashD2882008

[175] A P Gasch P T Spellman C M Kao et al ldquoGenomic expres-sion programs in the response of yeast cells to environmentalchangesrdquoMolecular Biology of the Cell vol 11 no 12 pp 4241ndash4257 2000

[176] C Stark B-J Breitkreutz T Reguly L Boucher A Breitkreutzand M Tyers ldquoBioGRID a general repository for interactiondatasetsrdquo Nucleic Acids Research vol 34 pp D535ndashD539 2006

[177] C Von Mering R Krause B Snel et al ldquoComparative assess-ment of large-scale data sets of protein-protein interactionsrdquoNature vol 417 no 6887 pp 399ndash403 2002

[178] G Valentini and N Cesa-Bianchi ldquoHCGene a software tool tosupport the hierarchical classification of genesrdquo Bioinformaticsvol 24 no 5 pp 729ndash731 2008

[179] A Ben-Hur and W S Noble ldquoChoosing negative examples forthe prediction of protein-protein interactionsrdquo BMC Bioinfor-matics vol 7 supplement 1 article S2 2006

[180] N Skunca A Altenhoff and C Dessimoz ldquoQuality of compu-tationally inferred gene ontology annotationsrdquo PLoS Computa-tional Biology vol 8 no 5 Article ID e1002533 2012

[181] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

34 ISRN Bioinformatics

[182] G Batista R Prati and M Monard ldquoA study of the behavior ofseveral methods for balancing machine learning training datardquoACM SIGKDD Explorations Newsletter vol 6 no 1 pp 20ndash292004

[183] M Kubat and S Matwin ldquoAddressing the curse of imbalancedtraining sets one-sided selectionrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICML rsquo97) pp179ndash187 Nashville Tenn USA 1997

[184] H Guo and H Viktor ldquoLearning from imbalanced datasets with boosting and data generation the Databoost-IMapproachrdquo ACM SIGKDD Explorations Newsletter vol 6 no 1pp 30ndash39 2004

[185] Y Sun M S Kamel A K C Wong and Y Wang ldquoCost-sensitive boosting for classification of imbalanced datardquo PatternRecognition vol 40 no 12 pp 3358ndash3378 2007

[186] Y Zhang and D Wang ldquoCost-sensitive ensemble method forclass-imbalanced datasetsrdquo Abstract and Applied Analysis vol2013 Article ID 196256 6 pages 2013

[187] M Galar A Fernandez E Barrenechea H Bustince andF Herrera ldquoA review on ensembles for the class imbalanceproblem bagging- boosting- and hybrid-based approachesrdquoIEEE Transactions on Systems Man and Cybernetics Part CApplications and Reviews vol 42 no 4 pp 463ndash484 2012

[188] C Huttenhower M A Hibbs C L Myers A A Caudy DC Hess and O G Troyanskaya ldquoThe impact of incompleteknowledge on evaluation an experimental benchmark forprotein function predictionrdquo Bioinformatics vol 25 no 18 pp2404ndash2410 2009

[189] G Yu C Domeniconi H Rangwala G Zhang and Z YuldquoTransductive multilabel ensemble classification for proteinfunction predictionrdquo in Proceedings of the 18th ACM SIGKDDInternational Conference on Knowledge Discovery and DataMining pp 1077ndash1085 2012

[190] G Yu H Rangwala C Domeniconi G Zhang and Z YuldquoProtein function prediction with incomplete annotationsrdquoIEEE Transactions on Computational Biology and Bioinformat-ics 2013

[191] Gene Ontology Consortium ldquoGene Ontology annotations andresourcesrdquoNucleic Acids Research vol 41 pp D530ndashD535 2013

[192] A A Freitas D CWieser and R Apweiler ldquoOn the importanceof comprehensible classification models for protein functionpredictionrdquo IEEEACM Transactions on Computational Biologyand Bioinformatics vol 7 no 1 pp 172ndash182 2010

[193] R Cerri R Barros A de Carvalho and A Freitas ldquoA grammat-ical evolution algorithm for generation of hierarchical multi-label classification rulesrdquo in Proceedings of the IEEE Congress onEvolutionary Computation 2013

[194] CWidmer andG Ratsch ldquoMultitask learning in computationalbiologyrdquo Journal of Machine Learning Research WampP vol 27pp 207ndash216 2012

[195] J Gonzalez Y Low H Gu D Bickson and C GuestrinldquoPowerGraph distributed graph-parallel computation on nat-ural graphsrdquo in Proceedings of the 10th USENIX conference onOperating Systems Design and Implementation (OSDI rsquo12) pp17ndash30 Hollywood Calif USA 2012

[196] M Mesiti M Re and G Valentini ldquoScalable network-basedlearning methods for automated function prediction based onthe neo4j graph-databaserdquo in Proceedings of the AutomatedFunction Prediction SIG 2013mdashISMB Special Interest GroupMeeting pp 6ndash7 Berlin Germany 2013

[197] H W Mewes K Albermann M Bahr et al ldquoOverview of theyeast genomerdquo Nature vol 387 no 6632 pp 7ndash8 1997

[198] H W Mewes D Frishman C Gruber et al ldquoMIPS a databasefor genomes and protein sequencesrdquoNucleic Acids Research vol28 no 1 pp 37ndash40 2000

[199] K R Christie S Weng R Balakrishnan et al ldquoSaccharomycesGenome Database (SGD) provides tools to identify and ana-lyze sequences from Saccharomyces cerevisiae and relatedsequences from other organismsrdquo Nucleic Acids Research vol32 pp D311ndashD314 2004

[200] K Verspoor DDvorkin K B Cohen and LHunter ldquoOntologyquality assurance through analysis of term transformationsrdquoBioinformatics vol 25 no 12 pp i77ndashi84 2009

[201] K Verspoor J Cohn S Mniszewski and C Joslyn ldquoA catego-rization approach to automated ontological function annota-tionrdquo Protein Science vol 15 no 6 pp 1544ndash1549 2006

[202] S Kiritchenko S Matwin and A Famili ldquoHierarchical textcategorization as a tool of associating geneswithGeneOntologycodesrdquo in Proceedings of the 2nd European Workshop on DataMining and Text Mining for Bioinformatics pp 26ndash30 2004


Figure 10: The overall scheme of reconciliation methods (adapted from [24]): for each term, (1) kernels computed from multiple data sources (e.g., MGI, PPI, Pfam), (2) SVM learning, (3) calibration of probability scores, and (4) reconciliation over the ontology.

Their goal consists in providing consistent predictions, that is, predictions whose confidence (e.g., posterior probability) increases as we ascend from more specific to more general terms in the GO. Another important feature of these methods is the availability of confidence values associated with the predictions, which can be interpreted as probabilities that a protein has a certain function, given the information provided by the data.

The overall reconciliation approach can be summarized in the following four basic steps (Figure 10).

(1) Kernel computation: at first, a set of kernels is computed from the available data. We may choose kernels specific for each source of data (e.g., diffusion kernels for protein-protein interaction data [137], linear or Gaussian kernels for expression data, and string kernels for sequence data [138]). Multiple kernels for the same type of data can also be constructed [24].

(2) SVM learning: SVMs are used as base learners with the kernels selected at the previous step; the training is performed by internal cross-validation to avoid overfitting, and a local cost-sensitive strategy is applied by tuning separately the $C$ regularization factor for positive and negative examples. Note that the authors used SVMs as base learners in their experiments, but any meaningful classifier could be used at this step.

(3) Calibration: to produce individual probabilistic outputs from the set of SVM outputs corresponding to one GO term, a logistic regression approach is applied. In this way a calibration of the individual SVM outputs is obtained, resulting in a probabilistic prediction of the random variable $Y_i$ for each node/term $i$ of the hierarchy, given the outputs $\hat{y}_i$ of the SVM classifiers.

(4) Reconciliation: the first three steps generate unreconciled outputs; in practice, a "flat" ensemble is applied that may generate inconsistent predictions with respect to the given taxonomy. In this step the outputs of step three are processed by a "reconciliation method". The goal of this stage is to combine predictions for each term so as to produce predictions that are consistent with the ontology, meaning that all the probabilities assigned to the ancestors of a GO term are larger than the probability assigned to that term.

The first three steps are basically the same for (or very similar to) each reconciliation ensemble method. The crucial step is the fourth, that is, the reconciliation step, and different ensemble algorithms can be designed to implement it. The authors proposed 11 different ensemble methods for the reconciliation of the base classifier outputs. Schematically, they can be subdivided into the following four main classes of ensembles:

(1) heuristic methods;
(2) Bayesian network-based methods;
(3) cascaded logistic regression;
(4) projection-based methods.

6.1. Heuristic Methods. These approaches preserve the "reconciliation property"

$$\forall i, j : (i, j) \in E \Longrightarrow \bar{p}_i \ge \bar{p}_j \quad (18)$$

through simple heuristic modifications of the calibrated probabilities $\hat{p}_i$ computed at step 3 of the overall reconciliation scheme.

(i) The MAX method simply chooses the largest logistic regression value among the node $i$ and all its descendants $\mathrm{desc}(i)$:

$$\bar{p}_i = \max_{j \in \mathrm{desc}(i)} \hat{p}_j \quad (19)$$

(ii) The AND method implements the notion that the probability that all ancestral GO terms $\mathrm{anc}(i)$ of a given term/node $i$ hold is large, assuming that, conditional on the data, all predictions are independent:

$$\bar{p}_i = \prod_{j \in \mathrm{anc}(i)} \hat{p}_j \quad (20)$$

(iii) OR estimates the probability that the node $i$ is annotated for at least one of the descendant GO terms, assuming again that, conditional on the data, all predictions are independent:

$$1 - \bar{p}_i = \prod_{j \in \mathrm{desc}(i)} \left(1 - \hat{p}_j\right) \quad (21)$$
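As a concrete (and purely illustrative) sketch, the following Python snippet applies the three heuristics (19)-(21) to calibrated outputs on a toy ontology; the names (parents, p_hat, the helper functions) are hypothetical, and the AND product is taken over the ancestors including the node itself, one of the possible conventions.

```python
import math

# Toy ontology given as a parent map (a DAG); p_hat holds the calibrated
# outputs from the calibration step. All names are illustrative only.
parents = {"root": [], "a": ["root"], "b": ["root"], "c": ["a", "b"]}
p_hat = {"root": 0.9, "a": 0.4, "b": 0.7, "c": 0.6}

def ancestors(i):
    """All ancestors of term i (excluding i), following parent links."""
    anc, stack = set(), list(parents[i])
    while stack:
        j = stack.pop()
        if j not in anc:
            anc.add(j)
            stack.extend(parents[j])
    return anc

def descendants(i):
    """All descendants of term i, including i itself (as used in (19))."""
    return {j for j in parents if j == i or i in ancestors(j)}

# MAX (19): largest calibrated value over the node and its descendants.
p_max = {i: max(p_hat[j] for j in descendants(i)) for i in parents}

# AND (20): product of calibrated values over the ancestors (with i itself).
p_and = {i: math.prod(p_hat[j] for j in ancestors(i) | {i}) for i in parents}

# OR (21): probability that at least one descendant term is "on".
p_or = {i: 1 - math.prod(1 - p_hat[j] for j in descendants(i)) for i in parents}

print(p_max, p_and, p_or, sep="\n")
```

All three outputs satisfy the reconciliation property (18) by construction: a child's reconciled value can never exceed its parent's.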

6.2. Cascaded Logistic Regression. Instead of modeling class-conditional probabilities, as required by the Bayesian approach, logistic regression can be used to directly model posterior probabilities. Considering that modeling conditional densities is in most cases difficult (even under strong independence assumptions, as shown in Section 5.1), the choice of logistic regression could be a reasonable one. In [24] the authors embedded the hierarchical dependencies between terms in the logistic regression setting. By associating with a given gene $g$ a random variable $X$ whose values represent the features of $g$, and assuming that $P(Y = y \mid X = x)$ factorizes according to the GO graph, it follows that

$$P(Y = y \mid X = x) = \prod_i P\left(Y_i = y_i \mid Y_j = y_j \;\forall j \in \mathrm{par}(i),\; X_i = x_i\right) \quad (22)$$

with $P(Y_i = 1 \mid Y_j = 0 \text{ for some } j \in \mathrm{par}(i),\, X_i = x_i) = 0$. The authors estimated $P(Y_i = 1 \mid Y_j = 1 \;\forall j \in \mathrm{par}(i),\, X_i = x_i)$ with logistic regression. This approach is quite similar to fitting independent logistic regressions, but note that in this case only examples of proteins annotated for all the parent GO terms are used to fit the model, thus implicitly taking into account the hierarchical relationships between GO terms.
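A minimal scikit-learn sketch of this idea (a hypothetical illustration, not the code of [24]) fits one logistic regression per term using only the training proteins annotated for all parent terms, and combines the conditional estimates along a chain-structured toy hierarchy by the factorization (22); all names (parents, X, Y) are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: X are gene features, Y[:, i] the 0/1 annotation
# for term i; `parents` maps a term index to its parent indices.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Y = (rng.uniform(size=(200, 3)) < 0.5).astype(int)
Y[:, 1] &= Y[:, 0]          # enforce the true path rule on the toy labels
Y[:, 2] &= Y[:, 1]
parents = {0: [], 1: [0], 2: [1]}

models = {}
for i, par in parents.items():
    # Fit P(Y_i = 1 | all parents = 1, x) only on proteins whose parent
    # terms are all positive, as in cascaded logistic regression.
    mask = np.all(Y[:, par] == 1, axis=1) if par else np.ones(len(Y), dtype=bool)
    models[i] = LogisticRegression().fit(X[mask], Y[mask, i])

def predict_proba(x):
    """Chain-structured toy case: p_i = p_parent * P(Y_i = 1 | parent = 1, x)."""
    p = {}
    for i in sorted(parents):                 # parents come before children here
        cond = models[i].predict_proba(x.reshape(1, -1))[0, 1]
        p_par = p[parents[i][0]] if parents[i] else 1.0
        p[i] = cond * p_par
    return p

print(predict_proba(X[0]))
```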

6.3. Bayesian Network-Based Methods. These methods are variants of the Bayesian network approach proposed in [20] (Section 5.1): the GO is viewed as a graphical model where a joint Bayesian prior is put on the binary GO term variables $Y_i$. The authors proposed four variants that can be summarized as follows.

(i) BPAL is a belief propagation approach with asymmetric Laplace likelihoods. The graphical model has edges directed from more general terms to more specific terms. Differently from [20], the distribution of each SVM output is modeled as an asymmetric Laplace distribution, and the posterior probabilities of the ensemble are estimated by a variational inference algorithm that solves an optimization problem whose minimizer is the set of marginal probabilities of the distribution [139].

(ii) BPALF is similar to BPAL, but with edges inverted and directed from more specific terms to more general terms.

(iii) BPLR is a heuristic variant of BPAL where, in the inference algorithm, the Bayesian log posterior ratio for $Y_i$ is replaced by the marginal log posterior ratio obtained from the logistic regression (LR).

(iv) BPLRF is equal to BPLR, but with reversed edges.

6.4. Projection-Based Methods. A different approach is represented by methods that directly use the calibrated values obtained from logistic regression (step 3 of the overall scheme of the reconciliation methods) to find the closest set of values that are consistent with the ontology. This approach leads to a constrained optimization problem. The main contribution of the work of Obozinski et al. [24] is represented by the introduction of projection reconciliation techniques based on isotonic regression [140] and the Kullback-Leibler divergence.

The isotonic regression method tries to find a set of marginal probabilities $\bar{p}_i$ that are close to the set of calibrated values $\hat{p}_i$ obtained from the logistic regression. The Euclidean distance is used as a measure of closeness. Hence, considering that the "reconciliation property" requires that $\bar{p}_i \ge \bar{p}_j$ when $(i, j) \in E$, this approach yields the following quadratic program:

$$\min_{\{\bar{p}_i\}_{i \in I}} \sum_{i \in I} \left(\bar{p}_i - \hat{p}_i\right)^2 \quad \text{s.t.} \quad \bar{p}_j \le \bar{p}_i, \; (i, j) \in E \quad (23)$$

This is the classical isotonic regression problem, which can be solved using an interior point solver, or with approximate algorithms when the number of edges of the graph is too large [141].
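A small sketch of the quadratic program (23), assuming the cvxpy modeling package as the solver and hypothetical names (edges, p_hat), might look as follows.

```python
import cvxpy as cp
import numpy as np

# Calibrated probabilities for a toy 4-term ontology; edges (i, j) mean
# that i is a parent of j. Names are illustrative only.
p_hat = np.array([0.9, 0.4, 0.7, 0.8])
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]

p_bar = cp.Variable(len(p_hat))

# Objective (23): stay as close as possible (Euclidean) to the calibrated values.
objective = cp.Minimize(cp.sum_squares(p_bar - p_hat))

# Reconciliation property: a child's probability cannot exceed its parent's.
constraints = [p_bar[j] <= p_bar[i] for i, j in edges]
constraints += [p_bar >= 0, p_bar <= 1]

cp.Problem(objective, constraints).solve()
print(np.round(p_bar.value, 3))  # e.g., term 3 is pulled below terms 1 and 2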

Considering that we deal with probabilities, a natural measure of distance between probability density functions $f(\mathbf{x})$ and $g(\mathbf{x})$, defined with respect to a random variable $\mathbf{x}$, is the Kullback-Leibler divergence $D(f \| g)$:

$$D(f \| g) = \int_{-\infty}^{\infty} f(\mathbf{x}) \log\left(\frac{f(\mathbf{x})}{g(\mathbf{x})}\right) d\mathbf{x} \quad (24)$$

In the context of reconciliation methods we need to consider a discrete version of the Kullback-Leibler divergence, yielding the following optimization problem:

$$\min_{\bar{\mathbf{p}}} D(\bar{\mathbf{p}} \| \hat{\mathbf{p}}) = \min_{\{\bar{p}_i\}_{i \in I}} \sum_{i \in I} \bar{p}_i \log\left(\frac{\bar{p}_i}{\hat{p}_i}\right) \quad \text{s.t.} \quad \bar{p}_j \le \bar{p}_i, \; (i, j) \in E \quad (25)$$

The algorithm finds the probabilities closest, according to the Kullback-Leibler divergence, to the probabilities $\hat{\mathbf{p}}$ obtained from logistic regression, obeying the constraints that probabilities cannot increase while descending the hierarchy underlying the ontology.
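Under the same assumptions as the previous sketch (cvxpy, hypothetical data), the KL projection (25) only swaps the objective; note that cvxpy's kl_div atom equals $p \log(p/q) - p + q$ elementwise, so the linear terms must be added back to recover the objective of (25) exactly.

```python
import cvxpy as cp
import numpy as np

p_hat = np.array([0.9, 0.4, 0.7, 0.8])          # calibrated values, as above
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]

p_bar = cp.Variable(len(p_hat), nonneg=True)
constraints = [p_bar[j] <= p_bar[i] for i, j in edges] + [p_bar <= 1]

# kl_div(p, q) = p*log(p/q) - p + q, so adding p - q back gives p*log(p/q).
objective = cp.Minimize(cp.sum(cp.kl_div(p_bar, p_hat) + p_bar - p_hat))
cp.Problem(objective, constraints).solve()
print(np.round(p_bar.value, 3))
```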

The extensive experiments reported in [24] show that, among the reconciliation methods, isotonic regression is the most generally useful. Across a range of evaluation modes, term sizes, ontologies, and recall levels, isotonic regression yields consistently high precision. On the other hand, isotonic regression is not always the "best method", and a biologist with a particular goal in mind may apply other reconciliation methods. For instance, with small terms Kullback-Leibler projections usually achieve the best results; but considering average "per term" results, heuristic methods yield precision at a given recall comparable with projection methods and better than that achieved with Bayes-net methods.

This ensemble approach achieved excellent results in the prediction of protein function in the mouse model organism, demonstrating that hierarchical multilabel methods can play a crucial role in the improvement of protein function prediction performances [24]. Nevertheless, the approach suffers from some drawbacks. Indeed, the paper focuses on the comparison of hierarchical multilabel methods, but it does not analyze the impact of the concurrent use of data integration and hierarchical multilabel methods on the overall classification performances. Moreover, potential improvements could be introduced by applying cost-sensitive variants of hierarchical multilabel predictors, able to effectively calibrate the precision/recall trade-off at different levels of the functional ontology.

Figure 11: The asymmetric flow of information suggested by the true path rule: positive predictions flow from bottom to top of the FunCat tree, negative decisions from top to bottom.

7. True Path Rule Hierarchical Ensembles

These ensemble methods exploit at the same time the downward and upward relationships between classes, thus considering both the parent-to-child and child-to-parent functional links (Figure 2(b)).

The true path rule (TPR) ensemble method [31, 142] is directly inspired by the true path rule that governs both the GO and FunCat taxonomies. Citing the curators of the Gene Ontology [143]: "An annotation for a class in the hierarchy is automatically transferred to its ancestors, while genes unannotated for a class cannot be annotated for its descendants." Considering the parents of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$$y_i = 1 \Longrightarrow y_{\mathrm{par}(i)} = 1, \qquad y_i = 0 \nRightarrow y_{\mathrm{par}(i)} = 0 \quad (26)$$

On the other hand, considering the children of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$$y_i = 1 \nRightarrow y_{\mathrm{child}(i)} = 1, \qquad y_i = 0 \Longrightarrow y_{\mathrm{child}(i)} = 0 \quad (27)$$

From (26) and (27) we observe an asymmetry in the rules that govern the assignment of positive and negative labels. Indeed, we have a propagation of positive predictions from bottom to top of the hierarchy in (26) and a propagation of negative labels from top to bottom in (27). Conversely, negative labels cannot propagate from bottom to top, and positive predictions cannot propagate from top to bottom. The "true path rule" thus suggests algorithms able to propagate "positive" decisions from bottom to top of the hierarchy and negative decisions from top to bottom (Figure 11).

7.1. The True Path Rule Ensemble Algorithm. The TPR algorithm puts together the predictions made at each node by local "base" classifiers to realize an ensemble that obeys the "true path rule". The basic ideas behind the method can be summarized as follows:

(1) training of the base learners: for each node of the hierarchy a suitable learning algorithm (e.g., a multilayer perceptron or a support vector machine) provides a classifier for the associated functional class;

(2) in the evaluation phase the trained classifiers associated with each class/node of the graph provide a local decision about the assignment of a given example to a given node;

(3) positive decisions, that is, annotations to a specific functional class, may propagate from bottom to top across the graph: they influence the decisions of the parent nodes and of their ancestors in a recursive way, by traversing the graph towards higher level nodes/classes. Conversely, negative decisions do not affect decisions of the parent node, that is, they do not propagate from bottom to top (26);

(4) negative predictions for a given node (taking into account the local decisions of its descendants) are propagated to the descendants, to preserve the consistency of the hierarchy according to the true path rule, while positive decisions do not influence decisions of child nodes (27).

The ensemble combines the local predictions of the base learners associated with each node with the positive decisions that come from the bottom of the hierarchy and with the negative decisions that spring from the higher level nodes. More precisely, base classifiers estimate local probabilities $p_i(g)$ that a given example $g$ belongs to class $\theta_i$, but the core of the algorithm is represented by the evaluation phase, where the ensemble provides an estimate of the "consensus" global probability $\bar{p}_i(g)$.

It is worth noting that, instead of a probability, $p_i(g)$ may represent a score associated with the likelihood that a given gene/gene product belongs to the functional class $i$.

Let us consider the set $\phi_i(g)$ of the children of node $i$ for which we have a positive prediction for a given gene $g$:

$$\phi_i(g) = \left\{ j \in \mathrm{child}(i) : \bar{y}_j = 1 \right\} \quad (28)$$

The global consensus probability $\bar{p}_i(g)$ of the ensemble depends both on the local prediction $p_i(g)$ and on the predictions of the nodes belonging to $\phi_i(g)$:

$$\bar{p}_i(g) = \frac{1}{1 + \left|\phi_i(g)\right|} \left( p_i(g) + \sum_{j \in \phi_i(g)} \bar{p}_j(g) \right) \quad (29)$$

The decision $\bar{y}_i(g)$ at node/class $i$ is set to 1 if $\bar{p}_i(g) > t$ and to 0 otherwise (a natural choice for $t$ is 0.5), and only children nodes for which we have a positive prediction can influence their parent. In the leaf nodes the sum of (29) disappears, and (29) becomes $\bar{p}_i(g) = p_i(g)$. In this way positive predictions propagate from bottom to top, and negative decisions are propagated to the descendants when, for a given node, $\bar{y}_i(g) = 0$.

The bottom-up per-level traversal of the tree assures that all the offspring of a given node $i$ are taken into account for the ensemble prediction. For the same reason we can safely set the classes belonging to the subtree rooted at $i$ to negative when $\bar{y}_i$ is set to 0. It is worth noting that we have a two-way asymmetric flow of information across the tree: positive predictions for a node influence its ancestors, while negative predictions influence its offspring.

The algorithm provides both the multilabels $\bar{y}_i$ and an estimate of the probabilities $\bar{p}_i$ that a given example $g$ belongs to the class $i \in \{1, \ldots, m\}$.
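The per-level bottom-up pass of (29), followed by the top-down propagation of negative decisions, can be sketched as follows. This is an illustrative reconstruction under the assumption of a tree-structured taxonomy, with hypothetical names (children, p_local, levels), not the reference implementation of [31].

```python
# Illustrative TPR evaluation for one gene on a toy tree; p_local holds the
# local estimates p_i(g) of the base learners, children the tree structure.
children = {"root": ["a", "b"], "a": ["c", "d"], "b": [], "c": [], "d": []}
p_local = {"root": 0.6, "a": 0.4, "b": 0.3, "c": 0.9, "d": 0.7}
levels = [["c", "d"], ["a", "b"], ["root"]]  # deepest level first
t = 0.5                                      # decision threshold

p_bar, y_bar = {}, {}
for level in levels:                         # bottom-up per-level traversal
    for i in level:
        # phi_i(g): children with a positive ensemble prediction, as in (28)
        phi = [j for j in children[i] if y_bar.get(j) == 1]
        p_bar[i] = (p_local[i] + sum(p_bar[j] for j in phi)) / (1 + len(phi))
        y_bar[i] = int(p_bar[i] > t)

def negate_subtree(i):
    """Top-down step (27): a negative decision switches off the subtree."""
    for j in children[i]:
        y_bar[j] = 0
        negate_subtree(j)

for level in reversed(levels):               # top-down, from the root
    for i in level:
        if y_bar[i] == 0:
            negate_subtree(i)

# Node "a" (local estimate 0.4) is pushed above threshold by its children.
print(y_bar, {i: round(p, 3) for i, p in p_bar.items()})
```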

7.2. The Cost-Sensitive Variant. Note that in the TPR algorithm there is no way to explicitly balance the local prediction $p_i(g)$ at node $i$ with the positive predictions coming from its offspring (29). By balancing the local predictions with the positive predictions coming from the ensemble, we can explicitly modulate the interplay between local and descendant predictors. To this end a weight $w$, $0 \le w \le 1$, is introduced, such that if $w = 1$ the decision at node $i$ depends only on the local predictor; otherwise the prediction is shared proportionally to $w$ and $1 - w$ between, respectively, the local parent predictor and the set of its children:

$$\bar{p}_i = w\, p_i + \frac{1 - w}{\left|\phi_i\right|} \sum_{j \in \phi_i} \bar{p}_j \quad (30)$$

This variant of the TPR algorithm is the weighted true path rule (TPR-W) hierarchical ensemble algorithm. By tuning the $w$ parameter we can modulate the precision/recall characteristics of the resulting ensemble. More precisely, for $w \to 0$ the weight of the parent local predictor is small, and the ensemble decision depends mainly on the positive predictions of the offspring nodes (classifiers). Conversely, $w \to 1$ corresponds to a higher weight of the parent predictor: less weight is given to possible positive predictions of the children, and the decision depends mainly on the local/parent base classifier. In case of a negative decision the whole subtree is set to 0, causing the precision to increase. Note that for $w \to 1$ the behaviour of TPR-W becomes similar to that of HTD (Section 4).

A specific advantage of TPR-W ensembles is thus the capability of tuning precision and recall rates through the single parameter $w$ (30). In [31] the author shows that $w$ highly influences the precision/recall characteristics of the ensemble: low $w$ values yield a higher recall, while high values improve the precision of the TPR-W ensemble.
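In code, the TPR-W combination (30) is a small change to the previous sketch; the following hedged fragment (hypothetical names) makes the role of $w$ explicit.

```python
def tpr_w_combine(p_local_i, p_bar_children_pos, w=0.7):
    """Weighted combination (30); p_bar_children_pos lists the consensus
    probabilities of the positively predicted children (phi_i)."""
    if not p_bar_children_pos:           # leaf or no positive children
        return p_local_i
    avg_children = sum(p_bar_children_pos) / len(p_bar_children_pos)
    return w * p_local_i + (1 - w) * avg_children

# Low w boosts recall (children dominate), high w boosts precision:
print(tpr_w_combine(0.4, [0.9, 0.7], w=0.2))  # 0.72 -> positive at t = 0.5
print(tpr_w_combine(0.4, [0.9, 0.7], w=0.9))  # 0.44 -> negative at t = 0.5
```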

Recently, Chen and Hu proposed a method that applies the TPR-W hierarchical strategy but uses composite kernel SVMs as base classifiers, together with a supervised clustering and oversampling strategy to address the imbalanced data set learning problem, showing that the proper selection of base learners and unbalance-aware learning strategies can further improve the results in terms of hierarchical precision and recall [144].

The same authors also proposed an enhanced version of the TPR-W strategy to overcome a limitation of this bottom-up hierarchical method for AFP. Indeed, for some classes at the lower levels of the hierarchy the classifier performances are sometimes quite poor, due both to noisy data and to the relatively low number of available annotations. More precisely, in the basic TPR ensemble the probabilities $\bar{p}_j$ computed by the children of node $i$ (30) contribute in equal measure to the probability $\bar{p}_i$ computed by the ensemble at node $i$, independently of the accuracy of the predictions made by the children classifiers. This "unweighted" mechanism may propagate errors across the hierarchy: a poorly performing child classifier may, for instance, with high probability predict a negative example as positive, and this error may propagate to its parent node and, recursively, to its ancestor nodes. To try to alleviate this possible bottom-up error propagation, in [145] Chen and Hu proposed an improved TPR ensemble (TPR-W weighted) based on classifier performance. To this end they weighted the contribution of each child classifier on the basis of its performance evaluated on a validation data set, by adding to (30) another weight $\nu_j$:

$$\bar{p}_i = w\, p_i + \frac{1 - w}{\left|\phi_i\right|} \sum_{j \in \phi_i} \nu_j \cdot \bar{p}_j \quad (31)$$

where $\nu_j$ is computed on the basis of some accuracy metric $A_j$ (e.g., the F-score) estimated for the child classifier associated with node $j$:

$$\nu_j = \frac{A_j}{\sum_{k \in \phi_i} A_k} \quad (32)$$

In this way the contribution of "poor" classifiers is reduced, while "good" classifiers weigh more in the final computation of $\bar{p}_i$ (31). Experiments with the "Protein Fate" subtree of the FunCat taxonomy with the yeast model organism show that this approach improves predictions with respect to the "vanilla" TPR-W hierarchical strategy [145].
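The performance weights (32) are just a normalization of per-child accuracy estimates; a minimal sketch with hypothetical names follows.

```python
def child_weights(acc):
    """Normalized performance weights (32); acc maps each positive child j
    to an accuracy estimate A_j (e.g., a validation F-score)."""
    total = sum(acc.values())
    return {j: a / total for j, a in acc.items()}

def tpr_w_weighted(p_local_i, p_bar_children, acc, w=0.7):
    """Performance-weighted combination (31) over the positive children."""
    if not p_bar_children:
        return p_local_i
    nu = child_weights(acc)
    weighted = sum(nu[j] * p for j, p in p_bar_children.items())
    return w * p_local_i + (1 - w) * weighted / len(p_bar_children)

# The unreliable child "d" (validation F-score 0.2) now counts far less:
print(tpr_w_weighted(0.4, {"c": 0.9, "d": 0.7}, {"c": 0.8, "d": 0.2}))
```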

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward and common to the hierarchical top-down algorithm, the propagation of positive decisions from bottom to top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this item see Appendix C.

Experimental results show that TPR-W achieves results equal to or better than the TPR and top-down hierarchical strategies, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per-level classification performances shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall by using a global strategy: large values of the $w$ parameter improve the precision, and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchical ensemble are influenced by the positive decisions of its descendants. More precisely, the analyses performed in [31] showed the following:

(i) the weights of descendants decrease exponentially with respect to their depth; as a consequence, the influence of descendant nodes decays quickly with their depth;

(ii) the parameter $w$ plays a central role in balancing the weight of the parent classifier associated with a given node against the weights of its positive offspring: small values of $w$ increase the weight of descendant nodes, and large values increase the weight of the local parent predictor associated with that node;

(iii) the overall probability predicted by the ensemble results from the choice of the $w$ parameter, the strength of the predictions of the local learner, and the strength of the predictions of its descendants.

These characteristics of TPR-W ensembles are well suited to the hierarchical classification of protein functions, considering that annotations of deeper nodes are likely to have less experimental evidence than higher nodes. Moreover, by enforcing the strength of the descendant nodes through low $w$ values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured as a forest of trees (Section 1.1), while no applications have been performed using the GO, structured as a directed acyclic graph (Section 1.1).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147] or investigated a modified decision tree model in which the assignment to a node is propagated toward the parent nodes [9], extending the classical C4.5 decision tree algorithm for multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in yeast [149].

More recent approaches, still based on modified decision trees, used distance measures derived from the hierarchy and significantly improved previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all classes at once, even when they are built taking into account the hierarchy.

Nevertheless, the previously proposed decision tree-based methods often achieve results not comparable with state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for DAG-structured prediction of protein function in the S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles, which learn ensembles of predictive models each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles, which train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151], methods related to decision trees have been proposed, in the sense that they produce interpretable classification rules to predict all functions at all levels of the GO hierarchy, using an ant colony optimization classification algorithm to discover the classification rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues [1, 16, 18]. In particular, in [80] the authors showed that, even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results if we do not, at the same time, consider other important learning issues related to AFP. Indeed, in [123] a significant synergy between hierarchical classification, data integration methods, and cost-sensitive techniques has been shown, highlighting that hierarchical ensemble methods should be designed taking into account the different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the predictions of protein functions [16, 25, 154–156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into four groups:

(1) vector subspace integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed by [30], but each data source is separately normalized in order to take into account the data distribution in each individual vector space.
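A minimal sketch of vector-space integration, assuming numpy/scikit-learn and hypothetical feature matrices, follows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_expr = rng.normal(size=(100, 50))              # expression features (hypothetical)
X_dom = rng.integers(0, 2, (100, 300)).astype(float)  # domain features (hypothetical)

# Plain concatenation into one long vector per gene, as in [22] ...
X = np.hstack([X_expr, X_dom])

# ... or per-source normalization before concatenation, in the spirit of [30].
X_norm = np.hstack([StandardScaler().fit_transform(X_expr),
                    StandardScaler().fit_transform(X_dom)])
```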

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network; or they use probabilistic evidence integration schemes [45]. Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive-Bayes integration [157] and constrained linear regression [14], or by merging data taking into account the GO hierarchy [158], or by applying XML-based techniques [159].

Kernel Fusion. These techniques at first construct a separate Gram matrix for each available data source, using appropriate kernels representing similarities between genes/gene products. Then, by exploiting the closure property with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix. Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have recently been proposed [160, 161]. Kernel fusion methods, both with and without weighting of the data sources, have been successfully applied to the classification of protein functions [162–165]. Recently, a novel method proposed an enhanced kernel integration approach, by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
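As a minimal sketch (assuming scikit-learn; the data sources X1 and X2 are hypothetical), unweighted kernel fusion can be as simple as summing per-source Gram matrices and training an SVM on the result.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel
from sklearn.svm import SVC

# Two hypothetical data sources for the same 100 genes (e.g., expression
# profiles and protein-domain features), plus 0/1 labels for one term.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(100, 20)), rng.normal(size=(100, 50))
y = rng.integers(0, 2, size=100)

# One Gram matrix per source, then an unweighted sum ("consensus" kernel).
K = rbf_kernel(X1) + linear_kernel(X2)

# SVMs accept precomputed kernels, so the fused matrix can be used directly.
clf = SVC(kernel="precomputed").fit(K, y)
pred = clf.predict(K)  # on the training kernel, for illustration only
```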

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different "views" of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include the "late integration" of kernels trained on different sources [22], the naive-Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the outputs of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

With an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171–173].

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques, in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ SIM). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].

Table 2: Comparison of the results (average per-class F-scores) achieved with single sources and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the number in parentheses is the percentage relative increment in F-score achieved with data fusion with respect to the best single source of evidence (BIOGRID).

Methods        | FLAT   | HTD    | HTD-CS | HB     | HB-CS  | TPR    | TPR-W
Single-source
BIOGRID        | 0.2643 | 0.3759 | 0.4160 | 0.3385 | 0.4183 | 0.3902 | 0.4367
String         | 0.2203 | 0.2677 | 0.3135 | 0.2138 | 0.3007 | 0.2801 | 0.3048
PFAM BINARY    | 0.1756 | 0.2003 | 0.2482 | 0.1468 | 0.2407 | 0.2532 | 0.2738
PFAM LOGE      | 0.2044 | 0.1567 | 0.2541 | 0.0997 | 0.2847 | 0.3005 | 0.3160
Expr           | 0.1884 | 0.2506 | 0.2889 | 0.2006 | 0.2781 | 0.2723 | 0.3053
Seq sim        | 0.1870 | 0.2532 | 0.2899 | 0.2017 | 0.2825 | 0.2742 | 0.3088
Multisource (data fusion)
Kernel fusion  | 0.3220 (+22%) | 0.5401 (+44%) | 0.5492 (+32%) | 0.5181 (+53%) | 0.5505 (+32%) | 0.5034 (+29%) | 0.5592 (+28%)

Table 2 summarizes the results of the comparison, across about two hundred FunCat classes, of single-source and data integration approaches, together with both flat and hierarchical ensembles. Data fusion techniques improve the average per-class F-score in flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) than in FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical methods, data fusion (in particular kernel fusion), and cost-sensitive approaches (Figure 13) [123].

Per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion, the average F-scores at levels 2, 3, and 4 are comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergic effects of hierarchical multilabel ensembles, cost-sensitive techniques, and data fusion significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insights into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both the GO and FunCat, only positive annotations are usually available, while negative annotations are much rarer. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy for choosing negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123] two strategies for choosing negative examples have been compared: the basic (B) and the parent only (PO) strategy.

According to the B strategy, the set of negative examples consists simply of those genes $g$ that are not annotated for class $c_i$, that is,

$$N_B = \left\{ g : g \notin c_i \right\} \quad (33)$$

The PO selection strategy chooses as negatives for the class $c_i$ only those examples that are not annotated for $c_i$ but are annotated for a parent class. More precisely, for a given class $c_i$, corresponding to node $i$ in the taxonomy, the set of negative examples is

$$N_{PO} = \left\{ g : g \notin c_i, \; g \in \mathrm{par}(i) \right\} \quad (34)$$

Figure 12: FunCat trees comparing F-scores achieved with data integration (KF) to the best single-source classifiers trained on BIOGRID data. Black nodes depict functional classes for which KF achieves better F-scores. (a) FLAT and (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical precision, recall, and F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, the PO strategy selects negative examples that are, in a certain sense, "close" to positives. It is easy to see that $N_{PO} \subseteq N_B$; hence the B strategy selects for training a large set of generic negative examples, possibly annotated with classes associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the B (abscissa) and PO (ordinate) strategy for the selection of negative examples. In Figure 15(a) most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods gain in performance when using the PO strategy as opposed to the B strategy ($P$-value $= 2.2 \times 10^{-16}$ according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

Figure 14: Comparison of per-level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion (KF) techniques. The performance of "single" is computed by averaging across all the single data sources.

These results can be explained by considering that the PO strategy takes the hierarchy into account when selecting negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy for choosing negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.
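To make the two selection rules concrete, the following sketch (hypothetical names; a binary annotation matrix Y and a parent list are assumed) derives the B and PO negative sets of (33) and (34) for one term.

```python
import numpy as np

# Hypothetical annotation matrix: Y[g, i] == 1 iff gene g is annotated
# with term i; term 2 has parents {0, 1} in this toy taxonomy.
Y = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])
term, par = 2, [0, 1]

# B strategy (33): every gene not annotated with the term is a negative.
neg_B = np.where(Y[:, term] == 0)[0]

# PO strategy (34): negatives must also be annotated with a parent term
# (here "any parent"; in a tree such as FunCat there is a single parent).
annotated_parent = np.any(Y[:, par] == 1, axis=1)
neg_PO = np.where((Y[:, term] == 0) & annotated_parent)[0]

print(neg_B)   # genes 1, 2, 3
print(neg_PO)  # genes 1, 2 only: gene 3 is too "far" from the term
```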

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in the context of this research area.

Figure 15: Comparison of average per-class F-score between the basic and PO strategies. (a) FLAT ensembles; (b) hierarchical cost-sensitive strategies: HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles). Abscissa: per-class F-score with base learners trained according to the basic strategy; ordinate: per-class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this choice exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way to assess the impact of the different strategies on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works also showed that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies should be explored to improve the accuracy and the reliability of the base learners and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied (see the sketch below): the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly takes away some unannotated examples [182]. Other approaches could be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185-187].
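As a minimal sketch (assuming per-term binary training sets; not the exact procedures of [182]), the two random balancing strategies could be implemented as follows:

```python
import random

def random_oversample(positives, negatives, seed=0):
    """Augment the annotated (positive) proteins by exact duplication
    until the two classes have the same size."""
    rng = random.Random(seed)
    extra = [rng.choice(positives) for _ in range(len(negatives) - len(positives))]
    return positives + extra, negatives

def random_undersample(positives, negatives, seed=0):
    """Randomly discard unannotated (negative) proteins down to the
    number of positives."""
    rng = random.Random(seed)
    return positives, rng.sample(negatives, len(positives))

pos, neg = ["P1", "P2"], ["P3", "P4", "P5", "P6"]
print(random_oversample(pos, neg))   # 4 positives, 4 negatives
print(random_undersample(pos, neg))  # 2 positives, 2 negatives
```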

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems and often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor α = θᵢ⁻/θᵢ⁺ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing w we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed in order to design ensemble methods with "tunable" precision/recall characteristics.
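The effect of such a global parameter can be mimicked with a simple decision-threshold sweep on the scores of any probabilistic classifier; the sketch below (toy scores, not the HBAYES-CS or TPR-W machinery) shows how lowering the threshold raises recall at the price of a possibly lower precision:

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall of the thresholded scores (labels in {0, 1})."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    p = tp / (tp + fp) if tp + fp else 1.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

scores = [0.9, 0.8, 0.6, 0.4, 0.3]
labels = [1, 0, 1, 1, 0]
for t in (0.7, 0.5, 0.2):
    print(t, precision_recall(scores, labels, t))  # recall grows as t drops
```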

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; unfortunately, the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and that, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are partially annotated. More precisely, two new algorithms have been proposed: ProWL (protein function prediction with weak-label learning), which


can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein, and ProWL-IF (protein function prediction with weak-label learning and knowledge of irrelevant function), by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works for hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy to the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (see Appendix C), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of deep nodes of the hierarchy. Other enhancements could be introduced into the TPR-W algorithm (Section 7.2): indeed, we can note that positive children of a node at level i of the hierarchy have the same weight, independently of the size of their hanging subtree. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree, since this witnesses for a positive annotation of the node at level i.
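A minimal recursive sketch of this weighted bottom-up combination (only the per-node weighting rule of TPR-W, under the simplifying assumption that a child is "positive" when its combined score exceeds 0.5; the complete algorithm, including the top-down step, is described in [31]):

```python
def tpr_w(node, w, q, children, thr=0.5):
    """Combine the local prediction q[node] with the average score of
    the positive children, weighted by w (cf. (C1) in Appendix C)."""
    scores = [tpr_w(c, w, q, children, thr) for c in children.get(node, [])]
    positive = [s for s in scores if s > thr]
    if not positive:
        return q[node]
    return w * q[node] + (1 - w) * sum(positive) / len(positive)

# Toy subtree: term 'F' with children 'F.1' (positive) and 'F.2' (negative).
children = {"F": ["F.1", "F.2"]}
q = {"F": 0.55, "F.1": 0.9, "F.2": 0.2}
print(tpr_w("F", w=0.7, q=q, children=children))  # 0.7*0.55 + 0.3*0.9 = 0.655
```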

More generally, in the spirit of a recently proposed work [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194] able to exploit the relationships between functional classes during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, just during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the meta-integration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a meta-combination of the hierarchical predictions could be explored to enhance the overall performance.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the prediction for a particular species by using predictions or data available for other species. This is a common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach to hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary-memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the meta-integration of hierarchical predictions, have been only partially addressed in previous


Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

work. More generally, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and the other related issues is the key to improve AFP methods and to drive experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., the mitochondrion or the rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components,

ISRN Bioinformatics 25

Figure 17 FunCat tree for the yeast model organism (realized through the HCGene software) [178] A ldquodummyrdquo root node has been addedto obtain a single tree from the tree forest

and molecular functions in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that carry out a specific biological function: for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; an example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles or physical places or compartments in which a specific gene product is located; an example is the endoplasmic reticulum or the ribosome.

The ontologies of the GO are structured as a directed acyclic graph (DAG) G = ⟨V, E⟩, where V = {t | t is a term of the GO} and E = {(t, u) | t, u ∈ V}. Relations between GO terms are categorized in the following three main groups:

(i) is-a (subtype relations): if we say that term/node A is a B, we mean that node A is a subtype of node B. For example, mitotic cell cycle is a cell cycle, or lyase activity is a catalytic activity.

(ii) part-of (part-whole relations): A is part of B means that whenever A exists, it is part of B, and the presence of A implies the presence of B. For instance, mitochondrion is part of cytoplasm.

(iii) regulates (control relations): if we say that A regulates B, we mean that A directly affects the manifestation of B, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting function similarity. For these reasons, regulates relations (which, however, are a minority of the existing relations) are usually not used in AFP.
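For illustration, ancestors over is-a/part-of edges can be computed with a simple traversal of per-term parent lists (a toy DAG below; real GO releases distribute these relations in OBO format):

```python
# Toy DAG: each term maps to its parents via is-a / part-of edges only
# ("regulates" edges are excluded, as discussed above).
parents = {
    "mitotic cell cycle": ["cell cycle"],
    "cell cycle": ["biological process"],
    "biological process": [],
}

def ancestors(term):
    """All ancestors of a term, obtained by transitively following the
    is-a / part-of edges; annotating a term implies all its ancestors."""
    seen, stack = set(), list(parents[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen

print(ancestors("mitotic cell cycle"))  # {'cell cycle', 'biological process'}
```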

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI); to author statement codes, such as traceable author statement (TAS), that indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference; to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de). At the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content has been extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and finally animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and in part on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge amount of terms is very difficult for annotators to handle. Moreover, we may have a very large number of possible assignments that may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads to both simpler annotation procedures and less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, but without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to changes, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more generally, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess the classification performance. For this reason, the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the positive examples correctly predicted as positive, FN the positive examples incorrectly predicted as negative, and FP the negatives incorrectly predicted as positive, then the precision P and the recall R are

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{B1}$$

The F-score F is the harmonic mean between precision and recall:

$$F = \frac{2 \cdot P \cdot R}{P + R}. \tag{B2}$$
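For example, with TP = 30, FP = 10, and FN = 20, the formulas above give P = 0.75, R = 0.6, and F ≈ 0.667:

```python
def f_score(tp, fp, fn):
    """Precision, recall, and F-score from (B1) and (B2)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

print(f_score(tp=30, fp=10, fn=20))  # (0.75, 0.6, 0.666...)
```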

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to a near-optimal ranking, that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation, while failing to predict the most specific available


annotation, should be considered "partially correct", in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy G representing the graph of the functional classes, for a given gene/gene product x consider the graph P(x) ⊂ G of the predicted classes and the graph C(x) of the correct classes associated with x, and let l(P) be the set of the leaves (nodes without children) of the graph P. Given a leaf p ∈ P(x), let ↑p be the set of ancestors of the node p that belong to P(x), and, given a leaf c ∈ C(x), let ↑c be the set of ancestors of the node c that belong to C(x). The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}p|},$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}c|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B3}$$

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow}p|}{|{\uparrow}p|},$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|{\uparrow}c \cap P(x)|}{|{\uparrow}c|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B4}$$

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.
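A minimal sketch of the tree-structured measures of (B4), assuming that the ancestor set ↑t includes the term t itself; the example reproduces the "partially correct" case discussed above, where the parent is predicted instead of the true leaf:

```python
PARENT = {"01.01": "01", "01.02": "01", "01": None}  # toy FunCat-like tree

def up(t):
    """Path from term t to the root, t included (assumption)."""
    out = set()
    while t is not None:
        out.add(t)
        t = PARENT[t]
    return out

def hier_prf(pred_leaves, true_leaves):
    """Hierarchical precision, recall, and F-score of (B4)."""
    P = set().union(*(up(p) for p in pred_leaves))  # predicted subgraph P(x)
    C = set().union(*(up(c) for c in true_leaves))  # true subgraph C(x)
    hp = sum(len(C & up(p)) / len(up(p)) for p in pred_leaves) / len(pred_leaves)
    hr = sum(len(up(c) & P) / len(up(c)) for c in true_leaves) / len(true_leaves)
    return hp, hr, 2 * hp * hr / (hp + hr) if hp + hr else 0.0

# Predicting '01' instead of the true leaf '01.01': HP = 1, HR = 0.5.
print(hier_prf(pred_leaves={"01"}, true_leaves={"01.01"}))
```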

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let P(x) be the set of classes predicted in the overall hierarchy for a given gene/gene product x, and let C(x) be the corresponding set of "true" classes. Then the hierarchical precision HP_K and the hierarchical recall HR_K according to Kiritchenko are defined as

$$\mathrm{HP}_K = \frac{\sum_x |P(x) \cap C(x)|}{\sum_x |P(x)|}, \qquad \mathrm{HR}_K = \frac{\sum_x |P(x) \cap C(x)|}{\sum_x |C(x)|}. \tag{B5}$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted (HP_K) and the true classes (HR_K). The hierarchical F-measure HF_K is the harmonic mean between HP_K and HR_K:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \tag{B6}$$

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level k is any node whose distance from the root is equal to k. The posterior probability computed by the ensemble for a generic node at level k is denoted by $\bar{q}_k$; more precisely, $\bar{q}_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level k. Moreover, we define $\bar{q}^{\,j}_{k+1}$ as the probability of a child j of a node at level k, where the index j ≥ 1 refers to the different children of a node at level k. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level k of the hierarchy [31]:

$$\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g), \tag{C1}$$

where $\phi_k(g)$ denotes the set of positive children of the node.

To simplify the notation, we can introduce the following expression for the average of the probabilities computed by the positive children of a generic node at level k:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1}, \tag{C2}$$

and we can introduce similar notations for $a_{k+2}$ (the average of the probabilities of the grandchildren) and, more generally, for $a_{k+j}$ (descendants at level j below a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level k, with a given parameter w, 0 ≤ w ≤ 1, balancing the weight between parent and children predictors, and having a number larger than or equal to 1 of positive descendants at each of the m lower levels below, the following equality holds for each m ≥ 1:

$$\bar{q}_k = w\hat{q}_k + \sum_{j=1}^{m-1} w(1-w)^j a_{k+j} + (1-w)^m a_{k+m}. \tag{C3}$$

For the full proof, see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the w parameter.
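As a quick numeric sanity check of (C3), the following minimal sketch (assuming a chain with a single positive descendant per level, all sharing the same local prediction a, so that every a_{k+j} reduces to a) compares the closed form with the recursion of (C1):

```python
def closed_form(w, q, a, m):
    """Right-hand side of (C3), with a_{k+j} = a for every level j."""
    return w * q + sum(w * (1 - w) ** j * a for j in range(1, m)) + (1 - w) ** m * a

def recursive(w, qs):
    """Bottom-up recursion of (C1) on a chain: qs[0] is the local
    prediction at level k, qs[1:] those of its positive descendants."""
    score = qs[-1]
    for q in reversed(qs[:-1]):
        score = w * q + (1 - w) * score
    return score

w, q, a, m = 0.6, 0.5, 0.9, 4
print(closed_form(w, q, a, m), recursive(w, [q] + [a] * m))  # both = 0.66
```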


Figure 18: (a) Plot of f(w) = (1 − w)^m, varying m from 1 to 10. (b) Plot of g(w) = w(1 − w)^j, varying j from 1 to 10; the integers j refer to internal nodes at distance j from the reference node at level k.

To get more insight into the relationship between w and its effect on the influence of positive decisions on a generic node at level k (Theorem 2), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths m, and Figure 18(b) shows the function responsible for the influence of the nodes above the leaves.

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction–the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.

[2] L. Peña-Castillo, M. Tasan, C. L. Myers, et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.

[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.

[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.

[5] A. Ruepp, A. Zollner, D. Maier, et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.

[6] M. Ashburner, C. A. Ball, J. A. Blake, et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.

[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.

[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.

[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.

[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.

[12] W. Tian, L. V. Zhang, M. Tasan, et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.

[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.

[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.

[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron, et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.

[17] A. S. Juncker, L. J. Jensen, A. Pierleoni, et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.

[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.

[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.

[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.

[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.

[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.

[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research W&CP: Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.

[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.

[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.

[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.

[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research W&CP: Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.

[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.

[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.

[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.

[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.

[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.

[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.

[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.

[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.

[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.

[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.

[38] K. Trohidis, G. Tsoumahas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.

[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.

[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.

[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.

[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.

[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.

[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.

[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.

[46] K. Tsuda, H. Shin, and B. Schölkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.

[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.


[48] U. Karaoz, T. M. Murali, S. Letovsky, et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.

[50] K. Astikainen, L. Holm, E. Pitkänen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.

[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.

[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.

[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.

[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.

[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.

[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.

[57] S. F. Altschul, T. L. Madden, A. A. Schaffer, et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.

[58] A. Conesa, S. Götz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.

[59] Y. Loewenstein, D. Raimondo, O. C. Redfern, et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.

[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.

[61] M. Falda, S. Toppo, A. Pescarolo, et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.

[62] T. Hamp, R. Kassner, S. Seemayer, et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.

[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.

[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, R. Du, R. S. Payattakool, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.

[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.

[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.

[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.

[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.

[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.

[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.

[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.

[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.

[75] Y. A. I. Kourmpetis, A. D. J. van Dijk, M. C. A. M. Bink, R. C. H. J. van Ham, and C. J. F. ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.

[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.

[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.

[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.

[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.

[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.

[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Schölkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.

[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.

[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.

[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.

[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.

[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.

[88] G. Bakir, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.

[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.

[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.

[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.

[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.

[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.

[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.

[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.

[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.

[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.

[100] M. Dettling and P. Bühlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.

[101] V. Robles, P. Larrañaga, J. M. Peña, et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.

[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.

[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.

[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.

[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.

[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.

[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.

[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.

[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.

[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.

[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hüllermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.

[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.

[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.

[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.

[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.

[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.

[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using top-down label combination and artificial neural networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastino, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems, Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Schölkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n²) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.


[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics—From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz, et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.

[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray, et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7-9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry, et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao, et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. von Mering, R. Krause, B. Snel, et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.


[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013—ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bahr, et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber, et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan, et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


the choice of logistic regression could be a reasonable one. In [24] the authors embedded the hierarchical dependencies between terms in the logistic regression setting. Assuming that a random variable X, whose values represent the features of the gene g of interest, is associated with a given gene g, and assuming that P(Y = y | X = x) factorizes according to the GO graph, it follows that

\[ P(Y = y \mid X = x) = \prod_{i} P\left(Y_i = y_i \mid \forall j \in \mathrm{par}(i): Y_j = y_j,\ X_i = x_i\right) \tag{22} \]

with $P(Y_i = 1 \mid \forall j \in \mathrm{par}(i): Y_j = 0,\ X_i = x_i) = 0$. The authors estimated $P(Y_i = 1 \mid \forall j \in \mathrm{par}(i): Y_j = 1,\ X_i = x_i)$ with logistic regression. This approach is quite similar to fitting independent logistic regressions; note, however, that in this case only examples of proteins annotated to all the parent GO terms are used to fit the model, thus implicitly taking into account the hierarchical relationships between GO terms.

6.3. Bayesian Network-Based Methods. These methods are variants of the Bayesian network approach proposed in [20] (Section 5.1): the GO is viewed as a graphical model where a joint Bayesian prior is put on the binary GO term variables $y_i$. The authors proposed four variants that can be summarized as follows:

(i) BPAL is a belief propagation approach with asymmetric Laplace likelihoods. The graphical model has edges directed from more general terms to more specific terms. Differently from [20], the distribution of each SVM output is modeled as an asymmetric Laplace distribution, and a variational inference algorithm, which solves an optimization problem whose minimizer is the set of marginal probabilities of the distribution, is used to estimate the posterior probabilities of the ensemble [139];

(ii) BPALF is similar to BPAL, but with edges inverted and directed from more specific terms to more general terms;

(iii) BPLR is a heuristic variant of BPAL where, in the inference algorithm, the Bayesian log posterior ratio for $Y_i$ is replaced by the marginal log posterior ratio obtained from the logistic regression (LR);

(iv) BPLRF is equal to BPLR, but with reversed edges.

6.4. Projection-Based Methods. A different approach is represented by methods that directly use the calibrated values obtained from logistic regression (step 3 of the overall scheme of the reconciliation methods) to find the closest set of values that are consistent with the ontology. This approach leads to a constrained optimization problem. The main contribution of the work of Obozinski et al. [24] is the introduction of projection reconciliation techniques based on isotonic regression [140] and the Kullback-Leibler divergence.

The isotonic regression method tries to find a set of marginal probabilities $p_i$ that are close to the set of calibrated values $\hat{p}_i$ obtained from the logistic regression. The Euclidean distance is used as a measure of closeness. Hence, considering that the "reconciliation property" requires that $p_i \ge p_j$ when $(i,j) \in E$, this approach yields the following quadratic program:

\[ \min_{p_i,\, i \in I}\ \sum_{i \in I} \left(p_i - \hat{p}_i\right)^2 \quad \text{s.t.}\quad p_j \le p_i,\ (i,j) \in E. \tag{23} \]

This is the classical isotonic regression problem, which can be solved using an interior point solver, or with approximate algorithms when the number of edges of the graph is too large [141].

Considering that we deal with probabilities, a natural measure of distance between probability density functions $f(x)$ and $g(x)$, defined with respect to a random variable $x$, is represented by the Kullback-Leibler divergence $D_{f,g}$:

\[ D_{f,g} = \int_{-\infty}^{+\infty} f(x) \log\left(\frac{f(x)}{g(x)}\right) dx. \tag{24} \]

In the context of reconciliation methods we need to consider a discrete version of the Kullback-Leibler divergence, yielding the following optimization problem:

\[ \min_{\mathbf{p}} D_{\mathbf{p},\hat{\mathbf{p}}} = \min_{p_i,\, i \in I}\ \sum_{i \in I} p_i \log\left(\frac{p_i}{\hat{p}_i}\right) \quad \text{s.t.}\quad p_j \le p_i,\ (i,j) \in E. \tag{25} \]

The algorithm finds the probabilities closest, according to the Kullback-Leibler divergence, to the probabilities $\hat{\mathbf{p}}$ obtained from logistic regression, obeying the constraint that probabilities cannot increase while descending the hierarchy underlying the ontology.
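Both projections can be written down in a few lines for a toy hierarchy. The sketch below is illustrative only: a generic SLSQP solver is adequate just for small graphs, while large ontologies need dedicated solvers such as the O(n²) algorithm of [141]; edges and p_hat are hypothetical inputs.

import numpy as np
from scipy.optimize import minimize

# Toy hierarchy: an edge (i, j) means i is a parent of j, so we require p_i >= p_j.
edges = [(0, 1), (0, 2), (1, 3)]
p_hat = np.array([0.6, 0.7, 0.3, 0.8])  # calibrated outputs, hierarchically inconsistent

cons = [{'type': 'ineq', 'fun': lambda p, i=i, j=j: p[i] - p[j]} for i, j in edges]
bounds = [(1e-6, 1.0)] * len(p_hat)

# Euclidean projection = isotonic regression, Eq. (23)
iso = minimize(lambda p: np.sum((p - p_hat) ** 2), p_hat,
               bounds=bounds, constraints=cons, method='SLSQP')

# Kullback-Leibler projection, Eq. (25)
kl = minimize(lambda p: np.sum(p * np.log(p / p_hat)), p_hat,
              bounds=bounds, constraints=cons, method='SLSQP')

print(iso.x, kl.x)  # both reconciled vectors now satisfy p_parent >= p_child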

The extensive experiments reported in [24] show that, among the reconciliation methods, isotonic regression is the most generally useful. Across a range of evaluation modes, term sizes, ontologies, and recall levels, isotonic regression yields consistently high precision. On the other hand, isotonic regression is not always the "best method", and a biologist with a particular goal in mind may apply other reconciliation methods. For instance, with small terms Kullback-Leibler projections usually achieve the best results, but considering average "per term" results, heuristic methods yield precision at a given recall comparable with projection methods, and better than that achieved with Bayes-net methods.

This ensemble approach achieved excellent results in the prediction of protein function in the mouse model organism, demonstrating that hierarchical multilabel methods can play a crucial role in improving protein function prediction performance [24]. Nevertheless, the approach suffers from some drawbacks. Indeed, the paper focuses on the comparison of hierarchical multilabel methods, but it does not analyze the impact of the concurrent use of data integration and hierarchical multilabel methods on the overall classification performance. Moreover, potential improvements could be introduced by applying cost-sensitive variants of hierarchical multilabel predictors, able to effectively calibrate the precision/recall trade-off at different levels of the functional ontology.

Figure 11: The asymmetric flow of information suggested by the true path rule.

7. True Path Rule Hierarchical Ensembles

These ensemble methods exploit at the same time the downward and upward relationships between classes, thus considering both the parent-to-child and child-to-parent functional links (Figure 2(b)).

The true path rule (TPR) ensemble method [31, 142] is directly inspired by the true path rule that governs both the GO and FunCat taxonomies. Citing the curators of the Gene Ontology [143]: "An annotation for a class in the hierarchy is automatically transferred to its ancestors, while genes unannotated for a class cannot be annotated for its descendants." Considering the parents of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

\[ y_i = 1 \;\Rightarrow\; y_{\mathrm{par}(i)} = 1, \qquad y_i = 0 \;\nRightarrow\; y_{\mathrm{par}(i)} = 0. \tag{26} \]

On the other hand, considering the children of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

\[ y_i = 1 \;\nRightarrow\; y_{\mathrm{child}(i)} = 1, \qquad y_i = 0 \;\Rightarrow\; y_{\mathrm{child}(i)} = 0. \tag{27} \]

From (26) and (27) we observe an asymmetry in the rules that govern the assignment of positive and negative labels. Indeed, we have a propagation of positive predictions from bottom to top of the hierarchy in (26), and a propagation of negative labels from top to bottom in (27). Conversely, negative labels cannot propagate from bottom to top, and positive predictions cannot propagate from top to bottom.

The "true path rule" suggests algorithms able to propagate "positive" decisions from bottom to top of the hierarchy, and negative decisions from top to bottom (Figure 11).

7.1. The True Path Rule Ensemble Algorithm. The TPR algorithm puts together the predictions made at each node by local "base" classifiers to realize an ensemble that obeys the "true path rule".

The basic ideas behind the method can be summarized as follows:

(1) training of the base learners: for each node of the hierarchy, a suitable learning algorithm (e.g., a multilayer perceptron or a support vector machine) provides a classifier for the associated functional class;

(2) in the evaluation phase, the trained classifiers associated with each class/node of the graph provide a local decision about the assignment of a given example to a given node;

(3) positive decisions, that is, annotations to a specific functional class, may propagate from bottom to top across the graph: they influence the decisions of the parent nodes and of their ancestors in a recursive way, by traversing the graph towards higher level nodes/classes. Conversely, negative decisions do not affect the decisions of the parent node, that is, they do not propagate from bottom to top (26);

(4) negative predictions for a given node (taking into account the local decisions of its descendants) are propagated to the descendants, to preserve the consistency of the hierarchy according to the true path rule, while positive decisions do not influence the decisions of child nodes (27).

The ensemble combines the local predictions of the base learners associated with each node with the positive decisions that come from the bottom of the hierarchy and with the negative decisions that spring from the higher level nodes. More precisely, base classifiers estimate local probabilities $p_i(g)$ that a given example $g$ belongs to class $\theta_i$, but the core of the algorithm is represented by the evaluation phase, where the ensemble provides an estimate of the "consensus" global probability $\bar{p}_i(g)$.

It is worth noting that, instead of a probability, $p_i(g)$ may represent a score associated with the likelihood that a given gene/gene product belongs to the functional class $i$.

Let us consider the set $\phi_i(g)$ of the children of node $i$ for which we have a positive prediction for a given gene $g$:

\[ \phi_i(g) = \left\{ j \in \mathrm{child}(i) : \hat{y}_j(g) = 1 \right\}. \tag{28} \]

The global consensus probability $\bar{p}_i(g)$ of the ensemble depends both on the local prediction $p_i(g)$ and on the predictions of the nodes belonging to $\phi_i(g)$:

\[ \bar{p}_i(g) = \frac{1}{1 + \left|\phi_i(g)\right|} \left( p_i(g) + \sum_{j \in \phi_i(g)} \bar{p}_j(g) \right). \tag{29} \]

The decision $\hat{y}_i(g)$ at node/class $i$ is set to 1 if $\bar{p}_i(g) > t$ and to 0 otherwise (a natural choice for $t$ is 0.5), and only children nodes for which we have a positive prediction can influence their parent. In the leaf nodes the sum in (29) disappears, and (29) becomes $\bar{p}_i(g) = p_i(g)$. In this way positive predictions propagate from bottom to top, and negative decisions are propagated to the descendants of a given node when $\hat{y}_i(g) = 0$.

The bottom-up per level traversal of the tree assures that all the offspring of a given node $i$ are taken into account for the ensemble prediction. For the same reason, we can safely set the classes belonging to the subtree rooted at $i$ to negative when $\hat{y}_i$ is set to 0. It is worth noting that we have a two-way asymmetric flow of information across the tree: positive predictions for a node influence its ancestors, while negative predictions influence its offspring.

The algorithm provides both the multilabels $\hat{y}_i$ and an estimate of the probabilities $\bar{p}_i$ that a given example $g$ belongs to the class $i = 1, \ldots, m$.
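A compact sketch of the two passes of the TPR algorithm on a tree may help; this is an illustrative reading of (28)-(29) for a single gene, not the reference implementation, and p_local, children, and root are hypothetical names.

def tpr_ensemble(p_local, children, root, t=0.5):
    """True Path Rule ensemble for one gene on a tree.
    p_local: dict node -> local classifier probability p_i(g);
    children: dict node -> list of child nodes; t: decision threshold."""
    p_bar, y = {}, {}

    def bottom_up(i):
        for j in children[i]:
            bottom_up(j)                                  # children first (bottom-up visit)
        phi = [j for j in children[i] if p_bar[j] > t]    # Eq. (28)
        # Eq. (29); for leaves phi is empty and p_bar reduces to p_local
        p_bar[i] = (p_local[i] + sum(p_bar[j] for j in phi)) / (1 + len(phi))

    def top_down(i, ancestors_positive):
        # Negative decisions propagate downwards: the whole subtree of a
        # negative node is set to 0, preserving the true path rule.
        y[i] = int(ancestors_positive and p_bar[i] > t)
        for j in children[i]:
            top_down(j, y[i] == 1)

    bottom_up(root)
    top_down(root, True)
    return p_bar, y

# Example: a positive child lifts its parent above the threshold.
children = {'r': ['a', 'b'], 'a': [], 'b': []}
p_bar, y = tpr_ensemble({'r': 0.4, 'a': 0.7, 'b': 0.2}, children, 'r')
# p_bar['r'] = (0.4 + 0.7) / 2 = 0.55, hence y = {'r': 1, 'a': 1, 'b': 0}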

7.2. The Cost-Sensitive Variant. Note that in the TPR algorithm there is no way to explicitly balance the local prediction $p_i(g)$ at node $i$ with the positive predictions coming from its offspring (29). By balancing the local predictions with the positive predictions coming from the ensemble, we can explicitly modulate the interplay between local and descendant predictors. To this end, a weight $w$, $0 \le w \le 1$, is introduced, such that if $w = 1$ the decision at node $i$ depends only on the local predictor; otherwise the prediction is shared proportionally to $w$ and $1 - w$ between, respectively, the local parent predictor and the set of its children:

\[ \bar{p}_i = w\, p_i + \frac{1 - w}{\left|\phi_i\right|} \sum_{j \in \phi_i} \bar{p}_j. \tag{30} \]

This variant of the TPR algorithm is the weighted true path rule (TPR-W) hierarchical ensemble algorithm. A specific advantage of TPR-W ensembles is the capability of tuning the precision/recall characteristics of the resulting ensemble through the parameter $w$ (30). For $w \to 0$ the weight of the parent local predictor is small, and the ensemble decision depends mainly on the positive predictions of the offspring nodes (classifiers), yielding a higher recall. Conversely, $w \to 1$ corresponds to a higher weight of the parent predictor: less weight is given to possible positive predictions of the children, and the decision depends mainly on the local/parent base classifier, with a resulting higher precision. In case of a negative decision the whole subtree is set to 0, causing the precision to increase. Note that for $w \to 1$ the behaviour of TPR-W becomes similar to that of HTD (Section 4). In [31] the author shows that the $w$ parameter highly influences the precision/recall characteristics of the ensemble: low $w$ values yield a higher recall, while high values improve the precision of the TPR-W ensemble.
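In the sketch given in Section 7.1, TPR-W amounts to replacing the consensus line with the weighted combination of (30); the fallback for nodes without positive children, which simply return the local prediction, is an assumption consistent with the leaf behaviour described above.

def tprw_consensus(p_local_i, phi_probs, w):
    """Eq. (30): convex combination of the local prediction and the
    average of the positive children; phi_probs = [p_bar_j for j in phi_i]."""
    if not phi_probs:   # leaves, or no positive children: local output only
        return p_local_i
    return w * p_local_i + (1 - w) / len(phi_probs) * sum(phi_probs)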

Recently, Chen and Hu proposed a method that applies the TPR-W hierarchical strategy, but uses composite kernel SVMs as base classifiers and a supervised clustering with oversampling strategy to solve the imbalanced data set learning problem; they showed that the proper selection of base learners and unbalance-aware learning strategies can further improve the results in terms of hierarchical precision and recall [144].

The same authors also proposed an enhanced version of the TPR-W strategy, to overcome a limitation of this bottom-up hierarchical method for AFP. Indeed, for some classes at the lower levels of the hierarchy, classifier performance is sometimes quite poor, due to both noisy data and the relatively low number of available annotations. More precisely, in the basic TPR ensemble the probabilities $\bar{p}_j$ computed by the children of node $i$ (30) contribute in equal measure to the probability $\bar{p}_i$ computed by the ensemble at node $i$, independently of the accuracy of the predictions made by its children classifiers. This "unweighted" mechanism may propagate errors across the hierarchy: a poorly performing child classifier may, for instance, with high probability predict a negative example as positive, and this error may propagate to its parent node and, recursively, to its ancestor nodes. To alleviate this possible bottom-up error propagation, in [145] Chen and Hu proposed an improved TPR ensemble (TPR-W weighted) based on classifier performance. To this end, they weighted the contribution of each child classifier on the basis of its performance, evaluated on a validation data set, by adding to (30) another weight $\nu_j$:

\[ \bar{p}_i = w\, p_i + \frac{1 - w}{\left|\phi_i\right|} \sum_{j \in \phi_i} \nu_j \, \bar{p}_j, \tag{31} \]

where $\nu_j$ is computed on the basis of some accuracy metric $A_j$ (e.g., the F-score) estimated for the child classifier associated with node $j$:

\[ \nu_j = \frac{A_j}{\sum_{k \in \phi_i} A_k}. \tag{32} \]

In this way the contribution of "poor" classifiers is reduced, while "good" classifiers weigh more in the final computation of $\bar{p}_i$ (31). Experiments with the "Protein Fate" subtree of the FunCat taxonomy on the yeast model organism show that this approach improves predictions with respect to the "vanilla" TPR-W hierarchical strategy [145].
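The per-child weights of (31)-(32) reduce to a one-line normalization; the sketch below is illustrative, with A a hypothetical dict of validation accuracies (e.g., per-node F-scores).

def performance_weights(phi, A):
    """Eq. (32): nu_j = A_j / sum_k A_k over the positive children phi,
    so that more accurate child classifiers weigh more in Eq. (31)."""
    total = sum(A[j] for j in phi)
    return {j: A[j] / total for j in phi}

# Eq. (31) then reads:
# p_bar_i = w * p_i + (1 - w) / len(phi) * sum(nu[j] * p_bar[j] for j in phi)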

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward, and common to the hierarchical top-down algorithm, the propagation of positive decisions from bottom to top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this item see Appendix C.

Experimental results show that TPR-W achieves equal or better results than the TPR and top-down hierarchical strategies, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per level classification performance shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall by using a global strategy: large values of the $w$ parameter improve the precision, and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchical ensemble are influenced by the positive decisions of its descendants. More precisely, the analyses performed in [31] showed the following:

(i) the weights of descendants decrease exponentially with respect to their depth; as a consequence, the influence of descendant nodes decays quickly with their depth;

(ii) the parameter $w$ plays a central role in balancing the weight of the parent classifier associated with a given node with the weights of its positive offspring: small values of $w$ increase the weight of descendant nodes, and large values increase the weight of the local parent predictor associated with that node;

(iii) the effect on the overall probability predicted by the ensemble is the result of the choice of the $w$ parameter, the strength of the prediction of the local learner, and of those of its descendants.

These characteristics of TPR-W ensembles are well suited to the hierarchical classification of protein functions, considering that annotations of deeper nodes are likely to have less experimental evidence than higher nodes. Moreover, by enforcing the strength of the descendant nodes through low $w$ values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured as a forest of trees (Section 1.1), while no applications have been performed using the GO, structured as a directed acyclic graph (Section 1.1).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147], or investigated a modified decision tree model in which the assignment to a node is propagated toward the parent nodes [9], by extending the classical C4.5 decision tree algorithm for multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in yeast [149].

More recent approaches, also based on modified decision trees, used distance measures derived from the hierarchy and significantly improved previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all classes at once, even when they are built taking into account the hierarchy.

Nevertheless, the previously proposed decision tree-based methods often achieve results not comparable with state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for DAG-structured prediction of protein function in the S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles that learn ensembles of predictive models, each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles that train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151], methods related to decision trees, in the sense that interpretable classification rules are used to predict all functions at all levels of the GO hierarchy, have been proposed, using an ant colony optimization classification algorithm to discover the classification rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues [1, 16, 18]. In particular, in [80] the authors showed that, even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results, if at the same time we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy between hierarchical classification, data integration methods, and cost-sensitive techniques has been shown, highlighting that hierarchical ensemble methods should be designed taking into account the different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the prediction of protein functions [16, 25, 154–156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into four groups:

(1) vector subspace integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed by [30], but each data source is separately normalized, in order to take into account the data distribution in each individual vector space.
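As an illustration (not the exact pipeline of [22] or [30]), vector space integration reduces to per-source normalization followed by concatenation; the z-scoring used here is one common choice and an assumption.

import numpy as np

def vector_space_integration(views):
    """Concatenate per-source feature matrices (same genes on the rows,
    e.g. expression profiles and protein domain features), normalizing
    each source separately before stacking them side by side."""
    normed = [(V - V.mean(axis=0)) / (V.std(axis=0) + 1e-12) for V in views]
    return np.hstack(normed)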

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when the two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network, or probabilistic evidence integration schemes [45].

Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive-Bayes integration [157] and constrained linear regression [14], or by merging data taking into account the GO hierarchy [158], or by applying XML-based techniques [159].

Kernel Fusion. These techniques at first construct a separate Gram matrix for each available data source, using appropriate kernels representing similarities between genes/gene products. Then, by exploiting the closure property of kernels with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix.

Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have recently been proposed [160, 161]. Kernel fusion methods, both with and without weighting of the data sources, have been successfully applied to the classification of protein functions [162–165]. Recently, a novel method proposed an enhanced kernel integration approach, by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
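A minimal sketch of unweighted and weighted kernel fusion follows; since a nonnegatively weighted sum of valid kernels is still a valid kernel, the fused Gram matrix remains a kernel. Names are illustrative.

import numpy as np

def kernel_fusion(gram_matrices, weights=None):
    """Combine per-source (n x n) Gram matrices into a 'consensus' kernel.
    weights=None gives the plain sum of the Gram matrices (the KF setting
    referred to in Table 2); nonnegative weights give a fixed-coefficient
    linear combination."""
    K = np.stack(gram_matrices)                   # (n_sources, n, n)
    w = np.ones(K.shape[0]) if weights is None else np.asarray(weights, float)
    return np.tensordot(w, K, axes=1)             # sum_s w_s * K_s

The fused matrix can then be fed to any kernel machine that accepts precomputed kernels, for example scikit-learn's SVC(kernel='precomputed').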

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different "views" of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include "late integration" of kernels trained on different sources [22], naive-Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the outputs of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171–173].

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques, in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ. SIM.). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].

Table 2: Comparison of the results (average per class F-scores) achieved with single sources and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the number in parentheses refers to the percentage relative increment in F-score achieved with data fusion with respect to the best single source of evidence (BIOGRID).

Methods                    FLAT         HTD          HTD-CS       HB           HB-CS        TPR          TPR-W
Single-source:
  BIOGRID                  0.2643       0.3759       0.4160       0.3385       0.4183       0.3902       0.4367
  STRING                   0.2203       0.2677       0.3135       0.2138       0.3007       0.2801       0.3048
  PFAM BINARY              0.1756       0.2003       0.2482       0.1468       0.2407       0.2532       0.2738
  PFAM LOGE                0.2044       0.1567       0.2541       0.0997       0.2847       0.3005       0.3160
  EXPR                     0.1884       0.2506       0.2889       0.2006       0.2781       0.2723       0.3053
  SEQ. SIM.                0.1870       0.2532       0.2899       0.2017       0.2825       0.2742       0.3088
Multisource (data fusion):
  Kernel fusion            0.3220 (22%) 0.5401 (44%) 0.5492 (32%) 0.5181 (53%) 0.5505 (32%) 0.5034 (29%) 0.5592 (28%)

Table 2 summarizes the results of the comparison across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per class F-score in flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) than in FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical methods, data fusion (in particular kernel fusion), and cost-sensitive approaches (Figure 13) [123].

Per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion the average F-scores at levels 2, 3, and 4 are comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergic effects of hierarchical multilabel ensembles, cost-sensitive, and data fusion techniques significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insights into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both the GO and FunCat, only positive annotations are usually available, while negative annotations are much rarer. More precisely, in the GO only about 2500 negative annotations are available, and this amount surely does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy for choosing negative training examples does affect classifier performance [8, 163, 179, 180].

In [123] two strategies for choosing negative examples have been compared: the basic (B) and the parent only (PO) strategy.

According to the B strategy, the set of negative examples is simply the set of genes $g$ that are not annotated for class $c_i$, that is:

\[ N_B = \left\{ g : g \notin c_i \right\}. \tag{33} \]

The PO selection strategy chooses as negatives for class $c_i$ only those examples that are not annotated to $c_i$ but are annotated for a parent class. More precisely, for a given class $c_i$, corresponding to node $i$ in the taxonomy, the set of negative examples is

\[ N_{PO} = \left\{ g : g \notin c_i \ \wedge\ g \in \mathrm{par}(i) \right\}. \tag{34} \]

Figure 12: FunCat trees comparing the F-scores achieved with data integration (KF) to those of the best single-source classifiers, trained on BIOGRID data. Black nodes depict functional classes for which KF achieves better F-scores. (a) FLAT and (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical precision, recall, and F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, this strategy selects negative examples for training that are, in a certain sense, "close" to positives. It is easy to see that $N_{PO} \subseteq N_B$; hence the B strategy selects for training a large set of generic negative examples, possibly annotated with classes that are associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.
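A sketch of the two selection rules (33)-(34), with ann a hypothetical mapping from each class to its set of annotated genes, may clarify the difference.

def negatives_basic(genes, ann, c):
    """Eq. (33): all genes not annotated to class c."""
    return {g for g in genes if g not in ann[c]}

def negatives_parent_only(genes, ann, c, parent_c):
    """Eq. (34): genes not annotated to c but annotated to its parent
    class, i.e. negatives 'close' to the positives in the taxonomy."""
    return {g for g in genes if g not in ann[c] and g in ann[parent_c]}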

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the B (abscissa) and PO (ordinate) strategy for the selection of negative examples. In Figure 15(a) most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods gain in performance when using the PO strategy as opposed to the B strategy (p-value = 2.2 × 10⁻¹⁶ according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

Figure 14: Comparison of per-level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF). The performance of "single" is computed by averaging across all the single data sources.

These results can be explained by considering that the PO strategy takes into account the hierarchy to select negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of the classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from the other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in the context of this research area.


Figure 15: Comparison of average per class F-score between the basic (B) and PO strategies. (a) FLAT ensembles; (b) hierarchical cost-sensitive strategies: HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles). Abscissa: per class F-score with base learners trained according to the basic strategy; ordinate: per class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way, to assess the impact of the different strategies on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works also showed that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies should be explored to improve the accuracy and the reliability of the base learners, and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied: the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly removes some unannotated examples [182]. Other approaches could be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185–187].

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems, and often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta_i^- / \theta_i^+$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$ we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed, in order to design ensembles with "tunable" precision/recall characteristics.

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; unfortunately, the available annotations are incomplete and undergo frequent updates, and the GO itself is frequently revised. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and that, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are partially annotated. More precisely, two new algorithms have been proposed: ProWL (protein function prediction with weak-label learning), which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein, and ProWL-IF (protein function prediction with weak-label learning and knowledge of irrelevant function), by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works for hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy with the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Section 11), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of deep nodes of the hierarchy. Other enhancements could be introduced in the TPR-W algorithm (Section 7.2): indeed, we can note that positive children of a node at level $i$ of the hierarchy have the same weight, independently of the size of their hanging subtrees. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree, since this provides evidence for a positive annotation of the node at level $i$.

More generally, in the spirit of a recently proposed work [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

First, an important issue could be represented by the design and development of multitask learning strategies [194] able to exploit the relationships between functional classes during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, during the training of the base learners, the learning processes would be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge unexplored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performances.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the predictions for a particular species by using predictions or data available for other species. This is common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary-memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work.


Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

More generally, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and the other related issues is the key to improve AFP methods and to drive experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is ample room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent compelling challenges for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., mitochondrion or rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that carry out a specific biological function, for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; an example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents parts or components of a cell, such as organelles, or physical places or compartments in which a specific gene product is located; an example is the endoplasmic reticulum or the ribosome.

Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a term of the GO}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are categorized in the following three main groups.

(i) is-a (subtype relation): if we say that term/node $A$ is a $B$, we mean that node $A$ is a subtype of node $B$. For example, mitotic cell cycle is a cell cycle, and lyase activity is a catalytic activity.

(ii) part-of (part-whole relation): $A$ is part of $B$ means that, whenever $A$ exists, it is part of $B$, and the presence of $A$ implies the presence of $B$. For instance, mitochondrion is part of cytoplasm.

(iii) regulates (control relation): if we say that $A$ regulates $B$, we mean that $A$ directly affects the manifestation of $B$, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting functional similarity. For these reasons, regulates relations (which, however, are a minority of the existing relations) are usually not used in AFP.
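The practical consequence of the transitive is-a/part-of relations is that annotations propagate upward in the DAG (the "true path rule"). The toy sketch below, with an invented four-term parent map, computes the ancestor closure of a term; it is only an illustration of the graph structure described above.

    # Illustrative toy DAG (child -> parents), restricted to the transitive
    # is-a / part-of relations; regulates edges are excluded, as discussed.
    PARENTS = {
        "mitotic cell cycle": ["cell cycle"],
        "cell cycle": ["biological_process"],
        "mitochondrion": ["cytoplasm"],  # part-of
        "cytoplasm": ["cellular_component"],
    }

    def ancestors(term, parents=PARENTS):
        # All terms reachable by following is-a / part-of edges upward.
        seen, stack = set(), list(parents.get(term, []))
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(parents.get(t, []))
        return seen

    # A gene annotated to "mitotic cell cycle" is implicitly annotated to
    # every ancestor: {'cell cycle', 'biological_process'} (in some order).
    print(ancestors("mitotic cell cycle"))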

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI); to author statement codes, such as traceable author statement (TAS), indicating that the annotation was made on the basis of a statement made by the author(s) in the cited reference; to computational analysis evidence codes, based on in silico analyses manually reviewed, for example, inferred from sequence or structural similarity (ISS). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de). At the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content was extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and, finally, animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-grained description of proteins, providing annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge number of terms is very difficult to handle for annotators. Moreover, we may have a very large number of possible assignments that may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads to both simpler annotation procedures and less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to change, since they represent the state of the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more generally, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess classification performance. For this reason, the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the number of positive examples correctly predicted as positive, FN the number of positive examples incorrectly predicted as negative, and FP the number of negatives incorrectly predicted as positive, then the precision $P$ and the recall $R$ are

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{B.1}$$

The F-score $F$ is the harmonic mean between precision and recall:

$$F = \frac{2 \cdot P \cdot R}{P + R}. \tag{B.2}$$

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to a near-optimal ranking, that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation, while failing to predict the most specific available annotation, should be considered "partially correct", in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$ consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $l(P)$ be the set of the leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let $\uparrow p$ be the set of ancestors of the node $p$ that belong to $P(x)$, and, given a leaf $c \in C(x)$, let $\uparrow c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|\uparrow c \cap \uparrow p|}{|\uparrow p|},$$
$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|\uparrow c \cap \uparrow p|}{|\uparrow c|},$$
$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.3}$$

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap \uparrow p|}{|\uparrow p|},$$
$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|\uparrow c \cap P(x)|}{|\uparrow c|},$$
$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.4}$$

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.
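A minimal sketch of the tree-structured measures of (B.4) follows, assuming the ancestor set $\uparrow n$ includes the node $n$ itself (one common reading of [201]) and that the leaf sets are nonempty; the function and argument names are illustrative.

    def hierarchical_prf(pred_leaves, true_leaves, parent):
        # HP, HR, and HF of (B.4) for a tree taxonomy (FunCat-like).
        # `parent` maps each class to its parent; the root maps to None.
        def up(n):
            # The path from n to the root, taken to include n itself.
            out = set()
            while n is not None:
                out.add(n)
                n = parent.get(n)
            return out

        P = set().union(*(up(p) for p in pred_leaves))  # predicted subgraph P(x)
        C = set().union(*(up(c) for c in true_leaves))  # correct subgraph C(x)
        hp = sum(len(C & up(p)) / len(up(p)) for p in pred_leaves) / len(pred_leaves)
        hr = sum(len(up(c) & P) / len(up(c)) for c in true_leaves) / len(true_leaves)
        hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
        return hp, hr, hf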

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

$$\mathrm{HP}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|P(x)|}, \qquad \mathrm{HR}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|C(x)|}. \tag{B.5}$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true classes ($\mathrm{HR}_K$). The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \tag{B.6}$$

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $\bar{q}_k$; more precisely, $\bar{q}_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $\bar{q}^j_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \ge 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level $k$ of the hierarchy [31]:

$$\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^j_{k+1}(g). \tag{C.1}$$

To simplify the notation, we can introduce the following expression for the average of the probabilities computed by the positive children of a generic node at level $k$:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^j_{k+1}, \tag{C.2}$$

and we can introduce similar notations for $a_{k+2}$ (average of the probabilities of the grandchildren) and, more generally, for $a_{k+j}$ (descendants at level $j$ below a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \le w \le 1$, balancing the weight between parent and children predictors, and having a variable number (larger than or equal to 1) of positive descendants for each of the $m$ lower levels, the following equality holds for each $m \ge 1$:

$$\bar{q}_k = w \hat{q}_k + \sum_{j=1}^{m-1} w (1 - w)^j a_{k+j} + (1 - w)^m a_{k+m}. \tag{C.3}$$
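The recursion of (C.1) can be rendered as a short bottom-up sketch. This is a simplification of the full TPR-W algorithm of [31] (it fixes a single threshold $t$ for deciding which children are positive, and the data structures are invented for the example):

    def tpr_w(node, children, q_hat, w=0.8, t=0.5):
        # Bottom-up TPR-W combination, following (C.1):
        # q_bar = w * q_hat(node) + (1 - w) * average of the positive children.
        q_kids = [tpr_w(c, children, q_hat, w, t) for c in children.get(node, [])]
        positive = [q for q in q_kids if q > t]  # phi_k: the positive children
        if not positive:
            return q_hat[node]  # no positive evidence from below
        return w * q_hat[node] + (1 - w) * sum(positive) / len(positive)

    # Example: a chain root -> a -> b with a confident prediction deep down.
    children = {"root": ["a"], "a": ["b"], "b": []}
    q_hat = {"root": 0.4, "a": 0.55, "b": 0.9}
    print(tpr_w("root", children, q_hat))  # 0.444: the deep positive leaks upward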

For the full proof see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter. To gain more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$ (Theorem 1), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of nodes above the leaves.

Figure 18: (a) Plot of $f(w) = (1 - w)^m$, varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1 - w)^j$, varying $j$ from 1 to 10. The integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.
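As a worked example of this decay, with $w = 0.8$ the weight $g(w) = w(1-w)^j$ assigned to descendants at distance $j$ is $0.8 \times 0.2 = 0.16$ for $j = 1$, $0.032$ for $j = 2$, and about $1.3 \times 10^{-3}$ for $j = 4$, so the influence of deep positive predictions vanishes quickly; with $w = 0.5$ the same weights are $0.25$, $0.125$, and about $0.031$, and deeper nodes retain noticeably more influence.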

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi", funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction–the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.
[2] L. Pena-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.
[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.
[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.
[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.
[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.
[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.
[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.
[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.
[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.
[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.
[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.
[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.
[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.
[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.
[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.
[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.
[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.
[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.
[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.
[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.
[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.
[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.
[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.
[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.
[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.
[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.
[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.
[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.
[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.
[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.
[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.
[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.
[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.
[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.
[38] K. Trohidis, G. Tsoumahas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.
[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.
[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.
[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.
[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.
[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.
[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.
[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.
[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.
[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.
[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.
[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.
[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.
[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.
[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.
[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.
[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.
[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
[58] A. Conesa, S. Gotz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.
[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.
[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.
[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.
[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.
[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.
[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.
[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.
[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.
[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.
[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.
[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.
[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.
[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.
[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.
[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.
[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.
[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.
[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.
[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.
[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.
[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.
[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.
[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.
[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.
[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.
[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.
[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.
[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.
[88] G. Bakir, T. Hoffman, B. Scholkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.
[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.
[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.
[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.
[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.
[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.
[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.
[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.
[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.
[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.
[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.
[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.
[100] M. Dettling and P. Buhlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.
[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.
[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.
[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.
[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.
[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.
[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.
[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.
[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.
[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.
[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.
[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.
[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.
[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.
[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.
[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R Bi Y Zhou F Lu and W Wang ldquoPredicting Gene Ontologyfunctions based on support vector machines and statisticalsignificance estimationrdquo Neurocomputing vol 70 no 4ndash6 pp718ndash725 2007

[117] P Bogdanov and A K Singh ldquoMolecular function predictionusing neighborhood featuresrdquo IEEEACMTransactions onCom-putational Biology and Bioinformatics vol 7 no 2 pp 208ndash2172010

[118] M Riley ldquoFunctions of the gene products of Escherichia colirdquoMicrobiological Reviews vol 57 no 4 pp 862ndash952 1993

[119] R E Schapire and Y Singer ldquoBoosTexter a boosting-basedsystem for text categorizationrdquo Machine Learning vol 39 no2 pp 135ndash168 2000

[120] N Alaydie C K Reddy and F Fotouhi ldquoHierarchical multi-label boosting for gene function predictionrdquo in Proceedings ofthe International Conference on Computational Systems Bioin-formatics (CSB rsquo10) Stanford Calif USA 2010

[121] T Kohonen ldquoThe self-organizing maprdquo Neurocomputing vol21 no 1ndash3 pp 1ndash6 1998

[122] H Borges and J Nievola ldquoHierarchical classification using acompetitive neural networkrdquo in Proceedings of the InternationalConference on Computing Networking and Communications(ICNC rsquo12) pp 172ndash177 2012

[123] N Cesa-Bianchi M Re and G Valentini ldquoSynergy of multi-label hierarchical ensembles data fusion and cost-sensitivemethods for gene functional inferencerdquoMachine Learning vol88 no 1 pp 209ndash241 2012

[124] R Cerri and A C P L F De Carvalho ldquoHierarchical mul-tilabel classification using Top-Down label combination andArtificial Neural Networksrdquo in Proceedings of the 11th BrazilianSymposium on Neural Networks (SBRN rsquo10) pp 253ndash258 IEEEComputer Society October 2010

[125] R Cerri and A de Carvalho ldquoHierarchical multilabel proteinfunction prediction using local neural networksrdquo inAdvances inBioinformatics and Computational Biology vol 6832 of LectureNotes in Computer Science pp 10ndash17 Springer 2011

[126] D E Rumelhart R Durbin and Y Chauvin ldquoBackpropagationthe basic theoryrdquo in Backpropagation Theory Architectures andApplications D E Rumelhart and Y Chauvin Eds pp 1ndash34Lawrence Erlbaum Hillsdale NJ USA 1995

[127] J Hernandez L E Sucar and EMorales ldquoA hybrid global-localapproach forhierarchcial classificationrdquo in Proceedings of the26th International Florida Artificial Intelligence Research SocietyConference pp 432ndash437 2013

[128] B Paes A Plastion and A Freitas ldquoImproving loacal per levelhierarchical classificationrdquo Journal of Information and DataManagement vol 3 no 3 pp 394ndash409 2012

[129] G Tsoumakas I Katakis and I Vlahavas ldquoRandom k-labelsetsfor multilabel classificationrdquo IEEE Transactions on Knowledgeand Data Engineering vol 23 no 7 pp 1079ndash1089 2011

[130] HWang X Shen andW Pan ldquoLarge margin hierarchical clas-sification with mutually exclusive class membershiprdquo Journal ofMachine Learning Research vol 12 pp 2721ndash2748 2011

[131] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996

[132] M des Jardins P Karp M Krummenacker T J Lee and CA Ouzounis ldquoPrediction of enzyme classification from proteinsequence without the use of sequence similarityrdquo in Proceedingsof the 5th International Conference on Intelligent Systems forMolecular Biology (ISMB rsquo97) pp 92ndash99 AAAI Press 1997

[133] A Sharkey N Sharkey U Gerecke and G Chandroth ldquoThetest and select approach to ensemble combinationrdquo inMultipleClassifier Systems First International Workshop MCS 2000Cagliari Italy J Kittler and F Roli Eds vol 1857 of LectureNotes in Computer Science pp 30ndash44 Springer 2000

[134] Y Kourmpetis A van Dijk and C ter Braak ldquoGene Ontologyconsistent protein function prediction the FALCON algorithmapplied to six eukaryotic genomesrdquo Algorithms for MolecularBiology vol 8 article 10 2013

[135] N Cesa-Bianchi C Gentile A Tironi and L Zaniboni ldquoIncre-mental algorithms for hierarchical classificationrdquo in Advancesin Neural Information Processing Systems vol 17 pp 233ndash240MIT Press 2005

[136] M Re and G Valentini ldquoAn experimental comparison ofHierarchical Bayes and True Path Rule ensembles for proteinfunction predictionrdquo in Multiple Classifier Systems NinethInternational Workshop MCS 2010 Cairo Egypt N El-GayarEd vol 5997 of Lecture Notes in Computer Science pp 294ndash303Springer 2010

[137] A J Smola and R Kondor ldquoKernels and regularization ongraphsrdquo in Proceedings of the 16th Annual Conference on Learn-ing Theory B Scholkopf and M K Warmuth Eds LectureNotes in Computer Science pp 144ndash158 Springer August 2003

[138] C Leslie E Eskin A Cohen J Weston and W S NobleldquoMismatch string kernels for svm protein classificationrdquo inAdvances in Neural Information Processing Systems pp 1441ndash1448 MIT Press Cambridge Mass USA 2003

[139] M J Wainwright and M I Jordan ldquoGraphical models expo-nential families and variational inferencerdquo Foundations andTrends in Machine Learning vol 1 no 1-2 pp 1ndash305 2008

[140] R E Barlow andH D Brunk ldquoThe isotonic regression problemand its dualrdquo Journal of the American Statistical Association vol67 no 337 pp 140ndash147 1972

[141] O Burdakov O Sysoev A Grimvall and M Hussian ldquoAn119874(1198992) algorithm for isotonic regressionrdquo in Large-Scale Non-

linear Optimization vol 83 of Nonconvex Optimization and ItsApplications pp 25ndash33 Springer 2006

[142] G Valentini and M Re ldquoWeighted True Path Rule a multi-label hierarchical algorithm for gene function predictionrdquo inProceedings of the 1st International Workshop on learning fromMulti-Label Data (MLD-ECML rsquo09) pp 133ndash146 Bled Slovenia2009

[143] Gene Ontology Consortium True path rule 2010 httpwwwgeneontologyorgGOusageshtmltruePathRule

[144] B Chen L Duan and J Hu ldquoComposite kernel based svmfor hierarchical multilabel gene function classificationrdquo in Pro-ceedings of the 26th International Florida Artificial IntelligenceResearch Society Conference pp 1380ndash1385 2012

[145] B Chen and J Hu ldquoHierarchical multi-label classification basedon over-sampling and hierarchy constraint for gene functionpredictionrdquo IEEJ Transactions on Electrical and Electronic Engi-neering vol 7 no 2 pp 183ndash189 2012

[146] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986

[147] R D King A Karwath A Clare and L Dehaspe ldquoThe utilityof different representations of protein sequence for predictingfunctional classrdquoBioinformatics vol 17 no 5 pp 445ndash454 2001

[148] H Blockeel M Bruynooghe S Dzeroski J Ramon and JStruyf ldquoTop-down induction of clustering treesrdquo in Proceedingsof the 15th International Conference on Machine Learning pp55ndash63 1998

ISRN Bioinformatics 33

[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.
[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.
[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.
[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.
[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics—From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.
[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.
[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.
[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.
[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, pp. 1759–1765, 2010.
[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz, et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, no. 12, article S7, 2009.
[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.
[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.
[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.
[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.
[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.
[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.
[166] D. M. Titterington, G. D. Murray, L. S. Murray, et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.
[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse a la Probabilite des Decisions Rendues a la Pluralite des Voix, Imprimerie Royale, Paris, France, 1785.
[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.
[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.
[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.
[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.
[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.
[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7-9, pp. 1533–1537, 2010.
[174] R. D. Finn, J. Tate, J. Mistry, et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.
[175] A. P. Gasch, P. T. Spellman, C. M. Kao, et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.
[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.
[177] C. Von Mering, R. Krause, B. Snel, et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.
[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.
[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.
[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.
[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.


[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.
[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.
[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.
[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.
[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.
[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.
[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.
[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.
[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE Transactions on Computational Biology and Bioinformatics, 2013.
[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.
[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.
[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.
[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&P, vol. 27, pp. 207–216, 2012.
[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.
[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013—ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.
[197] H. W. Mewes, K. Albermann, M. Bahr, et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.
[198] H. W. Mewes, D. Frishman, C. Gruber, et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.
[199] K. R. Christie, S. Weng, R. Balakrishnan, et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.
[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.
[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.
[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


Figure 11: The asymmetric flow of information suggested by the true path rule (positive predictions flow from bottom to top, negative predictions from top to bottom of the FunCat tree).

be introduced by applying cost-sensitive variants of hierarchical multilabel predictors able to effectively calibrate the precision/recall trade-off at different levels of the functional ontology.

7. True Path Rule Hierarchical Ensembles

These ensemble methods exploit at the same time the downward and upward relationships between classes, thus considering both the parent-to-child and child-to-parent functional links (Figure 2(b)).

The true path rule (TPR) ensemble method [31, 142] is directly inspired by the true path rule that governs both the GO and FunCat taxonomies. Citing the curators of the Gene Ontology [143]: "An annotation for a class in the hierarchy is automatically transferred to its ancestors, while genes unannotated for a class cannot be annotated for its descendants." Considering the parents of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$y_i = 1 \Longrightarrow y_{\mathrm{par}(i)} = 1$,
$y_i = 0 \nRightarrow y_{\mathrm{par}(i)} = 0$.  (26)

On the other hand, considering the children of a given node $i$, a classifier that respects the true path rule needs to obey the following rules:

$y_i = 1 \nRightarrow y_{\mathrm{child}(i)} = 1$,
$y_i = 0 \Longrightarrow y_{\mathrm{child}(i)} = 0$.  (27)

From (26) and (27) we observe an asymmetry in the rules that govern the assignments of positive and negative labels. Indeed, we have a propagation of positive predictions from bottom to top of the hierarchy in (26) and a propagation of negative labels from top to bottom in (27). Conversely, negative labels cannot propagate from bottom to top, and positive predictions cannot propagate from top to bottom.

The "true path rule" thus suggests algorithms able to propagate "positive" decisions from bottom to top of the hierarchy and negative decisions from top to bottom (Figure 11).
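As a small illustration (a minimal sketch with hypothetical data structures, not part of the original paper), a multilabel assignment over a tree-structured taxonomy is consistent with these rules exactly when every positively labeled node has a positively labeled parent:

def respects_true_path_rule(y, parent):
    # y: node -> 0/1 label; parent: node -> parent node (None for the root).
    # Every positive node must have a positive parent (rules (26)-(27)).
    return all(parent[i] is None or y[parent[i]] == 1
               for i in y if y[i] == 1)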

7.1. The True Path Rule Ensemble Algorithm. The TPR algorithm puts together the predictions made at each node by local "base" classifiers, to realize an ensemble that obeys the "true path rule".

The basic ideas behind the method can be summarized as follows.

(1) Training of the base learners: for each node of the hierarchy, a suitable learning algorithm (e.g., a multilayer perceptron or a support vector machine) provides a classifier for the associated functional class.

(2) In the evaluation phase, the trained classifiers associated with each class/node of the graph provide a local decision about the assignment of a given example to a given node.

(3) Positive decisions, that is, annotations to a specific functional class, may propagate from bottom to top across the graph: they influence the decisions of the parent nodes and of their ancestors, in a recursive way, by traversing the graph towards higher level nodes/classes. Conversely, negative decisions do not affect decisions of the parent node, that is, they do not propagate from bottom to top (26).

(4) Negative predictions for a given node (taking into account the local decisions of its descendants) are propagated to the descendants, to preserve the consistency of the hierarchy according to the true path rule, while positive decisions do not influence decisions of child nodes (27).

The ensemble combines the local predictions of the base learners associated with each node with the positive decisions that come from the bottom of the hierarchy and with the negative decisions that spring from the higher level nodes. More precisely, base classifiers estimate local probabilities $p_i(g)$ that a given example $g$ belongs to class $\theta_i$, but the core of the algorithm is represented by the evaluation phase, where the ensemble provides an estimate of the "consensus" global probability $\bar{p}_i(g)$.

It is worth noting that, instead of a probability, $p_i(g)$ may represent a score associated with the likelihood that a given gene/gene product belongs to the functional class $i$.

Let us consider the set $\phi_i(g)$ of the children of node $i$ for which we have a positive prediction for a given gene $g$:

$\phi_i(g) = \{ j \mid j \in \mathrm{child}(i),\ y_j = 1 \}$.  (28)

The global consensus probability $\bar{p}_i(g)$ of the ensemble depends both on the local prediction $p_i(g)$ and on the predictions of the nodes belonging to $\phi_i(g)$:

$\bar{p}_i(g) = \dfrac{1}{1 + |\phi_i(g)|} \Bigl( p_i(g) + \sum_{j \in \phi_i(g)} \bar{p}_j(g) \Bigr)$.  (29)

The decision $y_i(g)$ at node/class $i$ is set to 1 if $\bar{p}_i(g) > t$ and to 0 otherwise (a natural choice for $t$ is 0.5), and only children nodes for which we have a positive prediction can influence their parent. In the leaf nodes the sum of (29) disappears and (29) becomes $\bar{p}_i(g) = p_i(g)$. In this way positive predictions propagate from bottom to top, and negative decisions are propagated to the descendants when, for a given node, $y_i(g) = 0$.

The bottom-up per level traversal of the tree assures that all the offspring of a given node $i$ are taken into account for the ensemble prediction. For the same reason we can safely set the classes belonging to the subtree rooted at $i$ to negative when $y_i$ is set to 0. It is worth noting that we have a two-way asymmetric flow of information across the tree: positive predictions for a node influence its ancestors, while negative predictions influence its offspring.

The algorithm provides both the multilabels $y_i$ and an estimate of the probabilities $\bar{p}_i$ that a given example $g$ belongs to the class $i = 1, \ldots, m$, as sketched below.
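The following minimal sketch (illustrative names and data structures, assumed here for exposition rather than taken from the original implementation) computes the TPR consensus probabilities of (29) with a bottom-up traversal of a tree-structured taxonomy, followed by the top-down propagation of negative decisions:

# Minimal TPR evaluation sketch for a single gene g on a tree taxonomy.
# "children" maps each node to its child nodes; "p_local" holds the local
# probabilities p_i(g) estimated by the base classifiers.

def tpr_predict(root, children, p_local, t=0.5):
    p_bar, y = {}, {}

    def bottom_up(i):
        # Consensus probability (29): a node averages its local prediction
        # with the consensus probabilities of its *positive* children only.
        positive = []
        for j in children.get(i, []):
            bottom_up(j)
            if y[j] == 1:
                positive.append(p_bar[j])
        p_bar[i] = (p_local[i] + sum(positive)) / (1 + len(positive))
        y[i] = 1 if p_bar[i] > t else 0

    def top_down(i):
        # Negative decisions are propagated downward: the whole subtree of a
        # negative node is set to negative, consistently with (27).
        for j in children.get(i, []):
            if y[i] == 0:
                y[j] = 0
            top_down(j)

    bottom_up(root)
    top_down(root)
    return y, p_bar

# Toy usage on a three-node hierarchy root -> {a, b}.
children = {"root": ["a", "b"], "a": [], "b": []}
p_local = {"root": 0.4, "a": 0.9, "b": 0.2}
y, p_bar = tpr_predict("root", children, p_local)
print(y, p_bar)  # a stays positive and lifts the root: p_bar["root"] = 0.65

Note how the leaf case of (29) falls out automatically: a leaf has no positive children, so its consensus probability equals its local prediction.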

7.2. The Cost-Sensitive Variant. Note that in the TPR algorithm there is no way to explicitly balance the local prediction $p_i(g)$ at node $i$ with the positive predictions coming from its offspring (29). By balancing the local predictions with the positive predictions coming from the ensemble, we can explicitly modulate the interplay between local and descendant predictors. To this end a weight $w$, $0 \le w \le 1$, is introduced, such that if $w = 1$ the decision at node $i$ depends only on the local predictor; otherwise the prediction is shared proportionally to $w$ and $1 - w$ between, respectively, the local parent predictor and the set of its children:

$\bar{p}_i = w\, p_i + \dfrac{1-w}{|\phi_i|} \sum_{j \in \phi_i} \bar{p}_j$.  (30)

This variant of the TPR algorithm is the weighted true path rule (TPR-W) hierarchical ensemble algorithm. By tuning the $w$ parameter we can modulate the precision/recall characteristics of the resulting ensemble: for $w \to 0$ the weight of the parent local predictor is small, and the ensemble decision mainly depends on the positive predictions of the offspring nodes (classifiers); conversely, $w \to 1$ corresponds to a higher weight of the parent predictor, so that less weight is given to possible positive predictions of the children and the decision depends mainly on the local/parent base classifier. In case of a negative decision, the whole subtree is set to 0, causing the precision to increase. Note that for $w \to 1$ the behaviour of TPR-W becomes similar to that of HTD (Section 4).

A specific advantage of TPR-W ensembles is thus the capability of tuning precision and recall rates through the single parameter $w$ (30): small values of $w$ yield a higher recall, while higher values give more weight to the "parent" local predictor, with a resulting higher precision. In [31] the author shows that the $w$ parameter strongly influences the precision/recall characteristics of the TPR-W ensemble.
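A minimal sketch of the TPR-W consensus update of (30), which would replace the unweighted average of (29) in the earlier TPR sketch (names are illustrative; the fallback for nodes without positive children is an assumption):

def tprw_consensus(p_local_i, positive_children_pbar, w=0.5):
    # Weighted combination (30): w weights the local predictor, (1 - w)
    # the average consensus of the positive children phi_i.
    if not positive_children_pbar:
        return p_local_i  # assumed fallback: no positive children
    return (w * p_local_i
            + (1.0 - w) / len(positive_children_pbar) * sum(positive_children_pbar))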

Recently, Chen and Hu proposed a method that applies the TPR-W hierarchical strategy but uses composite-kernel SVMs as base classifiers, together with a supervised clustering with oversampling strategy to solve the imbalanced data set learning problem; they showed that the proper selection of base learners and unbalance-aware learning strategies can further improve the results in terms of hierarchical precision and recall [144].

The same authors also proposed an enhanced version of the TPR-W strategy, to overcome a limitation of this bottom-up hierarchical method for AFP. Indeed, for some classes at the lower levels of the hierarchy, the classifier performances are sometimes quite poor, due to both noisy data and the relatively low number of available annotations. More precisely, in the basic TPR ensemble the probabilities $\bar{p}_j$ computed by the children of node $i$ (30) contribute in equal measure to the probability $\bar{p}_i$ computed by the ensemble at node $i$, independently of the accuracy of the predictions made by the children classifiers. This "unweighted" mechanism may propagate errors across the hierarchy: a poorly performing child classifier may, for instance, with high probability predict a negative example as positive, and this error may propagate to its parent node and, recursively, to its ancestor nodes. To try to alleviate this possible bottom-up error propagation, in [145] Chen and Hu proposed an improved TPR ensemble (TPR-W weighted) based on classifier performance. To this end, they weighted the contribution of each child classifier on the basis of its performance, evaluated on a validation data set, by adding to (30) another weight $\nu_j$:

$\bar{p}_i = w\, p_i + \dfrac{1-w}{|\phi_i|} \sum_{j \in \phi_i} \nu_j \cdot \bar{p}_j$,  (31)

where $\nu_j$ is computed on the basis of some accuracy metric $A_j$ (e.g., the F-score) estimated for the child classifier associated with node $j$, as follows:

$\nu_j = \dfrac{A_j}{\sum_{k \in \phi_i} A_k}$.  (32)

In this way the contribution of "poor" classifiers is reduced, while "good" classifiers weight more in the final computation of $\bar{p}_i$ (31). Experiments with the "Protein Fate" subtree of the FunCat taxonomy with the yeast model organism show that this approach improves prediction with respect to the "vanilla" TPR-W hierarchical strategy [145].
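The following minimal sketch (illustrative names; the accuracy values $A_j$ would come from a validation set, and the no-children fallback is an assumption) implements the performance-weighted update of (31) and (32):

def tprw_weighted_consensus(p_local_i, child_pbar, child_accuracy, w=0.5):
    # child_pbar: consensus probabilities of the positive children phi_i;
    # child_accuracy: accuracy estimates A_j (e.g., F-scores) for the same children.
    if not child_pbar:
        return p_local_i  # assumed fallback when no positive children exist
    total_acc = sum(child_accuracy)
    nu = [a / total_acc for a in child_accuracy]      # normalized weights (32)
    weighted_sum = sum(v * p for v, p in zip(nu, child_pbar))
    return w * p_local_i + (1.0 - w) / len(child_pbar) * weighted_sum  # (31)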

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward and common to the hierarchical top-down algorithm, the propagation of positive decisions from bottom to top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this item see Appendix C.

Experimental results show that TPR-W achieves equal or better results than the TPR and top-down hierarchical strategies, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per level classification performances shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall by using a global strategy: large values of the $w$ parameter improve the precision, and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchical ensemble are influenced by the positive decisions of its descendants. More precisely, the analyses performed in [31] showed the following.

(i) Weights of descendants decrease exponentially with respect to their depth; as a consequence, the influence of descendant nodes decays quickly with their depth.

(ii) The parameter $w$ plays a central role in balancing the weight of the parent classifier associated with a given node with the weights of its positive offspring: small values of $w$ increase the weight of descendant nodes, and large values increase the weight of the local parent predictor associated with that node.

(iii) The effect on the overall probability predicted by the ensemble is the result of the choice of the $w$ parameter, the strength of the prediction of the local learners, and that of its descendants.

These characteristics of TPR-W ensembles are well suited for the hierarchical classification of protein functions, considering that annotations of deeper nodes are likely to have less experimental evidence than higher nodes. Moreover, by enforcing the strength of the descendant nodes through low $w$ values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured as a forest of trees (Section 11), while no applications have been performed using the GO, structured as a directed acyclic graph (Section 11).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147], or investigated a modified decision tree model in which the assignment to a node is propagated toward the parent nodes [9], by extending the classical C4.5 decision tree algorithm for multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in the yeast [149].

More recent approaches, always based on modified decision trees, used distance measures derived from the hierarchy and significantly improved previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all classes at once, even when they are built taking into account the hierarchy.

Nevertheless, the previously proposed decision tree-based methods often achieve results not comparable with state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for DAG-structured prediction of protein function in the S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles that learn predictive models each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles that train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151], methods related to decision trees, in the sense that interpretable classification rules are used to predict all functions at all levels of the GO hierarchy, have been proposed, using an ant colony optimization classification algorithm to discover the classification rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than the single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues [1, 16, 18]. In particular, in [80] the authors showed that, even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results if, at the same time, we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy has been shown between hierarchical classification, data integration methods, and cost-sensitive techniques, highlighting that hierarchical ensemble methods should be designed taking into account the different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the predictions of protein functions [16, 25, 154–156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into four groups:

(1) vector subspace integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed by [30], but each data source is separately normalized in order to take into account the data distribution in each individual vector space.
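As a toy illustration (a minimal sketch with hypothetical feature matrices; the per-source standardization mirrors the separate normalization idea of [30]), vector space integration reduces to normalizing each source and concatenating column-wise:

import numpy as np

# Two hypothetical feature matrices for the same 100 proteins, e.g.,
# expression profiles and binary protein-domain indicators.
X_expr = np.random.rand(100, 50)
X_dom = np.random.randint(0, 2, size=(100, 200)).astype(float)

def zscore(X):
    # Per-source standardization, so that no single source dominates the scale.
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

# Column-wise concatenation yields one vector per protein, usable to train
# a standard (e.g., SVM) classifier.
X_vsi = np.hstack([zscore(X_expr), zscore(X_dom)])
print(X_vsi.shape)  # (100, 250)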

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network, or probabilistic evidence integration schemes [45].

Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive-Bayes integration [157] and constrained linear regression [14], or by merging data taking into account the GO hierarchy [158], or by applying XML-based techniques [159].

Kernel Fusion. These techniques at first construct a separate Gram matrix for each available data source, using appropriate kernels representing similarities between genes/gene products. Then, by exploiting the closure property with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix.

Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have been proposed [160, 161]. Kernel fusion methods, both with and without weighting the data sources, have been successfully applied to the classification of protein functions [162–165]. Recently, a novel method proposed an enhanced kernel integration approach, by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
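A minimal sketch of unweighted kernel fusion, that is, the sum of per-source Gram matrices described above (data, kernel choice, and normalization are illustrative assumptions):

import numpy as np

def linear_gram(X):
    # Gram matrix of the linear kernel for a single data source.
    return X @ X.T

def normalize(K):
    # Rescale to unit diagonal so that sources contribute on a comparable scale.
    d = np.sqrt(np.diag(K)) + 1e-12
    return K / np.outer(d, d)

# Hypothetical sources describing the same 100 proteins.
sources = [np.random.rand(100, 50), np.random.rand(100, 30)]

# The sum of valid kernels is itself a valid kernel (closure property),
# so K_fused can be fed directly to any kernel method (e.g., an SVM).
K_fused = sum(normalize(linear_gram(X)) for X in sources)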

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different "views" of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include the "late integration" of kernels trained on different sources [22], the naive-Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the output of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171–173].
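A minimal decision-level fusion sketch in the spirit of weighted voting (scores and weights are illustrative assumptions; in practice the weights could, e.g., reflect per-source validation accuracy):

def weighted_vote(scores, weights):
    # Each score is the output of a base classifier trained on one data source
    # for the same protein/class pair; the fused score is their weighted mean.
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total

# E.g., PPI-based, expression-based, and sequence-based classifiers.
fused = weighted_vote([0.8, 0.4, 0.6], weights=[0.5, 0.2, 0.3])
print(fused)  # 0.66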

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques, consider the experiments reported in [123], in which six different sources of yeast biomolecular data have been integrated.


Table 2: Comparison of the results (average per-class F-scores) achieved with single sources and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the numbers in parentheses refer to the percentage relative increment in F-score achieved with data fusion techniques with respect to the best single source of evidence (BIOGRID).

Methods          FLAT    HTD     HTD-CS  HB      HB-CS   TPR     TPR-W
Single-source
  BIOGRID        0.2643  0.3759  0.4160  0.3385  0.4183  0.3902  0.4367
  STRING         0.2203  0.2677  0.3135  0.2138  0.3007  0.2801  0.3048
  PFAM BINARY    0.1756  0.2003  0.2482  0.1468  0.2407  0.2532  0.2738
  PFAM LOGE      0.2044  0.1567  0.2541  0.0997  0.2847  0.3005  0.3160
  EXPR           0.1884  0.2506  0.2889  0.2006  0.2781  0.2723  0.3053
  SEQ. SIM.      0.1870  0.2532  0.2899  0.2017  0.2825  0.2742  0.3088
Multisource (data fusion)
  Kernel fusion  0.3220  0.5401  0.5492  0.5181  0.5505  0.5034  0.5592
                 (+22%)  (+44%)  (+32%)  (+53%)  (+32%)  (+29%)  (+28%)

These sources range from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ. SIM.). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].

Table 2 summarizes the results of the comparison across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per-class F-score in flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) with respect to FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical methods, data fusion (in particular kernel fusion), and cost-sensitive approaches (Figure 13) [123].

Per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level and is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion the average F-score at levels 2, 3, and 4 is comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergic effects of hierarchical multilabel ensembles, cost-sensitive, and data fusion techniques significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insights into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both GO and FunCat, only positive annotations are usually available, while negative annotations are much fewer. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy for choosing negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123], two strategies for choosing negative examples have been compared: the basic (B) and the parent only (PO) strategy.

According to the B strategy, the set of negative examples consists simply of those genes $g$ that are not annotated for class $c_i$, that is,

$N_B = \{ g \mid g \notin c_i \}$.  (33)

The PO selection strategy chooses as negatives for the class $c_i$ only those examples that are not annotated to $c_i$ but are annotated for a parent class. More precisely, for a given class $c_i$, corresponding to node $i$ in the taxonomy, the set of negative examples is

$N_{\mathrm{PO}} = \{ g \mid g \notin c_i \wedge g \in \mathrm{par}(i) \}$.  (34)
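A minimal sketch contrasting the two selection strategies (hypothetical data structures: annot[c] denotes the set of genes annotated to class c, and parent[c] its parent class):

def negatives_basic(genes, annot, c_i):
    # B strategy (33): all genes not annotated to class c_i.
    return {g for g in genes if g not in annot[c_i]}

def negatives_parent_only(genes, annot, parent, c_i):
    # PO strategy (34): genes not annotated to c_i but annotated to its
    # parent class, i.e., "close" negatives in the taxonomy.
    return {g for g in genes if g not in annot[c_i] and g in annot[parent[c_i]]}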

Figure 12: FunCat trees comparing the F-scores achieved with data integration (KF) to those of the best single-source classifiers trained on BIOGRID data. Black nodes depict functional classes for which KF achieves better F-scores. (a) FLAT and (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical precision, recall, and F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, the PO strategy selects negative examples for training that are, in a certain sense, "close" to positives. It is easy to see that $N_{\mathrm{PO}} \subseteq N_B$; hence the B strategy selects for training a large set of generic negative examples, possibly annotated with classes that are associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the B (abscissa) and PO (ordinate) strategy for the selection of negative examples. In Figure 15(a) most points lie above the bisector, independently of the hierarchical cost-sensitive method being used.

Figure 14: Comparison of per-level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF). Performance of "single" is computed by averaging across all the single data sources.

This shows that hierarchical methods gain in performance when using the PO strategy as opposed to the B strategy ($P$-value $= 2.2 \times 10^{-16}$, according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

These results can be explained by considering that the PO strategy takes into account the hierarchy to select negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in this research area.


Figure 15: Comparison of average per-class F-score between the basic and PO strategies. (a) FLAT ensembles; (b) hierarchical cost-sensitive strategies: HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles). Abscissa: per-class F-score with base learners trained according to the basic strategy; ordinate: per-class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way, to assess the impact of the different strategies on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works showed also that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies should be explored to improve the accuracy and the reliability of the base learners, and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied: the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly removes some unannotated examples [182]. Other approaches could also be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185–187].

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems and often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta_i^- / \theta_i^+$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$ we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed, in order to design ensemble methods with "tunable" precision/recall characteristics.

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; but unfortunately the annotations are incomplete and undergo frequent updates, and also the GO is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and that, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are partially annotated. More precisely, two new algorithms have been proposed: ProWL, protein function prediction with weak-label learning, which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein; and ProWL-IF, protein function prediction with weak-label learning and knowledge of irrelevant functions, by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works on hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy with the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Section 11), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of deep nodes of the hierarchy. Other enhancements could be introduced in the TPR-W algorithm (Section 7.2): indeed, we can note that positive children of a node at level $i$ of the hierarchy have the same weight, independently of the size of their hanging subtree. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree; indeed, this witnesses for a positive annotation of the node at level $i$.

More in general, in the spirit of a recently proposed work [123], the analysis of the synergy between the issues introduced above could be of great interest, to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194], able to exploit the relationships between functional classes during the learning phase itself, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, already during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performances.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the prediction for a particular species by using predictions or data available for other species. This is a common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods, to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work. More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and other related issues is the key to improve AFP methods and to drive experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process," "molecular function," and "cellular component." Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., mitochondrion or rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function, for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; an example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles or physical places or compartments in which a specific gene product is located; an example is the endoplasmic reticulum or the ribosome.

Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a term of the GO}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are categorized in the following three main groups:

(i) is-a (subtype relations): if we say that the term/node $A$ is-a $B$, we mean that node $A$ is a subtype of node $B$. For example, mitotic cell cycle is-a cell cycle, or lyase activity is-a catalytic activity.

(ii) part-of (part-whole relations): $A$ is part-of $B$ means that, whenever $A$ exists, it is part of $B$, and the presence of $A$ implies the presence of $B$. For instance, mitochondrion is part-of cytoplasm.

(iii) regulates (control relations): if we say that $A$ regulates $B$, we mean that $A$ directly affects the manifestation of $B$; that is, the former regulates the latter.

While is-a and part-of are transitive, regulates is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting functional similarity. For these reasons, regulates relations (which, however, represent a minority of the existing relations) are usually not used in AFP.
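Since is-a and part-of are transitive, an annotation to a term implicitly annotates all of its ancestors. The following minimal Python sketch (my own illustration, not part of the reviewed methods; the toy DAG and term names are purely illustrative) computes the ancestor closure over these relations:

    def ancestors(term, parents):
        # Transitive closure over the is-a / part-of parent relation.
        seen = set()
        stack = list(parents.get(term, []))
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(parents.get(t, []))
        return seen

    # Toy fragment of the BP ontology (illustrative names only).
    parents = {
        "mitotic cell cycle": ["cell cycle"],
        "cell cycle": ["cellular process"],
        "cellular process": ["biological_process"],
    }

    print(ancestors("mitotic cell cycle", parents))
    # -> {'cell cycle', 'cellular process', 'biological_process'}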

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI), to author statement codes, such as traceable author statement (TAS), indicating that the annotation was made on the basis of a statement made by the author(s) in the cited reference, to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de). At the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content has been extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and, finally, animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and since a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical, tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and fewer than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10,000 terms), and such a huge number of terms is very difficult for annotators to handle. Moreover, the very large number of possible assignments may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads both to simpler annotation procedures and to less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, but without being too granular and specific.

It is worth noting that both FunCat and GO ontologies undergo modifications between different releases; at the same time, the annotations are also subject to change, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases, and, more in general, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess classification performance. For this reason, the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the number of positive examples correctly predicted as positive, FN the number of positive examples incorrectly predicted as negative, and FP the number of negatives incorrectly predicted as positive, then the precision $P$ and the recall $R$ are

\[
P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{B.1}
\]

The F-score $F$ is the harmonic mean of precision and recall:

\[
F = \frac{2 \cdot P \cdot R}{P + R}. \tag{B.2}
\]
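As a quick worked example (with hypothetical counts, not taken from the paper): if $\mathrm{TP} = 10$, $\mathrm{FP} = 5$, and $\mathrm{FN} = 10$, then $P = 10/15 \approx 0.67$, $R = 10/20 = 0.50$, and $F = (2 \cdot 0.67 \cdot 0.50)/(0.67 + 0.50) \approx 0.57$, showing how $F$ penalizes whichever of the two quantities is lower.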

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to a near-optimal ranking; that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be considered "partially correct," in the sense that we can gain information about the more general functional characteristics of a gene while missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$ consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $l(P)$ be the set of the leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let $\uparrow p$ be the set of ancestors of the node $p$ that belong to $P(x)$ and, given a leaf $c \in C(x)$, let $\uparrow c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

\[
\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow c} \cap {\uparrow p}|}{|{\uparrow p}|},
\]
\[
\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow c} \cap {\uparrow p}|}{|{\uparrow c}|},
\]
\[
\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.3}
\]

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

\[
\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow p}|}{|{\uparrow p}|},
\]
\[
\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|{\uparrow c} \cap P(x)|}{|{\uparrow c}|},
\]
\[
\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.4}
\]

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, and thus synthetically expresses the effectiveness of the structured hierarchical prediction.
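To make (B.3) concrete, the following minimal Python sketch (my own illustration, not the authors' code) computes HP, HR, and HF for one gene, assuming the taxonomy is given as a child-to-parents dictionary and that the predicted and correct subgraphs are closed under the ancestor relation, so that ancestors computed in the full taxonomy coincide with those inside $P(x)$ and $C(x)$:

    def up(term, parents):
        # Ancestors of `term`; here the path includes the term itself,
        # which is one possible convention choice.
        anc, stack = {term}, list(parents.get(term, []))
        while stack:
            t = stack.pop()
            if t not in anc:
                anc.add(t)
                stack.extend(parents.get(t, []))
        return anc

    def hierarchical_f(P_leaves, C_leaves, parents):
        # HP: for each predicted leaf, best overlap with a true path (B.3).
        HP = sum(max(len(up(c, parents) & up(p, parents)) / len(up(p, parents))
                     for c in C_leaves)
                 for p in P_leaves) / len(P_leaves)
        # HR: for each true leaf, best overlap with a predicted path.
        HR = sum(max(len(up(c, parents) & up(p, parents)) / len(up(c, parents))
                     for p in P_leaves)
                 for c in C_leaves) / len(C_leaves)
        HF = 2 * HP * HR / (HP + HR) if HP + HR > 0 else 0.0
        return HP, HR, HF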

Another variant of hierarchical classification measure is the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

\[
\mathrm{HP}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|P(x)|}, \qquad
\mathrm{HR}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|C(x)|}. \tag{B.5}
\]

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true ($\mathrm{HR}_K$) classes. The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

\[
\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \tag{B.6}
\]
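A minimal Python sketch of (B.5) and (B.6) (illustrative only; `predictions` and `truths` are assumed to map each gene to its full, ancestor-closed set of classes):

    def kiritchenko_f(predictions, truths):
        # HP_K / HR_K: per-gene overlap ratios, summed over genes (B.5).
        HP_K = sum(len(predictions[x] & truths[x]) / len(predictions[x])
                   for x in predictions if predictions[x])
        HR_K = sum(len(predictions[x] & truths[x]) / len(truths[x])
                   for x in predictions if truths[x])
        # Harmonic mean of the two quantities (B.6).
        return 2 * HP_K * HR_K / (HP_K + HR_K) if HP_K + HR_K > 0 else 0.0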

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $q_k$. More precisely, $q_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $q^{j}_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \ge 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $q_k$ computed for a generic node at level $k$ of the hierarchy [31]:

\[
q_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} q^{j}_{k+1}(g). \tag{C.1}
\]

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children of a generic node at level $k$:

\[
a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} q^{j}_{k+1}, \tag{C.2}
\]

and we can introduce similar notations for $a_{k+2}$ (the average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (descendants at level $j$ below a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \le w \le 1$, balancing the weight between parent and children predictors, and having a variable number (larger than or equal to 1) of positive descendants for each of the $m$ lower levels below, the following equality holds for each $m \ge 1$:

\[
q_k = w\hat{q}_k + \sum_{j=1}^{m-1} w(1-w)^{j} a_{k+j} + (1-w)^{m} a_{k+m}. \tag{C.3}
\]

For the full proof, see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter. To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$ (Theorem 1), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of the nodes above the leaves.

Figure 18: (a) Plot of $f(w) = (1-w)^{m}$ while varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1-w)^{j}$ while varying $j$ from 1 to 10. The integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.
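The per-node combination rule (C.1) can be rendered as a short recursive sketch for a tree-structured taxonomy (a simplified illustration of mine, not the reference implementation of [31]; `q_hat` holds the base-learner outputs, and children whose ensemble probability exceeds the threshold `t` form the positive set $\phi_k$):

    def tpr_w(node, q_hat, children, w=0.5, t=0.5):
        # Bottom-up: first compute the ensemble probability of each child.
        child_probs = [tpr_w(c, q_hat, children, w, t)
                       for c in children.get(node, [])]
        # phi_k: only positive children propagate their predictions upward.
        positive = [q for q in child_probs if q > t]
        if not positive:
            return q_hat[node]      # leaf node, or no positive children
        # Weighted combination of the local prediction and children average.
        return w * q_hat[node] + (1 - w) * sum(positive) / len(positive)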

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction – the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.

[2] L. Pena-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.

[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.

[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.

[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.

[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.

[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.

[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.

[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.

[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.

[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.

[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.

[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.

[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.

[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.

[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.

[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.

[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.

[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.

[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.

[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.

[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.

[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.

[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.

[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.

[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.

[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.

[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.

[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.

[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.

[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.

[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.

[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.

[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.

[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.

[38] K. Trohidis, G. Tsoumahas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.

[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.

[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.

[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.

[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.

[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.

[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.

[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.

[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.

[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.


[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.

[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.

[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.

[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.

[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.

[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.

[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.

[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.

[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.

[58] A. Conesa, S. Gotz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.

[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.

[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.

[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.

[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.

[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.

[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.

[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.

[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.

[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.

[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.

[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.

[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.

[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.

[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.

[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.

[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.

[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.

[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.

[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.

[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.

[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.

[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.

[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.

[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.

[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.

[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.

[88] G. Bakir, T. Hoffman, B. Scholkopf, B. Smola, A. J. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.

[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.

[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.

[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.

[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.

[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.

[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.

[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.

[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.

[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.

[100] M. Dettling and P. Buhlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.

[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.

[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.

[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.

[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.

[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.

[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.

[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.

[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.

[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.

[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, University of Stanford, Stanford, Calif, USA, 2000.

[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.

[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.

[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.

[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.

[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.

[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.

[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using top-down label combination and artificial neural networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastion, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems, Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n^2) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.


[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics – From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.

[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7–9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. Von Mering, R. Krause, B. Snel et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.


[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research, W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013 (ISMB Special Interest Group Meeting), pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bahr et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


(4) negative predictions for a given node (taking into account the local decisions of its descendants) are propagated to the descendants, to preserve the consistency of the hierarchy according to the true path rule, while positive decisions do not influence the decisions of child nodes (27).

The ensemble combines the local predictions of the base learners associated with each node with the positive decisions that come from the bottom of the hierarchy and with the negative decisions that spring from the higher level nodes. More precisely, base classifiers estimate local probabilities $\hat{p}_i(g)$ that a given example $g$ belongs to class $\theta_i$, but the core of the algorithm is represented by the evaluation phase, where the ensemble provides an estimate of the "consensus" global probability $\bar{p}_i(g)$.

It is worth noting that, instead of a probability, $\hat{p}_i(g)$ may represent a score associated with the likelihood that a given gene/gene product belongs to the functional class $i$.

Let us consider the set $\phi_i(g)$ of the children of node $i$ for which we have a positive prediction for a given gene $g$:

$$\phi_i(g) = \{\, j \mid j \in \mathrm{child}(i),\; y_j = 1 \,\}. \qquad (28)$$

The global consensus probability $\bar{p}_i(g)$ of the ensemble depends both on the local prediction $\hat{p}_i(g)$ and on the predictions of the nodes belonging to $\phi_i(g)$:

$$\bar{p}_i(g) = \frac{1}{1 + |\phi_i(g)|} \Bigl( \hat{p}_i(g) + \sum_{j \in \phi_i(g)} \bar{p}_j(g) \Bigr). \qquad (29)$$

The decision $y_i(g)$ at node/class $i$ is set to 1 if $\bar{p}_i(g) > t$ and to 0 otherwise (a natural choice for $t$ is 0.5), and only children nodes for which we have a positive prediction can influence their parent. In the leaf nodes the sum of (29) disappears and (29) becomes $\bar{p}_i(g) = \hat{p}_i(g)$. In this way positive predictions propagate from bottom to top, and negative decisions are propagated to the descendants when $y_i(g) = 0$ for a given node.

The bottom-up per-level traversal of the tree assures that all the offspring of a given node $i$ are taken into account for the ensemble prediction. For the same reason, we can safely set the classes belonging to the subtree rooted at $i$ to negative when $y_i$ is set to 0. It is worth noting that we have a two-way asymmetric flow of information across the tree: positive predictions for a node influence its ancestors, while negative predictions influence its offspring.

The algorithm provides both the multilabels $y_i$ and an estimate of the probabilities $\bar{p}_i$ that a given example $g$ belongs to the classes $i = 1, \ldots, m$.
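To make the evaluation phase concrete, the following is a minimal Python sketch of the bottom-up TPR computation of (28)-(29) for a single gene on a tree-structured taxonomy, followed by the top-down propagation of negative decisions; the tree representation, function names, and toy values are illustrative assumptions, not the original implementation:

    # Sketch of TPR evaluation (eqs. (28)-(29)) for one gene g.
    # children: dict node -> list of child nodes (tree taxonomy);
    # p_hat[i]: local probability estimated by the base learner of node i.
    def tpr_evaluate(root, children, p_hat, t=0.5):
        p_bar, y = {}, {}

        def bottom_up(i):
            for j in children.get(i, []):   # evaluate offspring first
                bottom_up(j)
            # phi_i: children with a positive decision (eq. (28))
            phi = [j for j in children.get(i, []) if y[j] == 1]
            # consensus probability (eq. (29)); in a leaf it reduces to p_hat[i]
            p_bar[i] = (p_hat[i] + sum(p_bar[j] for j in phi)) / (1 + len(phi))
            y[i] = 1 if p_bar[i] > t else 0

        def propagate_negatives(i):
            # negative decisions are pushed down the whole subtree (true path rule)
            for j in children.get(i, []):
                if y[i] == 0:
                    y[j] = 0
                propagate_negatives(j)

        bottom_up(root)
        propagate_negatives(root)
        return p_bar, y

    # Toy taxonomy: root "00" with children "01" and "02"; "01" has a leaf child.
    children = {"00": ["01", "02"], "01": ["01.01"]}
    p_hat = {"00": 0.7, "01": 0.4, "02": 0.6, "01.01": 0.9}
    p_bar, y = tpr_evaluate("00", children, p_hat)

In the toy run, the positive leaf "01.01" (0.9) lifts its parent "01" from 0.4 to (0.4 + 0.9)/2 = 0.65, illustrating the bottom-up flow of positive predictions.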

7.2. The Cost-Sensitive Variant. Note that in the TPR algorithm there is no way to explicitly balance the local prediction $\hat{p}_i(g)$ at node $i$ with the positive predictions coming from its offspring (29). By balancing the local predictions with the positive predictions coming from the ensemble, we can explicitly modulate the interplay between local and descendant predictors. To this end a weight $w$, $0 \le w \le 1$, is introduced, such that if $w = 1$ the decision at node $i$ depends only on the local predictor; otherwise the prediction is shared proportionally to $w$ and $1 - w$ between, respectively, the local parent predictor and the set of its children:

$$\bar{p}_i = w \, \hat{p}_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \bar{p}_j. \qquad (30)$$

This variant of the TPR algorithm is the weighted true path rule (TPR-W) hierarchical ensemble algorithm. By tuning the $w$ parameter we can modulate the precision/recall characteristics of the resulting ensemble. More precisely, for $w \to 0$ the weight of the parent local predictor is small, and the ensemble decision mainly depends on the positive predictions of the offspring nodes (classifiers); conversely, $w \to 1$ corresponds to a higher weight of the parent predictor: less weight is given to possible positive predictions of the children, and the decision depends mainly on the local/parent base classifier. In case of a negative decision the whole subtree is set to 0, causing the precision to increase. Note that for $w \to 1$ the behaviour of TPR-W becomes similar to that of HTD (Section 4).
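As a sketch, TPR-W only changes the consensus step of the previous fragment according to (30); the function name is illustrative, and returning the local estimate when no child is positive is an assumption for the degenerate case:

    def tpr_w_combine(p_hat_i, positive_children_p_bar, w=0.5):
        # eq. (30): w weights the local predictor, (1 - w) is shared
        # among the positively predicted children.
        if not positive_children_p_bar:   # no positive children: assume p_bar = p_hat
            return p_hat_i
        avg_children = sum(positive_children_p_bar) / len(positive_children_p_bar)
        return w * p_hat_i + (1 - w) * avg_children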

A specific advantage of TPR-W ensembles is the capability of tuning precision and recall rates through the parameter $w$ (30). For small values of $w$ the weight of the decision of the parent local predictor is small, and the ensemble decision depends mainly on the positive predictions of the offspring nodes (classifiers); higher values of $w$ correspond to a higher weight of the "parent" local predictor, with a resulting higher precision. In [31] the author shows that the $w$ parameter highly influences the precision/recall characteristics of the ensemble: low $w$ values yield a higher recall, while high values improve the precision of the TPR-W ensemble.

Recently, Chen and Hu proposed a method that applies the TPR-W hierarchical strategy, but using composite kernel SVMs as base classifiers and a supervised clustering with oversampling strategy to solve the imbalanced data set learning problem; they showed that the proper selection of base learners and of unbalance-aware learning strategies can further improve the results in terms of hierarchical precision and recall [144].

The same authors also proposed an enhanced version of the TPR-W strategy, to overcome a limitation of this bottom-up hierarchical method for AFP. Indeed, for some classes at the lower levels of the hierarchy the classifier performances are sometimes quite poor, due to both noisy data and the relatively low number of available annotations. More precisely, in the basic TPR ensemble the probabilities $\bar{p}_j$ computed by the children of the node $i$ (30) contribute in equal way to the probability $\bar{p}_i$ computed by the ensemble at node $i$, independently of the accuracy of the predictions made by the children classifiers. This "unweighted" mechanism may generate a propagation of errors across the hierarchy: a poorly performing child classifier may, for instance, predict with high probability a negative example as positive, and this error may propagate to its parent node and, recursively, to its ancestor nodes. To try to alleviate this possible bottom-up error propagation, in [145] Chen and Hu proposed an improved TPR ensemble (TPR-W weighted) based on classifier performance. To this end they weighted the contribution of each child classifier on the basis of its performance, evaluated on a validation data set, by adding to (30) another weight $\nu_j$:

$$\bar{p}_i = w \, \hat{p}_i + \frac{1 - w}{|\phi_i|} \sum_{j \in \phi_i} \nu_j \cdot \bar{p}_j, \qquad (31)$$

where $\nu_j$ is computed on the basis of some accuracy metric $A_j$ (e.g., the F-score) estimated for the child classifier associated with node $j$, as follows:

$$\nu_j = \frac{A_j}{\sum_{k \in \phi_i} A_k}. \qquad (32)$$

In this way the contribution of "poor" classifiers is reduced, while "good" classifiers weight more in the final computation of $\bar{p}_i$ (31). Experiments with the "Protein Fate" subtree of the FunCat taxonomy with the yeast model organism show that this approach improves the predictions with respect to the "vanilla" TPR-W hierarchical strategy [145].
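A sketch of the performance-weighted combination of (31)-(32), where each positive child contributes proportionally to an accuracy metric $A_j$ (e.g., its F-score on a validation set); the names and the degenerate-case handling are illustrative assumptions:

    def tpr_w_weighted_combine(p_hat_i, children_p_bar, children_acc, w=0.5):
        # children_p_bar / children_acc: consensus probabilities and validation
        # accuracies A_j of the positively predicted children of node i.
        if not children_p_bar:
            return p_hat_i
        nu = [a / sum(children_acc) for a in children_acc]            # eq. (32)
        weighted = sum(n * p for n, p in zip(nu, children_p_bar))
        return w * p_hat_i + (1 - w) / len(children_p_bar) * weighted  # eq. (31)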

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward and common to the hierarchical top-down algorithm, the propagation of positive decisions from bottom to top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this item see Appendix C.

Experimental results show that TPR-W achieves equal or better results than the TPR and top-down hierarchical strategies, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per-level classification performances shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall by using a global strategy: large values of the $w$ parameter improve the precision, and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchical ensemble are influenced by the positive decisions of its descendants. More precisely, the analyses performed in [31] showed the following:

(i) weights of descendants decrease exponentially with respect to their depth; as a consequence, the influence of descendant nodes decays quickly with their depth;

(ii) the parameter $w$ plays a central role in balancing the weight of the parent classifier associated with a given node with the weights of its positive offspring: small values of $w$ increase the weight of descendant nodes, and large values increase the weight of the local parent predictor associated with that node;

(iii) the effect on the overall probability predicted by the ensemble is the result of the choice of the $w$ parameter and of the strength of the predictions of the local learner and of its descendants.

These characteristics of TPR-W ensembles are well suited for the hierarchical classification of protein functions, considering that annotations of deeper nodes are likely to have less experimental evidence than higher nodes. Moreover, by enforcing the strength of the descendant nodes through low $w$ values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured as a tree forest (Appendix A), while no applications have been performed using the GO, structured as a directed acyclic graph (Appendix A).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147], or investigated a modified decision tree model in which the assignment to a node is propagated toward the parent nodes [9], by extending the classical C4.5 decision tree algorithm for multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in the yeast [149].

More recent approaches, always based on modified decision trees, used distance measures derived from the hierarchy and significantly improved previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all classes at once, even when they are built taking into account the hierarchy.

Nevertheless, the previously proposed decision tree-based methods often achieve results not comparable with state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for DAG-structured prediction of protein function in S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles, which learn ensembles of predictive models each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles, which train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151] methods related to decision trees have been proposed, in the sense that they generate interpretable classification rules to predict all functions at all levels of the GO hierarchy, using an ant colony optimization classification algorithm to discover the rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues [1, 16, 18]. In particular, in [80] the authors showed that, even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results if, at the same time, we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy between hierarchical classification, data integration methods, and cost-sensitive techniques has been shown, highlighting that hierarchical ensemble methods should be designed taking into account the different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the predictions of protein functions [16, 25, 154-156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into four groups, as follows:

(1) vector subspace integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed by [30], but each data source is separately normalized, in order to take into account the data distribution in each individual vector space.

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when the two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network, or probabilistic evidence integration schemes [45].

Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive-Bayes integration [157] and constrained linear regression [14], or by merging data taking into account the GO hierarchy [158], or by applying XML-based techniques [159].

Kernel Fusion. These techniques at first construct a separate Gram matrix for each available data source, using appropriate kernels representing similarities between genes/gene products. Then, by exploiting the closure property of kernels with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix.

Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have been recently proposed [160, 161]. Kernel fusion methods, both with and without weighting of the data sources, have been successfully applied to the classification of protein functions [162-165]. Recently, a novel method proposed an enhanced kernel integration approach, by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
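A minimal numpy sketch of unweighted kernel fusion (the sum of the Gram matrices, as used in the experiments reported below); the linear kernel and the random matrices stand in for real biomolecular data sources:

    import numpy as np

    def linear_gram(X):
        # Gram matrix of the linear kernel for one source (n_genes x n_features)
        return X @ X.T

    def kernel_fusion(data_sources, weights=None):
        # Sum (or weighted sum) of the per-source Gram matrices: the closure
        # of kernels under addition guarantees a valid "consensus" kernel.
        grams = [linear_gram(X) for X in data_sources]
        if weights is None:
            weights = [1.0] * len(grams)
        return sum(w * K for w, K in zip(weights, grams))

    rng = np.random.default_rng(0)
    sources = [rng.normal(size=(50, 20)), rng.normal(size=(50, 300))]
    K_fused = kernel_fusion(sources)   # 50 x 50 integrated Gram matrix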

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different "views" of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include the "late integration" of kernels trained on different sources [22], the naive-Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the outputs of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to those of state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171-173].
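For instance, a sketch of decision-level integration by weighted voting, where per-source classifier scores for the same gene/class pair are averaged with weights that might reflect the estimated reliability of each source (all names and values are illustrative):

    def weighted_vote(source_scores, source_weights):
        # source_scores: scores in [0, 1] from classifiers trained on
        # different data sources; source_weights: relative reliabilities.
        total = sum(source_weights)
        return sum(w * s for w, s in zip(source_weights, source_scores)) / total

    # e.g., scores from PPI-, expression-, and domain-based classifiers
    consensus = weighted_vote([0.80, 0.55, 0.30], [0.5, 0.3, 0.2])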

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques, in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ. SIM.). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].

Table 2: Comparison of the results (average per class F-scores) achieved with single sources and with multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the numbers in parentheses refer to the percentage relative increment in F-score performance achieved with data fusion techniques with respect to the best single source of evidence (BIOGRID).

Methods           FLAT          HTD           HTD-CS        HB            HB-CS         TPR           TPR-W
Single-source
  BIOGRID         0.2643        0.3759        0.4160        0.3385        0.4183        0.3902        0.4367
  STRING          0.2203        0.2677        0.3135        0.2138        0.3007        0.2801        0.3048
  PFAM BINARY     0.1756        0.2003        0.2482        0.1468        0.2407        0.2532        0.2738
  PFAM LOGE       0.2044        0.1567        0.2541        0.0997        0.2847        0.3005        0.3160
  EXPR            0.1884        0.2506        0.2889        0.2006        0.2781        0.2723        0.3053
  SEQ. SIM.       0.1870        0.2532        0.2899        0.2017        0.2825        0.2742        0.3088
Multisource (data fusion)
  Kernel fusion   0.3220 (22%)  0.5401 (44%)  0.5492 (32%)  0.5181 (53%)  0.5505 (32%)  0.5034 (29%)  0.5592 (28%)

Table 2 summarizes the results of the comparison across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per class F-score in flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) than in FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical data fusion (in particular kernel fusion) and cost-sensitive approaches (Figure 13) [123].

Per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion the average F-scores at levels 2, 3, and 4 are comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergic effects of hierarchical multilabel ensembles, cost-sensitive techniques, and data fusion techniques significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insights into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both GO and FunCat only positive annotations are usually available, while negative annotations are much reduced. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy of choosing negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123] two strategies for choosing negative examples have been compared: the basic (B) and the parent only (PO) strategy.

According to the B strategy, the set of negative examples is simply the set of genes $g$ that are not annotated for class $c_i$, that is,

$$N_B = \{\, g \mid g \notin c_i \,\}. \qquad (33)$$

The PO selection strategy chooses as negatives for the class $c_i$ only those examples that are not annotated to $c_i$ but are annotated for a parent class. More precisely, for a given class $c_i$, corresponding to node $i$ in the taxonomy, the set of negative examples is

$$N_{\mathrm{PO}} = \{\, g \mid g \notin c_i,\; g \in \mathrm{par}(i) \,\}. \qquad (34)$$
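A small sketch of the two selection rules (33)-(34), assuming annotations are stored as a dict mapping each class to the set of genes annotated for it, and `parents` maps a class to its parent classes (the representation and names are illustrative):

    def negatives_basic(genes, ann, c_i):
        # B strategy (eq. (33)): every gene not annotated for class c_i
        return {g for g in genes if g not in ann[c_i]}

    def negatives_parent_only(ann, c_i, parents):
        # PO strategy (eq. (34)): genes not annotated for c_i but
        # annotated for at least one parent class of c_i
        par_annotated = set().union(*(ann[p] for p in parents[c_i]))
        return {g for g in par_annotated if g not in ann[c_i]}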

Figure 12: FunCat trees comparing the F-scores achieved with data integration (KF) to those of the best single-source classifiers trained on BIOGRID data. Black nodes depict the functional classes for which KF achieves better F-scores: (a) FLAT ensembles; (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical precision, recall, and F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques: (a) precision; (b) recall; (c) F-score. HB stands for HBAYES.

Figure 14: Comparison of per-level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF): (a) precision; (b) recall; (c) F-score. The performance of "single" is computed by averaging across all the single data sources.

Hence, the PO strategy selects negative examples for training that are, in a certain sense, "close" to positives. It is easy to see that $N_{\mathrm{PO}} \subseteq N_B$; hence the B strategy selects for training a large set of generic negative examples, possibly annotated with classes that are associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the B (abscissa) and the PO (ordinate) strategy for the selection of negative examples. In Figure 15(a) most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods gain in performance when using the PO strategy as opposed to the B strategy ($P$-value $= 2.2 \times 10^{-16}$, according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

These results can be explained by considering that the PO strategy takes into account the hierarchy to select negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in the context of this research area.

Figure 15: Comparison of average per class F-score between the basic (B) and the PO strategy: (a) FLAT ensembles; (b) hierarchical cost-sensitive strategies: HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles). Abscissa: per class F-score with base learners trained according to the basic strategy; ordinate: per class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way, to assess the impact of the different strategies on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works showed also that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies should be explored to improve the accuracy and the reliability of the base learners, and hence of the overall hierarchical classification process. For instance, random undersampling or oversampling techniques could be applied: the former randomly takes away some unannotated examples, whereas the latter augments the annotations by exactly duplicating the annotated proteins [182]. Other approaches could also be considered, such as heuristic resampling methods [183], the embedding of resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185-187].
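As an illustration, a minimal sketch of random undersampling of the unannotated (negative) examples for a single base learner; the target positive/negative ratio and the function name are illustrative assumptions:

    import random

    def undersample_negatives(positives, negatives, ratio=1.0, seed=42):
        # Randomly take away unannotated examples until at most
        # `ratio` negatives per positive remain (balancing heuristic).
        rng = random.Random(seed)
        k = min(len(negatives), int(ratio * len(positives)))
        return positives, rng.sample(list(negatives), k)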

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems, and often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta_i^- / \theta_i^+$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$ we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed, in order to design ensemble methods with "tunable" precision/recall characteristics.

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; but unfortunately the annotations are incomplete and undergo frequent updates, and also the GO is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and that, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are partially annotated. More precisely, two new algorithms have been proposed: ProWL, protein function prediction with weak-label learning, which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein; and ProWL-IF, protein function prediction with weak-label learning and knowledge of irrelevant functions, by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works for hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy with the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Appendix C), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of deep nodes of the hierarchy. Other enhancements could be introduced in the TPR-W algorithm (Section 7.2): indeed, we can note that the positive children of a node at level $i$ of the hierarchy have the same weight, independently of the size of their hanging subtrees. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree: indeed, this witnesses for a positive annotation of the node at level $i$.

More in general, in the spirit of a recently proposed work [123], the analysis of the synergy between the issues introduced above could be of great interest, to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194] able to exploit the relationships between functional classes just during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, just during the training of the base learners, the learning processes would be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performances.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the prediction for a particular species by using predictions or data available for other species. This is a common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach to the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods, to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work. More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and other related issues is the key to improve AFP methods and to drive experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context, and the design of multispecies hierarchical algorithms able to scale with millions of proteins, represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., the mitochondrion or the rough endoplasmic reticulum).

Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions, in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function, for instance, the lipid metabolic process or the tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; examples of MF are glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles, or physical places or compartments in which a specific gene product is located; examples are the endoplasmic reticulum or the ribosome.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a term of the GO}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are also categorized in the following three main groups:

(i) is-a (subtype relations): if we say that the term/node $A$ is-a $B$, we mean that node $A$ is a subtype of node $B$; for example, mitotic cell cycle is-a cell cycle, or lyase activity is-a catalytic activity;

(ii) part-of (part-whole relations): $A$ is part-of $B$ means that, whenever $A$ exists, it is part of $B$, and the presence of $A$ implies the presence of $B$; for instance, the mitochondrion is part-of the cytoplasm;

(iii) regulates (control relations): if we say that $A$ regulates $B$, we mean that $A$ directly affects the manifestation of $B$, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting function similarity. For these reasons, in AFP the regulates relations (which, however, are a minority of the existing relations) are usually not used.
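A sketch of how such a DAG with typed relations might be represented, keeping only the transitive is-a and part-of edges and computing the ancestor set used to propagate annotations upward according to the true path rule; the tiny ontology and the names are illustrative:

    # parents: term -> list of (parent_term, relation); "regulates" edges
    # are excluded by default, since the relation is not transitive and
    # is usually ignored in AFP.
    parents = {
        "mitotic cell cycle": [("cell cycle", "is_a")],
        "mitochondrion": [("cytoplasm", "part_of")],
    }

    def ancestors(term, parents, allowed=("is_a", "part_of")):
        # transitive closure over the allowed relations: a gene annotated
        # to a term is implicitly annotated to all of the term's ancestors
        found, stack = set(), [term]
        while stack:
            for parent, rel in parents.get(stack.pop(), []):
                if rel in allowed and parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found

    assert ancestors("mitotic cell cycle", parents) == {"cell cycle"}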

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided in several categories: they range from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI); to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference; to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).

A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de): at the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content has been extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and, finally, animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical, tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge amount of terms is very difficult to handle for annotators. Moreover, we may have a very large number of possible assignments that may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads to both simpler procedures for annotation and less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, but without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to changes, since they represent the results of the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more in general, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess the classification performance. For this reason the classical F-score is used instead, to take into account the unbalance of the functional classes. If TP represents the positive examples correctly predicted as positive, FN the positive examples incorrectly predicted as negative, and FP the negatives incorrectly predicted as positive, then the precision $P$ and the recall $R$ are

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \qquad (B.1)$$

The F-score $F$ is the harmonic mean between precision and recall:

$$F = \frac{2 \cdot P \cdot R}{P + R}. \qquad (B.2)$$

If we need to evaluate the correct ranking of the annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC $\simeq 0.5$, while values close to 1 correspond to near-optimal rankings; that is, AUC $= 1$ if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation, while failing to predict the most specific available annotation, should be considered "partially correct", in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely given a general taxonomy 119866 representingthe graph of the functional classes for a given genegeneproduct 119909 consider the graph 119875(119909) sub 119866 of the predictedclasses and the graph 119862(119909) of the correct classes associatedwith 119909 and let 119897(119875) be the set of the leaves (nodes withoutchildren) of the graph 119875 Given a leaf 119901 isin 119875(119909) let uarr 119901 bethe set of ancestors of the node 119901 that belong to 119875(119909) andgiven a leaf 119888 isin 119862(119909) let uarr 119888 be the set of ancestors of thenode 119888 that belong to 119862(119909) the hierarchical precision (HP)hierarchical recall (HR) and hierarchical 119865-score (HF) aredefined as follows [201]

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}p|}$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}c|}$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}} \tag{B.3}$$

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow}p|}{|{\uparrow}p|}$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|{\uparrow}c \cap P(x)|}{|{\uparrow}c|}$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}} \tag{B.4}$$
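To make the tree-case definitions concrete, the following Python sketch computes HP, HR, and HF as in (B.4). It assumes that the ancestor set ↑p includes the node p itself (one common convention) and that P(x) and C(x) are nonempty, ancestor-closed sets; the function names and the toy taxonomy are ours:

def ancestors(node, parent):
    """Path from `node` up to the root of a tree taxonomy (node included)."""
    anc = set()
    while node is not None:
        anc.add(node)
        node = parent.get(node)
    return anc

def leaves(classes, parent):
    """Leaves of the subgraph induced by `classes`: members that are not
    the parent of another member."""
    inner = {parent[c] for c in classes if parent.get(c) in classes}
    return classes - inner

def hierarchical_prf(P, C, parent):
    """HP, HR, and HF of eq. (B.4) for nonempty, ancestor-closed P and C."""
    lP, lC = leaves(P, parent), leaves(C, parent)
    hp = sum(len(C & ancestors(p, parent)) / len(ancestors(p, parent)) for p in lP) / len(lP)
    hr = sum(len(ancestors(c, parent) & P) / len(ancestors(c, parent)) for c in lC) / len(lC)
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf

# Toy FunCat-like path: 01 -> 01.01 -> 01.01.03; the prediction stops
# one level short of the most specific true annotation.
parent = {"01": None, "01.01": "01", "01.01.03": "01.01"}
print(hierarchical_prf({"01", "01.01"}, {"01", "01.01", "01.01.03"}, parent))
# (1.0, 0.666..., 0.8): the "near miss" is rewarded as partially correct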

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictors are able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus summarizing in a synthetic way the effectiveness of the structured hierarchical prediction.

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let P(x) be the set of classes predicted in the overall hierarchy for a given gene/gene product x, and let C(x) be the corresponding set of "true" classes. Then the hierarchical precision HP_K and the hierarchical recall HR_K according to Kiritchenko are defined as

$$\mathrm{HP}_K = \frac{\sum_x |P(x) \cap C(x)|}{\sum_x |P(x)|}, \qquad \mathrm{HR}_K = \frac{\sum_x |P(x) \cap C(x)|}{\sum_x |C(x)|} \tag{B.5}$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted (HP_K) and the true (HR_K) classes. The hierarchical F-measure HF_K is the harmonic mean between HP_K and HR_K:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K} \tag{B.6}$$
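A sketch of these measures in Python, reading (B.5) as micro-averaged ratios over all genes (an assumption consistent with [202]); the toy annotations are illustrative:

def kiritchenko_prf(pred, true):
    """HP_K, HR_K, and HF_K of eqs. (B.5)-(B.6): `pred` and `true` map each
    gene x to the full set of classes predicted / annotated in the hierarchy."""
    inter = sum(len(pred[x] & true[x]) for x in pred)
    hp = inter / sum(len(pred[x]) for x in pred)
    hr = inter / sum(len(true[x]) for x in pred)
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf

pred = {"g1": {"01", "01.01"}, "g2": {"02"}}
true = {"g1": {"01", "01.01", "01.01.03"}, "g2": {"01"}}
print(kiritchenko_prf(pred, true))  # (0.666..., 0.5, 0.571...)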

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level k is any node whose distance from the root is equal to k. The posterior probability computed for a generic node at level k is denoted by $q_k$; more precisely, $q_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level k. Moreover, we define $q_{k+1}^j$ as the probability of a child j of a node at level k, where the index j ≥ 1 refers to the different children of a node at level k. From (30) we can derive the following expression for the probability $q_k$ computed for a generic node at level k of the hierarchy [31]:

$$q_k(g) = w \cdot \hat{q}_k(g) + \frac{1-w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} q_{k+1}^j(g) \tag{C.1}$$

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children nodes of a generic node at level k:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} q_{k+1}^j \tag{C.2}$$

and we can introduce similar notations for $a_{k+2}$ (the average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (descendants at level j below a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level k, with a given parameter w, 0 ≤ w ≤ 1, balancing the weight between parent and children predictors, and having a variable number (larger than or equal to 1) of positive descendants for each of the m lower levels below, the following equality holds for each m ≥ 1:

$$q_k = w\,\hat{q}_k + \sum_{j=1}^{m-1} w(1-w)^j a_{k+j} + (1-w)^m a_{k+m} \tag{C.3}$$

For the full proof see [31].

Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the w parameter. To get more insight into the relationships between w and its effect on the influence of positive decisions on a generic node at level k (Theorem 1), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths m, and Figure 18(b) shows the function responsible for the influence of the nodes above the leaves.

[Figure 18: (a) plot of f(w) = (1 - w)^m, varying m from 1 to 10; (b) plot of g(w) = w(1 - w)^j, varying j from 1 to 10, where the integers j refer to internal nodes at distance j from the reference node at level k.]
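A small numeric illustration of (C.3) and of the exponential decay of the descendant weights w(1-w)^j; the local probability and the averages a_{k+j} below are made-up inputs:

def tpr_w_probability(w, q_local, a):
    """Ensemble probability of eq. (C.3): q_local is the local prediction
    and a[j-1] holds the average a_{k+j} of the positive descendants at
    depth j, for j = 1..m."""
    m = len(a)
    q = w * q_local
    for j in range(1, m):                  # contributions w(1-w)^j a_{k+j}
        q += w * (1 - w) ** j * a[j - 1]
    q += (1 - w) ** m * a[m - 1]           # deepest level: (1-w)^m a_{k+m}
    return q

# The weight of a descendant at depth j is w(1-w)^j: it decays
# exponentially with j, at a rate governed by w.
for w in (0.2, 0.5, 0.8):
    print(w, [round(w * (1 - w) ** j, 4) for j in range(1, 5)])

print(tpr_w_probability(w=0.5, q_local=0.4, a=[0.9, 0.8, 0.7]))  # 0.6125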

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction - the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.
[2] L. Pena-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.
[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.
[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.
[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.
[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene Ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.
[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.
[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.
[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.
[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.
[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.
[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.
[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.
[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.
[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.
[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.
[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.
[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.
[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.
[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.
[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.
[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.
[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.
[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.
[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.
[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.
[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.
[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.
[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.
[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.
[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.
[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.
[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.
[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.
[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.
[38] K. Trohidis, G. Tsoumahas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.
[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.
[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.
[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.
[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.
[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.
[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.
[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.
[46] K. Tsuda, H. Shin, and B. Schölkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.
[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.

[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.
[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.
[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.
[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.
[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.
[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.
[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.
[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.
[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
[58] A. Conesa, S. Gotz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.
[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.
[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.
[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.
[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.
[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.
[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.
[65] J. Z. Wang, Z. Du, R. Payattakool, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.
[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.
[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.
[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.
[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.
[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.
[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.
[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.
[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.
[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.
[75] Y. A. I. Kourmpetis, A. D. J. van Dijk, M. C. A. M. Bink, R. C. H. J. van Ham, and C. J. F. ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.
[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.
[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.
[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.
[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.
[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.
[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Schölkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.
[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.
[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.
[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.
[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.
[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.
[88] G. Bakir, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.
[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.
[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.
[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.
[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.
[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.
[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.
[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.
[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.
[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.
[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.
[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.
[100] M. Dettling and P. Bühlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.
[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.
[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.
[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.
[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.
[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.
[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.
[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.
[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.
[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.
[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.
[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.
[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hüllermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.
[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.
[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.
[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.
[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.
[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.
[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.
[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.
[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.
[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.
[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.
[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using top-down label combination and artificial neural networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.
[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.
[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.
[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.
[128] B. Paes, A. Plastion, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.
[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.
[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.
[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.
[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.
[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.
[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.
[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems, Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.
[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Schölkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.
[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.
[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.
[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.
[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n^2) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.
[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.
[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.
[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.
[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.
[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.
[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.

[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.
[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.
[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.
[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.
[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics - From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.
[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.
[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.
[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.
[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.
[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.
[160] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.
[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.
[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.
[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.
[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.
[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.
[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.
[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.
[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.
[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.
[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.
[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.
[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.
[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7–9, pp. 1533–1537, 2010.
[174] R. D. Finn, J. Tate, J. Mistry et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.
[175] A. P. Gasch, P. T. Spellman, C. M. Kao et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.
[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.
[177] C. von Mering, R. Krause, B. Snel et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.
[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.
[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.
[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.
[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.
[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.
[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.
[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.
[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.
[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.
[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.
[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.
[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.
[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.
[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.
[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.
[194] C. Widmer and G. Rätsch, "Multitask learning in computational biology," Journal of Machine Learning Research, W&P, vol. 27, pp. 207–216, 2012.
[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.
[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013 - ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.
[197] H. W. Mewes, K. Albermann, M. Bahr et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.
[198] H. W. Mewes, D. Frishman, C. Gruber et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.
[199] K. R. Christie, S. Weng, R. Balakrishnan et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.
[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.
[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.
[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


A variant of the TPR-W strategy weights the contribution of each child classifier on the basis of its performance, evaluated on a validation data set, by adding to (30) another weight ν_j:

$$\bar{p}_i = w\,\hat{p}_i + \frac{1-w}{|\phi_i|} \sum_{j \in \phi_i} \nu_j \cdot \bar{p}_j \tag{31}$$

where ν_j is computed on the basis of some accuracy metric A_j (e.g., the F-score) estimated for the child classifiers associated with node j, as follows:

$$\nu_j = \frac{A_j}{\sum_{k \in \phi_i} A_k} \tag{32}$$

In this way the contribution of "poor" classifiers is reduced, while "good" classifiers carry more weight in the final computation of $\bar{p}_i$ (31). Experiments with the "Protein Fate" subtree of the FunCat taxonomy on the yeast model organism show that this approach improves the predictions with respect to the "vanilla" TPR-W hierarchical strategy [145].
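A minimal Python sketch of the combination of (31) and (32) for a single node, as printed above; the probabilities and validated F-scores are illustrative placeholders:

def weighted_tpr_w(p_local, children, w=0.5):
    """One step of eqs. (31)-(32): p_local is the local prediction (p-hat_i);
    children is a list of (p_bar_j, A_j) pairs, one per positive child j,
    where A_j is an accuracy metric (e.g., the F-score) estimated on a
    validation set."""
    if not children:
        # No positive children: fall back to the local prediction
        # (an assumption; the text above only specifies the combination rule).
        return p_local
    total_a = sum(a for _, a in children)
    # nu_j = A_j / sum_k A_k (eq. 32)
    weighted_sum = sum((a / total_a) * p for p, a in children)
    # p_bar_i = w * p_hat_i + (1 - w) / |phi_i| * sum_j nu_j * p_bar_j (eq. 31)
    return w * p_local + (1 - w) / len(children) * weighted_sum

# Two positive children: a confident, well-validated one and a weak one.
print(weighted_tpr_w(p_local=0.6, children=[(0.9, 0.8), (0.4, 0.2)], w=0.6))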

7.3. Advantages and Drawbacks of TPR Methods. While the propagation of negative decisions from top to bottom nodes is quite straightforward and common to the hierarchical top-down algorithm, the propagation of positive decisions from bottom to top nodes of the hierarchy is specific to the TPR algorithm. For a discussion of this item, see Appendix C.

Experimental results show that TPR-W achieves equal or better results than the TPR and top-down hierarchical strategies, and both hierarchical strategies achieve significantly better results than flat classification methods [55, 123]. The analysis of the per-level classification performances shows that TPR-W, by exploiting a global strategy of classification, is able to achieve a good compromise between precision and recall, enhancing the F-measure at each level of the taxonomy [31].

Another advantage of TPR-W consists in the possibility of tuning precision and recall by using a global strategy: large values of the w parameter improve the precision, and small values improve the recall.

Moreover, TPR and TPR-W ensembles also provide a probabilistic estimate of the prediction reliability for each functional class of the overall taxonomy.

The decisions performed at each node of the hierarchical ensemble are influenced by the positive decisions of its descendants. More precisely, the analyses performed in [31] showed the following.

(i) Weights of descendants decrease exponentially with respect to their depth; as a consequence, the influence of descendant nodes decays quickly with their depth.

(ii) The parameter w plays a central role in balancing the weight of the parent classifier associated with a given node with the weights of its positive offspring: small values of w increase the weight of descendant nodes, and large values increase the weight of the local parent predictor associated with that node.

(iii) The effect on the overall probability predicted by the ensemble is the result of the choice of the w parameter, the strength of the prediction of the local learners, and that of their descendants.

These characteristics of TPR-W ensembles are well suited for the hierarchical classification of protein functions, considering that annotations of deeper nodes are likely to have less experimental evidence than higher nodes. Moreover, by enforcing the strength of the descendant nodes through low w values, we can improve the recall characteristics of the overall system (at the expense of a possible reduction in precision).

Unfortunately, the method has been conceived and applied only to the FunCat taxonomy, structured according to a forest of trees (Section 1.1), while no applications have been performed using the GO, structured according to a directed acyclic graph (Section 1.1).

8. Ensembles Based on Decision Trees

Another interesting research line is represented by hierarchical methods based on inductive decision trees [146]. The first attempts to exploit the hierarchical structure of functional ontologies for AFP simply used different decision tree models for each level of the hierarchy [147], or investigated a modified decision tree model in which the assignment to a node is propagated toward the parent nodes [9], by extending the classical C4.5 decision tree algorithm for multiclass classification.

In the context of the predictive clustering tree framework [148], Blockeel et al. proposed an improved version, which they applied to the prediction of gene function in the yeast [149].

More recent approaches, always based on modified decision trees, used distance measures derived from the hierarchy and significantly improved previous methods [54]. The authors showed that separate decision tree models are less accurate than a single decision tree trained to predict all classes at once, even when they are built taking into account the hierarchy.

Nevertheless, the previously proposed decision tree-based methods often achieve results that are not comparable with those of state-of-the-art hierarchical ensemble methods. To overcome this limitation, Schietgat et al. showed that ensembles of hierarchical multilabel decision trees are competitive with state-of-the-art statistical learning methods for DAG-structured prediction of protein function in the S. cerevisiae, A. thaliana, and M. musculus model organisms [29]. A further work explored the suitability of different ensemble methods based on predictive clustering trees, ranging from global ensembles that learn ensembles of predictive models, each able to predict the entire structure of the hierarchy (i.e., all the GO terms for a given gene), to local ensembles that train an entire ensemble as a classifier for each branch of the taxonomy. Recently, a novel approach used PPI network autocorrelation in hierarchical multilabel classification trees to improve gene function prediction [150].

In [151], methods related to decision trees have been proposed, in the sense that they generate interpretable classification rules to predict all functions at all levels of the GO hierarchy, using an ant colony optimization classification algorithm to discover the rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues [1, 16, 18]. In particular, in [80] the authors showed that, even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results if, at the same time, we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy has been shown between hierarchical classification, data integration methods, and cost-sensitive techniques, highlighting that hierarchical ensemble methods should be designed taking into account the different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the predictions of protein functions [16, 25, 154–156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into four groups:

(1) vector subspace integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists of concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed by [30], but each data source is separately normalized in order to take into account the data distribution in each individual vector space.
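As a minimal sketch of vector space integration, assuming each data source is encoded as a NumPy feature matrix with one row per gene (all names, shapes, and the per-source normalization are illustrative, not taken from the cited works):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_expr = rng.random((100, 50))        # hypothetical gene expression features
X_ppi = rng.random((100, 200))        # hypothetical protein-protein interaction features
y = rng.integers(0, 2, 100)           # annotations for one functional class

# optional per-source normalization (in the spirit of [30]) before concatenation
X_expr = (X_expr - X_expr.mean(axis=0)) / (X_expr.std(axis=0) + 1e-9)
X_ppi = (X_ppi - X_ppi.mean(axis=0)) / (X_ppi.std(axis=0) + 1e-9)

# vector space integration: one concatenated vector per gene, then a standard SVM
X = np.hstack([X_expr, X_ppi])
clf = SVC(probability=True).fit(X, y)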

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network, or probabilistic evidence integration schemes [45].
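The conjunctive/disjunctive combination of binary functional networks can be sketched as follows; the toy adjacency matrices and the function name are hypothetical:

import numpy as np

def integrate_networks(adjacencies, mode="disjunctive"):
    """Combine binary adjacency matrices of functional networks.

    'conjunctive' keeps an edge only if it is present in all networks;
    'disjunctive' keeps an edge if it is present in at least one network.
    """
    stacked = np.array([A > 0 for A in adjacencies])
    if mode == "conjunctive":
        return stacked.all(axis=0).astype(int)
    return stacked.any(axis=0).astype(int)

# toy example with two 3-gene networks
A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
A2 = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
print(integrate_networks([A1, A2], "conjunctive"))
print(integrate_networks([A1, A2], "disjunctive"))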

Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive Bayes integration [157] and constrained linear regression [14], or by merging data taking into account the GO hierarchy [158], or by applying XML-based techniques [159].

Kernel Fusion. These techniques first construct a separate Gram matrix for each available data source, using appropriate kernels representing similarities between genes/gene products. Then, by exploiting the closure property of kernels with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix.

Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning have been recently proposed [160, 161]. Kernel fusion methods, both with and without weighting of the data sources, have been successfully applied to the classification of protein functions [162–165]. Recently, a novel method proposed an enhanced kernel integration approach by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
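A minimal sketch of kernel fusion as a (weighted) sum of Gram matrices follows; learning the weights via semidefinite programming or multiple kernel learning, as in [11, 160, 161], is beyond this toy example, and the weights here are assumed inputs:

import numpy as np

def fuse_kernels(gram_matrices, weights=None):
    """Kernel fusion: (weighted) sum of Gram matrices, one per data source.
    The sum of valid kernel matrices is itself a valid kernel (closure property)."""
    if weights is None:
        weights = np.ones(len(gram_matrices))
    return sum(w * K for w, K in zip(weights, gram_matrices))

# The fused matrix can then be fed to any kernel method, e.g.:
# from sklearn.svm import SVC
# clf = SVC(kernel="precomputed").fit(fuse_kernels([K_expr, K_ppi]), y)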

Ensemble Methods. Genomic data fusion can be realized by means of an ensemble system composed of learners trained on different "views" of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may achieve better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include the "late integration" of kernels trained on different sources [22], the naive Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the output of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171–173].
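The following sketch illustrates decision-level integration by weighted voting, in the spirit of [23, 167, 168]; the per-source probability arrays and the weights are assumed inputs, not part of a specific published implementation:

import numpy as np

def weighted_voting(source_probs, source_weights):
    """Decision-level data integration: combine per-source classifier
    outputs for the same genes/classes by a weighted average.
    source_probs: array-like of shape (sources, genes, classes)."""
    w = np.asarray(source_weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, np.asarray(source_probs, dtype=float), axes=1)

# e.g., three sources, each giving a (genes x classes) probability matrix:
# consensus = weighted_voting([P_expr, P_ppi, P_seq], [0.5, 0.3, 0.2])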

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques,


Table 2: Comparison of the results (average per-class F-scores) achieved with single sources and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row, the numbers in parentheses refer to the percentage relative increment in F-score performance achieved with data fusion techniques with respect to the best single source of evidence (BIOGRID).

Methods                  FLAT           HTD            HTD-CS         HB             HB-CS          TPR            TPR-W
Single-source
  BIOGRID                0.2643         0.3759         0.4160         0.3385         0.4183         0.3902         0.4367
  STRING                 0.2203         0.2677         0.3135         0.2138         0.3007         0.2801         0.3048
  PFAM BINARY            0.1756         0.2003         0.2482         0.1468         0.2407         0.2532         0.2738
  PFAM LOGE              0.2044         0.1567         0.2541         0.0997         0.2847         0.3005         0.3160
  EXPR                   0.1884         0.2506         0.2889         0.2006         0.2781         0.2723         0.3053
  SEQ SIM                0.1870         0.2532         0.2899         0.2017         0.2825         0.2742         0.3088
Multisource (data fusion)
  Kernel fusion          0.3220 (+22%)  0.5401 (+44%)  0.5492 (+32%)  0.5181 (+53%)  0.5505 (+32%)  0.5034 (+29%)  0.5592 (+28%)

in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ SIM). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].

Table 2 summarizes the results of the comparison across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per-class F-score in flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) with respect to FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical methods, data fusion (in particular kernel fusion), and cost-sensitive approaches (Figure 13) [123].

Per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of the nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion the average F-score at levels 2, 3, and 4 is comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergistic effects of hierarchical multilabel ensembles, cost-sensitive techniques, and data fusion significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insight into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both GO and FunCat, only positive annotations are usually available, while negative annotations are much rarer. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy of choosing negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123] two strategies for choosing negative examples have been compared: the basic (B) and the parent only (PO) strategies.

According to the B strategy, the set of negative examples consists simply of those genes g that are not annotated to class c_i, that is,

\[
N_B = \{ g \mid g \notin c_i \}. \tag{33}
\]

The PO selection strategy chooses as negatives for class c_i only those examples that are not annotated to c_i but are annotated to a parent class. More precisely, for a given class c_i, corresponding to node i in the taxonomy, the set of negative examples is

\[
N_{PO} = \{ g \mid g \notin c_i \wedge g \in \mathrm{par}(i) \}. \tag{34}
\]
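A small sketch of the two selection strategies of (33) and (34), assuming `annotations` maps each gene to the set of classes it is annotated to (all identifiers are hypothetical):

def negatives_basic(genes, annotations, c):
    # B strategy (33): all genes not annotated to class c
    return {g for g in genes if c not in annotations[g]}

def negatives_parent_only(genes, annotations, c, parent):
    # PO strategy (34): genes not annotated to c but annotated to its parent
    return {g for g in genes
            if c not in annotations[g] and parent in annotations[g]}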


[Figure 12: FunCat trees comparing the F-scores achieved with data integration (KF) to those of the best single-source classifiers trained on BIOGRID data. Black nodes depict the functional classes for which KF achieves better F-scores. (a) FLAT ensembles; (b) TPR-W ensembles.]

[Figure 13: Comparison of hierarchical precision (a), recall (b), and F-score (c) among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.]

Hence, the PO strategy selects negative examples for training that are, in a certain sense, "close" to positives. It is easy to see that N_PO ⊆ N_B; hence, the B strategy selects for training a large set of generic negative examples, possibly annotated with classes that are associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the B (abscissa) and PO (ordinate) strategy for the selection of negative examples. In Figure 15(a) most points lie above the bisector, independently of the hierarchical cost-sensitive method being used.

[Figure 14: Comparison of per-level average precision (a), recall (b), and F-score (c) across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF). The performance of "single" is computed by averaging across all the single data sources.]

This shows that hierarchical methods gain in performance when using the PO strategy as opposed to the B strategy (p-value = 2.2 × 10^−16 according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

These results can be explained by considering that the PO strategy takes into account the hierarchy to select negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of the classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from the other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in this research area.


[Figure 15: Comparison of average per-class F-score between the basic (B) and parent only (PO) strategies: (a) hierarchical cost-sensitive strategies, HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles); (b) FLAT ensembles. Abscissa: per-class F-score with base learners trained according to the basic strategy; ordinate: per-class F-score with base learners trained according to the PO strategy.]

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way, to assess the impact of the different strategies on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works showed also that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies should be explored to improve the accuracy and the reliability of the base learners, and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied (see the sketch below): the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly takes away some unannotated examples [182]. Other approaches could also be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185–187].
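A naive sketch of random oversampling of the positive (annotated) set, one of the balancing strategies mentioned above; the data structures and the balancing criterion are illustrative assumptions:

import random

def random_oversample(positives, negatives, seed=0):
    """Duplicate annotated (positive) genes at random until the positive
    set matches the size of the (unannotated) negative set."""
    rng = random.Random(seed)
    pos = list(positives)
    while len(pos) < len(negatives):
        pos.append(rng.choice(list(positives)))
    return pos, list(negatives)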

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems and often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor \( \alpha = \theta_i^- / \theta_i^+ \) we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing w we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed in order to design ensemble methods with "tunable" precision/recall characteristics.

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; unfortunately, the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are partially annotated. More precisely, two new algorithms have been proposed: ProWL, protein function prediction with weak-label learning, which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein; and ProWL-IF, protein function prediction with weak-label learning and knowledge of irrelevant functions, by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future work on hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually, only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation between protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy with the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Appendix C), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of the deep nodes of the hierarchy. Other enhancements could be introduced in the TPR-W algorithm (Section 7.2): indeed, we can note that the positive children of a node at level i of the hierarchy have the same weight, independently of the size of their hanging subtree. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree; indeed, this witnesses for a positive annotation of the node at level i.

More in general, in the spirit of a work recently proposed [123], the analysis of the synergy between the issues introduced above could be of great interest, in order to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194] able to exploit the relationships between functional classes during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performance.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the prediction for a particular species by using predictions or data available for other species. This is common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary-memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work.


[Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.]

More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and the other related issues is the key to improving AFP methods and to driving experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale to millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process," "molecular function," and "cellular component." Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., mitochondrion or rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces genome database (SGD), and the mouse genome database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner (Figure 16).


[Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the forest of trees.]

Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that carry out a specific biological function, for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; an example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles, or physical places or compartments in which a specific gene product is located; an example is the endoplasmic reticulum or the ribosome.

The ontologies of the GO are structured as a directed acyclic graph (DAG) G = ⟨V, E⟩, where V = {t | t is a term of the GO} and E = {(t, u) | t, u ∈ V}. Relations between GO terms are categorized into the following three main groups:

(i) is-a (subtype relations): if we say that term/node A is-a B, we mean that node A is a subtype of node B. For example, mitotic cell cycle is-a cell cycle, or lyase activity is-a catalytic activity.

(ii) part-of (part-whole relations): A is part-of B means that, whenever A exists, it is part of B, and the presence of A implies the presence of B. For instance, mitochondrion is part-of cytoplasm.

(iii) regulates (control relations): if we say that A regulates B, we mean that A directly affects the manifestation of B, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting function similarity. For these reasons, in AFP the regulates relations (which, however, are a minority of the existing relations) are usually not used.

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI), to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference, to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de): at the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content was extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and, finally, animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical, tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and fewer than 300 distinct categories (Figure 17), while the Saccharomyces genome database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at a fine-grained description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge number of terms is very difficult for annotators to handle. Moreover, we may have a very large number of possible assignments, which may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads both to simpler annotation procedures and to less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to change, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases and, more in general, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples are usually far more numerous than positive ones, accuracy is not a reliable measure to assess classification performance. For this reason, the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the positive examples correctly predicted as positive, FN the positive examples incorrectly predicted as negative, and FP the negatives incorrectly predicted as positive, then the precision P and the recall R are

\[
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}. \tag{B.1}
\]

The F-score F is the harmonic mean between precision and recall:

\[
F = \frac{2 \cdot P \cdot R}{P + R}. \tag{B.2}
\]
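These flat per-class measures of (B.1) and (B.2) translate directly into code; a minimal sketch, with the zero-denominator convention chosen here as an assumption:

def precision_recall_f(tp, fp, fn):
    """Flat per-class precision, recall, and F-score, as in (B.1)-(B.2).
    Zero denominators are mapped to 0.0 (a convention of this sketch)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f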

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≈ 0.5, while values close to 1 correspond to near-optimal rankings; that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be considered "partially correct," in the sense that we can gain information about the more general functional characteristics of a gene while missing only its most specific functions.

More precisely, given a general taxonomy G representing the graph of the functional classes, for a given gene/gene product x consider the graph P(x) ⊂ G of the predicted classes and the graph C(x) of the correct classes associated with x, and let l(P) be the set of the leaves (nodes without children) of the graph P. Given a leaf p ∈ P(x), let ↑p be the set of ancestors of the node p that belong to P(x), and, given a leaf c ∈ C(x), let ↑c be the set of ancestors of the node c that belong to C(x). The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

\[
\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}p|},
\]
\[
\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}c|},
\]
\[
\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.3}
\]

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

\[
\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow}p|}{|{\uparrow}p|},
\]
\[
\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|{\uparrow}c \cap P(x)|}{|{\uparrow}c|},
\]
\[
\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.4}
\]

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing, in a synthetic way, a measure of the effectiveness of the structured hierarchical prediction.

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let P(x) be the set of classes predicted in the overall hierarchy for a given gene/gene product x, and let C(x) be the corresponding set of "true" classes. Then the hierarchical precision HP_K and the hierarchical recall HR_K according to Kiritchenko are defined as

\[
\mathrm{HP}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|P(x)|}, \qquad
\mathrm{HR}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|C(x)|}. \tag{B.5}
\]

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted classes (HP_K) and the true classes (HR_K). The hierarchical F-measure HF_K is the harmonic mean between HP_K and HR_K:

\[
\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \tag{B.6}
\]

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level k is any node whose distance from the root is equal to k. The posterior probability computed by the ensemble for a generic node at level k is denoted by q_k. More precisely, q_k denotes the probability computed by the ensemble, while \(\hat{q}_k\) denotes the probability computed by the base learner local to a node at level k. Moreover, we define \(q^j_{k+1}\) as the probability of a child j of a node at level k, where the index j ≥ 1 refers to the different children of a node at level k. From (30) we can derive the following expression for the probability q_k computed for a generic node at level k of the hierarchy [31]:

\[
q_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} q^j_{k+1}(g). \tag{C.1}
\]

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children nodes of a generic node at level k:

\[
a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} q^j_{k+1}, \tag{C.2}
\]

and we can introduce similar notations for a_{k+2} (the average of the probabilities of the grandchildren) and, more in general, for a_{k+j} (descendants at level j below a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level k, with a given parameter w, 0 ≤ w ≤ 1, balancing the weight between parent and children predictors, and having a variable number, larger than or equal to 1, of positive descendants for each of the m lower levels below, the following equality holds for each m ≥ 1:

\[
q_k = w \hat{q}_k + \sum_{j=1}^{m-1} w (1-w)^j a_{k+j} + (1-w)^m a_{k+m}. \tag{C.3}
\]

For the full proof, see [31].

Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the w parameter. To get more insight into the relationship between w and its effect on the influence of positive decisions on a generic node at level k (Theorem 2), Figure 18(a) shows the function f(w) = (1 − w)^m that governs the decay of the influence of leaf nodes at different depths m, and Figure 18(b) shows the function g(w) = w(1 − w)^j responsible for the influence of the nodes above the leaves.

[Figure 18: (a) plot of f(w) = (1 − w)^m, varying m from 1 to 10; (b) plot of g(w) = w(1 − w)^j, varying j from 1 to 10. The integers j refer to internal nodes at distance j from the reference node at level k.]

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction—the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.

[2] L. Pena-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.

[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.

[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.

[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.

[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene Ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.

[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.

[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.

[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.

[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.

[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.

[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.

[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.

[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.

[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.

[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.

[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.

[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.

[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.

[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.

[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.

[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.

[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.

[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.

[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.

[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.

[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.

[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.

[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.

[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.

[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.

[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.

[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.

[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.

[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.

[38] K. Trohidis, G. Tsoumahas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.

[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.

[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.

[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.

[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.

[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.

[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.

[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.

[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.

[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.

30 ISRN Bioinformatics

[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.

[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.

[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.

[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets: WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.

[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.

[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.

[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.

[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.

[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.

[58] A. Conesa, S. Gotz, J. M. Garcia-Gomez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.

[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.

[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.

[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.

[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.

[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.

[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.

[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.

[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.

[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.

[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.

[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.

[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.

[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.

[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.

[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.

[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.

[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.

[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.

[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.

[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.

[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.

[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.

[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.

[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.

[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.

[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.

[88] G. Bakir, T. Hofmann, B. Scholkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.

[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.

[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.

[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.

[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.

[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.

[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.

[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.

[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.

[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.

[100] M. Dettling and P. Buhlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.

[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.

[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.

[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.

[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.

[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.

[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets: WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.

[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.

[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.

[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.

[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.

[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.

[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.

[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.

[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.

[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.

[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.

[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using Top-Down label combination and Artificial Neural Networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastino, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems: Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n²) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.


[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics: From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.

[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7–9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. Von Mering, R. Krause, B. Snel et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.


[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013 (ISMB Special Interest Group Meeting), pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bahr et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


In [151], the authors proposed a hierarchical ant colony optimization classification algorithm to discover classification rules.

Finally, bagging and random forest ensembles [152] have been applied to AFP in yeast, showing that both local and global hierarchical ensemble approaches perform better than their single-model counterparts in terms of predictive power [153].

9. The Hierarchical Classification Alone Is Not Enough

Several works showed that in protein function prediction problems we need to consider several learning issues at once [1, 16, 18]. In particular, in [80] the authors showed that even if hierarchical ensemble methods are fundamental to improve the accuracy of the predictions, their mere application is not enough to assure state-of-the-art results if, at the same time, we do not consider other important learning issues related to AFP. Indeed, in [123] a significant synergy between hierarchical classification, data integration methods, and cost-sensitive techniques has been shown, highlighting that hierarchical ensemble methods should be designed taking into account the different learning issues essential for the AFP problem.

9.1. Hierarchical Methods and Data Integration. Several works, and the recently published results of the CAFA 2011 (Critical Assessment of Functional Annotation) challenge, showed that data integration plays a central role in improving the prediction of protein functions [16, 25, 154–156].

Indeed, high-throughput biotechnologies make increasing quantities of biomolecular data of different types available, and several works pointed out that data integration is fundamental to improve the accuracy in AFP [1].

According to [154], we may subdivide the main approaches to data integration for AFP into four groups, as follows:

(1) vector space integration;
(2) functional association networks integration;
(3) kernel fusion;
(4) ensemble methods.

Vector Space Integration. This approach consists in concatenating vectorial data to combine different sources of biomolecular data [132]. For instance, [22] concatenates different vectors, each one corresponding to a different source of genomic data, in order to obtain a larger vector that is used to train a standard SVM. A similar approach has been proposed by [30], but each data source is separately normalized, in order to take into account the data distribution in each individual vector space.
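As an illustration, the following minimal sketch (in Python with NumPy) shows the basic idea: normalize each source separately and concatenate the per-protein vectors before training any standard classifier. The data, dimensions, and the per-source z-score normalization are hypothetical choices, not taken from [22, 30].

    import numpy as np

    n = 100  # number of proteins (placeholder value)
    rng = np.random.default_rng(0)

    # Hypothetical feature matrices over the same n proteins, one per source
    # (e.g., protein domains, gene expression, interaction profiles).
    X_domains = rng.random((n, 300))
    X_expr = rng.random((n, 50))
    X_ppi = rng.random((n, 1000))

    def zscore(X):
        # Per-source normalization, in the spirit of the separate
        # normalization mentioned above.
        return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

    # Vector space integration: one larger vector per protein.
    X_integrated = np.hstack([zscore(X_domains), zscore(X_expr), zscore(X_ppi)])
    print(X_integrated.shape)  # (100, 1350); feed this to a standard SVM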

Functional Association Networks Integration. In functional association networks, different graphs are combined to obtain the composite resulting network [21, 48]. The simplest approaches adopt conjunctive/disjunctive techniques [63], that is, respectively, adding an edge when the two genes are linked together in all the networks, or when a link between the two genes is present in at least one functional network; probabilistic evidence integration schemes have also been proposed [45].

Other methods differentially weight each data source, using techniques ranging from Gaussian random fields [46] to naive-Bayes integration [157] and constrained linear regression [14], or merge data by taking into account the GO hierarchy [158] or by applying XML-based techniques [159].
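A minimal sketch of the conjunctive/disjunctive combination [63] follows (Python/NumPy; the toy adjacency matrices are hypothetical): the composite network keeps an edge present in all networks (conjunctive) or in at least one network (disjunctive).

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5  # toy number of genes

    # Hypothetical undirected, unweighted functional networks given as
    # symmetric boolean adjacency matrices.
    def random_network(p=0.4):
        A = rng.random((n, n)) < p
        A = np.triu(A, 1)
        return A | A.T

    A1, A2 = random_network(), random_network()

    conjunctive = A1 & A2   # edge only if present in ALL networks
    disjunctive = A1 | A2   # edge if present in AT LEAST one network
    print(conjunctive.sum() // 2, disjunctive.sum() // 2)  # edge counts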

Kernel Fusion. These techniques at first construct a separate Gram matrix for each available data source, using appropriate kernels representing similarities between genes/gene products. Then, by exploiting the closure property of kernels with respect to the sum and other algebraic operators, the Gram matrices are combined to obtain a "consensus" global integrated matrix.

Besides combining kernels linearly with fixed coefficients [22], one may also use semidefinite programming to learn the coefficients [11]. As methods based on semidefinite programming do not scale well to multiple data sources, more efficient methods for multiple kernel learning (MKL) have been recently proposed [160, 161]. Kernel fusion methods, both with and without weighting the data sources, have been successfully applied to the classification of protein functions [162–165]. Recently, an enhanced kernel integration approach has been proposed, by which the weights are iteratively optimized by reducing the empirical loss of a multilabel classifier for each of the labels simultaneously, using a combined objective function [165].
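The core of (unweighted and weighted) kernel fusion can be sketched as follows (Python/NumPy). The linear kernels and the weight values are illustrative assumptions; in practice, the weights would be learned, for example by MKL [160, 161].

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100  # proteins

    # One feature matrix per data source (placeholder data).
    sources = [rng.random((n, d)) for d in (300, 50, 1000)]

    def linear_gram(X):
        # Gram matrix of the linear kernel; any valid kernel per source works.
        return X @ X.T

    grams = [linear_gram(X) for X in sources]

    # Closure of kernels under (weighted) sums: the combined Gram matrix
    # is itself a valid "consensus" kernel.
    K_sum = sum(grams)                           # unweighted fusion
    weights = [0.5, 0.2, 0.3]                    # illustrative weights
    K_weighted = sum(w * K for w, K in zip(weights, grams))
    print(K_sum.shape, K_weighted.shape)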

Ensemble Methods. Genomic data fusion can also be realized by means of an ensemble system composed of learners trained on different "views" of the data, whose outputs are then combined. Each type of data may capture different and complementary characteristics of the objects to be classified, and the resulting ensemble may obtain better prediction capabilities through the diversity and the anticorrelation of the base learner responses.

Some examples of ensemble methods for data combination include the "late integration" of kernels trained on different sources [22], the naive-Bayes integration [166] of the outputs of SVMs trained with multiple sources [30], and logistic regression for combining the outputs of several SVMs trained with different biomolecular data and kernels [24].

Recently, in [23] the authors showed that simple ensemble methods, such as weighted voting [167, 168] or decision templates [169], give results comparable to state-of-the-art data integration methods, exploiting at the same time the modularity and scalability that characterize most ensemble algorithms. Another work showed that ensemble methods are also resistant to noise [170].

Using an ensemble approach, biomolecular data differing in their structural characteristics (e.g., sequences, vectors, and graphs) can be easily integrated, because with ensemble methods the integration is performed at the decision level, combining the outputs produced by classifiers trained on different datasets [171–173].
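A decision-level ("late") integration can be sketched as follows (Python/NumPy; the per-source probabilities and the weights are placeholders): each base classifier is trained on a single source, and only their outputs for a given functional class are combined, here by weighted voting in the spirit of [167, 168].

    import numpy as np

    rng = np.random.default_rng(0)
    n_sources, n_proteins = 3, 100

    # Hypothetical predicted probabilities for one functional class,
    # one row per source-specific base classifier.
    probs = rng.random((n_sources, n_proteins))

    # Weighted voting: weights could reflect the estimated reliability
    # of each source (arbitrary values here).
    w = np.array([0.5, 0.3, 0.2])
    consensus = w @ probs              # shape: (n_proteins,)
    predictions = consensus >= 0.5     # final decision per protein
    print(predictions[:10])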

As an example of the effectiveness of the integration of hierarchical ensemble methods with data fusion techniques,


Table 2: Comparison of the results (average per-class F-scores) achieved with single sources and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last rows, the numbers in parentheses refer to the percentage relative increment in F-score performance achieved with data fusion techniques with respect to the best single source of evidence (BIOGRID).

Methods                  FLAT     HTD      HTD-CS   HB       HB-CS    TPR      TPR-W
Single-source
  BIOGRID                0.2643   0.3759   0.4160   0.3385   0.4183   0.3902   0.4367
  STRING                 0.2203   0.2677   0.3135   0.2138   0.3007   0.2801   0.3048
  PFAM BINARY            0.1756   0.2003   0.2482   0.1468   0.2407   0.2532   0.2738
  PFAM LOGE              0.2044   0.1567   0.2541   0.0997   0.2847   0.3005   0.3160
  EXPR                   0.1884   0.2506   0.2889   0.2006   0.2781   0.2723   0.3053
  SEQ. SIM.              0.1870   0.2532   0.2899   0.2017   0.2825   0.2742   0.3088
Multisource (data fusion)
  Kernel fusion          0.3220   0.5401   0.5492   0.5181   0.5505   0.5034   0.5592
  (relative increment)   (+22%)   (+44%)   (+32%)   (+53%)   (+32%)   (+29%)   (+28%)

in [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ. SIM.). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].

Table 2 summarizes the results of the comparison across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per-class F-score in flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) with respect to FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques. According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical methods, data fusion (in particular kernel fusion), and cost-sensitive approaches (Figure 13) [123].

Per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion, the average F-score at levels 2, 3, and 4 is comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergic effects of hierarchical multilabel ensembles, cost-sensitive, and data fusion techniques significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insights into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes. In both GO and FunCat, only positive annotations are usually available, while negative annotations are much less abundant. More precisely, in the GO only about 2500 negative annotations are available, and surely this amount does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy of choosing negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123], two strategies for choosing negative examples have been compared: the basic (B) and the parent-only (PO) strategies.

According to the B strategy, the set of negative examples is simply the set of genes g that are not annotated for class c_i, that is,

    N_B = { g | g ∉ c_i }.   (33)

The PO selection strategy chooses as negatives for the class c_i only those examples that are not annotated to c_i but are annotated for a parent class. More precisely, for a given class c_i, corresponding to node i in the taxonomy, the set of negative examples is

    N_PO = { g | g ∉ c_i ∧ g ∈ par(i) }.   (34)


Figure 12: FunCat trees comparing the F-scores achieved with data integration (KF) to those of the best single-source classifiers trained on BIOGRID data. Black nodes depict the functional classes for which KF achieves better F-scores. (a) FLAT and (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical precision, recall, and F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, the PO strategy selects negative examples for training that are, in a certain sense, "close" to the positives. It is easy to see that N_PO ⊆ N_B; the B strategy, by contrast, selects for training a larger set of generic negative examples, possibly annotated with classes associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.
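The two selection strategies can be sketched directly from (33) and (34) (Python; the annotation map and class identifiers are toy placeholders):

    def negatives_basic(genes, ann, c):
        """B strategy (33): all genes not annotated to class c."""
        return {g for g in genes if c not in ann[g]}

    def negatives_parent_only(genes, ann, c, parent):
        """PO strategy (34): genes not annotated to c but annotated
        to its parent class."""
        return {g for g in genes if c not in ann[g] and parent in ann[g]}

    # Toy annotations: gene -> set of (FunCat-like) class identifiers.
    ann = {"g1": {"01", "01.01"}, "g2": {"01"}, "g3": {"02"}}
    genes = set(ann)
    print(negatives_basic(genes, ann, "01.01"))               # {'g2', 'g3'}
    print(negatives_parent_only(genes, ann, "01.01", "01"))   # {'g2'}

Note that in this toy example N_PO = {g2} is indeed a subset of N_B = {g2, g3}.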

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the B (abscissa) and PO (ordinate) strategy for the selection of negative examples. In Figure 15(a), most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods

Figure 14: Comparison of per-level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF). The performance of "single" is computed by averaging across all the single data sources.

gain in performance when using the PO strategy as opposed to the B strategy (P-value = 2.2 × 10⁻¹⁶ according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

These results can be explained by considering that the PO strategy takes into account the hierarchy to select negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of the classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from the other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section, we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in the context of this research area.


Figure 15: Comparison of average per-class F-score between the basic and PO strategies: (a) FLAT ensembles; (b) hierarchical cost-sensitive strategies: HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles). Abscissa: per-class F-score with base learners trained according to the basic strategy; ordinate: per-class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way, to assess the impact of the different strategies on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works also showed that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies could be explored to improve the accuracy and the reliability of the base learners, and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied (see the sketch below): the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly takes away some unannotated examples [182]. Other approaches could also be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185–187].
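A minimal sketch of the two random resampling strategies for a single functional class follows (Python; the gene lists are toy placeholders, and a real pipeline would resample feature/label pairs rather than identifiers):

    import random

    def random_oversample(pos, neg, rng=random.Random(0)):
        """Duplicate annotated (positive) proteins until |pos| == |neg|."""
        extra = rng.choices(pos, k=max(0, len(neg) - len(pos)))
        return pos + extra, neg

    def random_undersample(pos, neg, rng=random.Random(0)):
        """Randomly discard unannotated (negative) proteins until balanced."""
        return pos, rng.sample(neg, k=min(len(neg), len(pos)))

    pos = ["p1", "p2"]
    neg = ["n%d" % i for i in range(10)]
    print(random_oversample(pos, neg))   # 10 positives, 10 negatives
    print(random_undersample(pos, neg))  # 2 positives, 2 negatives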

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems, and it often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor α = θ⁻_i / θ⁺_i we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing w we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed in order to design ensemble methods with "tunable" precision/recall characteristics.
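To make the role of such a global parameter concrete, here is a minimal sketch of a TPR-W-style bottom-up combination (Python). The tree, the flat probabilities, the threshold t, and the fallback for nodes without positive children are illustrative assumptions, not the algorithm of [31] verbatim: larger w gives more weight to the local prediction, which the experiments above associate with higher precision and lower recall.

    def tpr_w(children, y_flat, w=0.5, t=0.5):
        """Bottom-up consensus: each node's flat score is combined, with
        weight w, with the average consensus of its 'positive' children
        (those whose consensus exceeds threshold t)."""
        y_bar = {}

        def visit(i):
            for j in children.get(i, []):
                visit(j)
            positive = [y_bar[j] for j in children.get(i, []) if y_bar[j] > t]
            if positive:
                y_bar[i] = w * y_flat[i] + (1 - w) * sum(positive) / len(positive)
            else:
                y_bar[i] = y_flat[i]  # assumption: keep the flat score unchanged
        visit("root")
        return y_bar

    children = {"root": ["a", "b"], "a": ["a1"]}
    y_flat = {"root": 0.4, "a": 0.7, "a1": 0.9, "b": 0.2}
    print(tpr_w(children, y_flat, w=0.7))  # try w close to 1 for higher precision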

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; unfortunately, the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and that, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are only partially annotated. More precisely, two new algorithms have been proposed: ProWL, protein function prediction with weak-label learning, which


can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein; and ProWL-IF, protein function prediction with weak-label learning and knowledge of irrelevant functions, by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works on hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually, only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy with the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth, but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of deep nodes of the hierarchy. Other enhancements could be introduced in the TPR-W algorithm (Section 7.2); indeed, we can note that the positive children of a node at level i of the hierarchy have the same weight, independently of the size of their hanging subtree. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree; indeed, this witnesses for a positive annotation of the node at level i.

More in general, in the spirit of a recently proposed work [123], the analysis of the synergy between the issues introduced above could be of great interest, to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194], able to exploit the relationships between functional classes just during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, just during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge unexplored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performances.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the predictions for a particular species by using predictions or data available for other species. This is a common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by the cost-sensitive techniques needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work. More generally, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and the other related issues is the key to improving AFP methods and to driving experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter, the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process," "molecular function," and "cellular component." Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., mitochondrion or rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function, for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; an example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles, or physical places or compartments in which a specific gene product is located; an example is the endoplasmic reticulum or the ribosome.

Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a term of the GO}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are also categorized in the following three main groups:

(i) is-a (subtype relations): if we say that the term/node $A$ is a $B$, we mean that node $A$ is a subtype of node $B$; for example, mitotic cell cycle is a cell cycle, or lyase activity is a catalytic activity;

(ii) part-of (part-whole relations): $A$ is part of $B$ means that, whenever $A$ exists, it is part of $B$, and the presence of $A$ implies the presence of $B$; for instance, mitochondrion is part of cytoplasm;

(iii) regulates (control relations): if we say that $A$ regulates $B$, we mean that $A$ directly affects the manifestation of $B$, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting functional similarity. For these reasons, regulates relations (which, however, are a minority of the existing relations) are usually not used in AFP.
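To make the role of the transitive relations concrete, here is a toy sketch (hypothetical term names and a dict-based edge encoding, not the actual GO file format) that propagates an annotation upward through is-a and part-of edges only, skipping "regulates":

```python
# Toy sketch: upward propagation of an annotation through the
# transitive GO relations (is-a, part-of); "regulates" edges are
# ignored, as is common practice in AFP.

TRANSITIVE = {"is_a", "part_of"}

def ancestors(term, edges):
    """edges maps a term to a list of (parent, relation) pairs."""
    found, stack = set(), [term]
    while stack:
        for parent, rel in edges.get(stack.pop(), []):
            if rel in TRANSITIVE and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

toy_edges = {
    "mitotic cell cycle": [("cell cycle", "is_a")],
    "cell cycle": [("biological process", "is_a")],
    "mitochondrion": [("cytoplasm", "part_of")],
}
print(ancestors("mitotic cell cycle", toy_edges))
# -> {'cell cycle', 'biological process'}
```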

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided in several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI), to author statement codes, such as traceable author statement (TAS), indicating that the annotation was made on the basis of a statement made by the author(s) in the cited reference, to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de). At the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content has been extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and, finally, animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and since a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical, tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge amount of terms is very difficult for annotators to handle. Moreover, we may have a very large number of possible assignments, which may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads both to simpler annotation procedures and to less difficult computation-based classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to changes, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more in general, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess the classification performance. For this reason, the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the positive examples correctly predicted as positive, FN the positive examples incorrectly predicted as negative, and FP the negatives incorrectly predicted as positive, then the precision $P$ and the recall $R$ are
$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{B.1}$$
The F-score $F$ is the harmonic mean between precision and recall:
$$F = \frac{2 \cdot P \cdot R}{P + R}. \tag{B.2}$$
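In code, (B.1) and (B.2) reduce to a few lines; the following minimal sketch simply guards against empty denominators:

```python
# Direct transcription of (B.1) and (B.2) from the TP, FP, FN counts.
def precision_recall_f(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp > 0 else 0.0
    r = tp / (tp + fn) if tp + fn > 0 else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f

print(precision_recall_f(tp=8, fp=2, fn=4))  # (0.8, 0.666..., 0.727...)
```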

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to a near-optimal ranking; that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be considered "partially correct," in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$ consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $l(P)$ be the set of the leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let ${\uparrow}p$ be the set of ancestors of the node $p$ that belong to $P(x)$, and, given a leaf $c \in C(x)$, let ${\uparrow}c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}p|},$$
$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}c|},$$
$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.3}$$
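The sketch below computes HP, HR, and HF for a single gene, under the assumption (one possible reading of the definitions) that each up-arrow set also contains the leaf itself; the toy terms are hypothetical:

```python
# Sketch of the hierarchical measures in (B.3) for one gene x:
# pred / true map each leaf of P(x) / C(x) to its ancestor set within
# that graph (here taken to include the leaf itself).

def hierarchical_prf(pred, true):
    hp = sum(max(len(true[c] & pred[p]) / len(pred[p]) for c in true)
             for p in pred) / len(pred)
    hr = sum(max(len(true[c] & pred[p]) / len(true[c]) for p in pred)
             for c in true) / len(true)
    hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
    return hp, hr, hf

pred = {"t3": {"root", "t1", "t3"}}        # one predicted leaf and its path
true = {"t4": {"root", "t1", "t2", "t4"}}  # one true leaf and its path
print(hierarchical_prf(pred, true))        # HP = 2/3, HR = 1/2, HF = 4/7
```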

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow}p|}{|{\uparrow}p|},$$
$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|{\uparrow}c \cap P(x)|}{|{\uparrow}c|},$$
$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.4}$$

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

$$\mathrm{HP}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|P(x)|}, \qquad \mathrm{HR}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|C(x)|}. \tag{B.5}$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true ($\mathrm{HR}_K$) classes. The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \tag{B.6}$$
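For completeness, a direct transcription of (B.5) and (B.6) as printed above (per-gene sets of predicted and true classes; note that, as printed, the sums are not normalized by the number of genes):

```python
# Sketch of the Kiritchenko measures (B.5)-(B.6); pred[x] and true[x]
# are the sets of predicted and true classes for gene x.

def kiritchenko_hf(pred, true):
    hp = sum(len(pred[x] & true[x]) / len(pred[x]) for x in pred)
    hr = sum(len(pred[x] & true[x]) / len(true[x]) for x in pred)
    return 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0

pred = {"g1": {"a", "b", "c"}, "g2": {"a"}}
true = {"g1": {"a", "b"}, "g2": {"a", "d"}}
print(kiritchenko_hf(pred, true))
```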

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $\bar{q}_k$; more precisely, $\bar{q}_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $\bar{q}^{\,j}_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \geq 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level $k$ of the hierarchy [31]:
$$\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g). \tag{C.1}$$

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children nodes of a generic node at level $k$:
$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1}, \tag{C.2}$$

and we can introduce similar notations for $a_{k+2}$ (average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (descendants at level $j$ below a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \leq w \leq 1$, balancing the weight between parent and children predictors, and having a variable number larger than or equal to 1 of positive descendants for each of the $m$ lower levels below, the following equality holds for each $m \geq 1$:
$$\bar{q}_k = w \hat{q}_k + \sum_{j=1}^{m-1} w (1 - w)^j a_{k+j} + (1 - w)^m a_{k+m}. \tag{C.3}$$

For the full proof, see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter. To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$ (Theorem 1), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of nodes above the leaves.

Figure 18: (a) Plot of $f(w) = (1 - w)^m$ while varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1 - w)^j$ while varying $j$ from 1 to 10; the integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.
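A toy numeric check of Theorem 2 (a chain in which every node has exactly one positive child, so the averages $a_{k+j}$ reduce to individual predictions, and a leaf keeps its local prediction; the values are made up):

```python
# Recursive evaluation of (C.1) along a chain versus the closed-form
# exponential-decay expansion of Theorem 2 (single-child case).

def bar_q_chain(q_hat, w):
    q = q_hat[-1]                      # leaf: ensemble = local prediction
    for q_local in reversed(q_hat[:-1]):
        q = w * q_local + (1 - w) * q  # equation (C.1) with one child
    return q

def closed_form(q_hat, w):
    m = len(q_hat) - 1
    return (sum(w * (1 - w) ** j * q_hat[j] for j in range(m))
            + (1 - w) ** m * q_hat[m])

q_hat = [0.6, 0.7, 0.9, 0.8]           # local predictions, levels k..k+3
for w in (0.2, 0.5, 0.8):
    assert abs(bar_q_chain(q_hat, w) - closed_form(q_hat, w)) < 1e-12
print("recursive (C.1) matches the exponential-decay expansion (C.3)")
```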

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction–the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.

[2] L. Pena-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.

[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.

[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.

[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.

[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.

[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.

[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.

[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.

[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.

[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.

[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.

[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.

[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.

[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.

[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.

[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.

[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.

[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.

[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.

[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.

[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.

[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.

[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.

[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.

[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.

[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.

[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.

[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.

[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.

[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.

[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.

[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.

[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.

[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.

[38] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.

[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.

[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.

[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.

[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.

[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.

[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.

[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.

[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.

[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.


[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.

[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.

[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.

[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.

[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.

[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.

[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.

[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.

[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.

[58] A. Conesa, S. Götz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.

[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.

[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.

[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.

[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.

[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.

[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.

[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.

[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.

[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.

[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.

[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.

[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.

[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.

[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.

[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.

[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.

[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.

[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.

[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.

[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.

[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.

[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.

[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.

[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.

[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.

[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.

[88] G. Bakir, T. Hofmann, B. Scholkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.

[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.

[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.

[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.

[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.

[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.

[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.

[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.

[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.

[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.

[100] M. Dettling and P. Buhlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.

[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.

[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.

[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.

[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.

[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.

[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.

[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.

[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.

[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.

[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, University of Stanford, Stanford, Calif, USA, 2000.

[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.

[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.

[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.

[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.

[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.

[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.

[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using Top-Down label combination and Artificial Neural Networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastino, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems, Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An $O(n^2)$ algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.


[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics–From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.

[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J Kittler M Hatef R P W Duin and J Matas ldquoOn combiningclassifiersrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 20 no 3 pp 226ndash239 1998

[169] L I Kuncheva J C Bezdek and R P W Duin ldquoDecisiontemplates for multiple classifier fusion an experimental com-parisonrdquo Pattern Recognition vol 34 no 2 pp 299ndash314 2001

[170] M Re and G Valentini ldquoNoise tolerance of multiple classifiersystems in data integration-based gene function predictionrdquoJournal of Integrative Bioinformatics vol 7 no 3 article 1392010

[171] M Re and G Valentini ldquoEnsemble based data fusion forgene function predictionrdquo inMultiple Classifier Systems EighthInternationalWorkshopMCS 2009 Reykjavik Iceland J KittlerJ Benediktsson and F Roli Eds vol 5519 of Lecture Notes inComputer Science pp 448ndash457 Springer 2009

[172] M Re and G Valentini ldquoPrediction of gene function usingensembles of SVMs and heterogeneous data sourcesrdquo in Appli-cations of Supervised and Unsupervised Ensemble Methods vol245 of Computational Intelligence Series pp 79ndash91 Springer2009

[173] M Re and G Valentini ldquoIntegration of heterogeneous datasources for gene function prediction using decision templatesand ensembles of learning machinesrdquo Neurocomputing vol 73no 7-9 pp 1533ndash1537 2010

[174] R D Finn J Tate J Mistry et al ldquoThe Pfam protein familiesdatabaserdquoNucleic Acids Research vol 36 no 1 pp D281ndashD2882008

[175] A P Gasch P T Spellman C M Kao et al ldquoGenomic expres-sion programs in the response of yeast cells to environmentalchangesrdquoMolecular Biology of the Cell vol 11 no 12 pp 4241ndash4257 2000

[176] C Stark B-J Breitkreutz T Reguly L Boucher A Breitkreutzand M Tyers ldquoBioGRID a general repository for interactiondatasetsrdquo Nucleic Acids Research vol 34 pp D535ndashD539 2006

[177] C Von Mering R Krause B Snel et al ldquoComparative assess-ment of large-scale data sets of protein-protein interactionsrdquoNature vol 417 no 6887 pp 399ndash403 2002

[178] G Valentini and N Cesa-Bianchi ldquoHCGene a software tool tosupport the hierarchical classification of genesrdquo Bioinformaticsvol 24 no 5 pp 729ndash731 2008

[179] A Ben-Hur and W S Noble ldquoChoosing negative examples forthe prediction of protein-protein interactionsrdquo BMC Bioinfor-matics vol 7 supplement 1 article S2 2006

[180] N Skunca A Altenhoff and C Dessimoz ldquoQuality of compu-tationally inferred gene ontology annotationsrdquo PLoS Computa-tional Biology vol 8 no 5 Article ID e1002533 2012

[181] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

34 ISRN Bioinformatics

[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.
[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.
[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the Databoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.
[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.
[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.
[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.
[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.
[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.
[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE Transactions on Computational Biology and Bioinformatics, 2013.
[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.
[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.
[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multilabel classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.
[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&P, vol. 27, pp. 207–216, 2012.
[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.
[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013 – ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.
[197] H. W. Mewes, K. Albermann, M. Bahr et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.
[198] H. W. Mewes, D. Frishman, C. Gruber et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.
[199] K. R. Christie, S. Weng, R. Balakrishnan et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.
[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.
[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.
[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


Table 2: Comparison of the results (average per-class $F$-scores) achieved with single sources and multisource (data fusion) techniques. FLAT, HTD, HTD-CS, HB (HBAYES), HB-CS (HBAYES-CS), TPR, and TPR-W ensemble methods are compared with and without data integration. In the last row the number in parentheses refers to the percentage relative increment in $F$-score performance achieved with data fusion techniques with respect to the best single source of evidence (BIOGRID).

Methods          FLAT          HTD           HTD-CS        HB            HB-CS         TPR           TPR-W

Single-source:
BIOGRID          0.2643        0.3759        0.4160        0.3385        0.4183        0.3902        0.4367
STRING           0.2203        0.2677        0.3135        0.2138        0.3007        0.2801        0.3048
PFAM BINARY      0.1756        0.2003        0.2482        0.1468        0.2407        0.2532        0.2738
PFAM LOGE        0.2044        0.1567        0.2541        0.0997        0.2847        0.3005        0.3160
EXPR             0.1884        0.2506        0.2889        0.2006        0.2781        0.2723        0.3053
SEQ SIM          0.1870        0.2532        0.2899        0.2017        0.2825        0.2742        0.3088

Multisource (data fusion):
Kernel fusion    0.3220 (22%)  0.5401 (44%)  0.5492 (32%)  0.5181 (53%)  0.5505 (32%)  0.5034 (29%)  0.5592 (28%)

In [123] six different sources of yeast biomolecular data have been integrated, ranging from protein domain data (PFAM BINARY and PFAM LOGE) [174], gene expression measures (EXPR) [175], and predicted and experimentally supported protein-protein interaction data (STRING and BIOGRID) [176, 177], to pairwise sequence similarity data (SEQ SIM). Kernel fusion integration (sum of the Gram matrices) has been applied, and preprocessing has been performed using the HCGene R package [178].
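As an illustration of the kernel fusion step just mentioned (sum of the Gram matrices), the following minimal Python sketch combines a few toy data sources into a single kernel and trains a precomputed-kernel SVM on it with scikit-learn. The data, dimensions, and the unweighted sum are illustrative assumptions, not the exact setup of [123].

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy feature matrices for three hypothetical data sources,
# all describing the same 100 proteins.
sources = [rng.normal(size=(100, d)) for d in (20, 50, 10)]
y = rng.integers(0, 2, size=100)  # toy binary annotations for one class

# Linear Gram matrix for each source; any valid kernel could be used.
grams = [X @ X.T for X in sources]

# Kernel fusion: the combined kernel is the (unweighted) sum of the
# Gram matrices, which is itself a valid Mercer kernel.
K = np.sum(grams, axis=0)

# Train a base learner on the fused kernel (precomputed-kernel SVM).
clf = SVC(kernel="precomputed").fit(K, y)
print(round(clf.score(K, y), 3))
```

Weighted sums of Gram matrices (e.g., weights learned by multiple kernel learning) are a natural refinement of this unweighted combination.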

Table 2 summarizes the results of the comparison, across about two hundred FunCat classes, including single-source and data integration approaches, together with both flat and hierarchical ensembles.

Data fusion techniques improve the average per-class F-score of flat ensembles (first column of Table 2) and significantly boost multilabel hierarchical methods (columns HTD, HTD-CS, HB, HB-CS, TPR, and TPR-W of Table 2).

Figure 12 depicts the classes (black nodes) where kernel fusion achieves better results than the best single-source data set (BIOGRID). It is worth noting that the number of black nodes is significantly larger in TPR-W (Figure 12(b)) than in FLAT methods (Figure 12(a)).

Hierarchical multilabel ensembles largely outperform FLAT approaches [24, 30], but Table 2 and Figure 12 also reveal a synergy between hierarchical ensemble methods and data fusion techniques.

9.2. Hierarchical Methods and Cost-Sensitive Techniques

According to [27, 142], cost-sensitive approaches boost the predictions of hierarchical methods when single sources of data are used to train the base learners. These results are confirmed when cost-sensitive methods (HBAYES-CS, Section 5.3.2; HTD-CS, Section 4; and TPR-W, Section 7.2) are integrated with data fusion techniques, showing a synergy between multilabel hierarchical data fusion (in particular kernel fusion) and cost-sensitive approaches (Figure 13) [123].

Per-level analysis of the F-score in HBAYES-CS, HTD-CS, and TPR-W ensembles shows a certain degradation of performance with respect to the depth of the nodes, but this degradation is significantly lower when data fusion is applied. Indeed, the per-level F-score achieved by HBAYES-CS and HTD-CS when a single source is used consistently decreases from the top to the bottom level, and it is halved at level 5 with respect to the first level. On the other hand, in the experiments with kernel fusion the average F-score at levels 2, 3, and 4 is comparable, and the decrement at level 5 with respect to level 1 is only about 15% (Figure 14). Similar results are reported also with TPR-W ensembles.

In conclusion, the synergistic effects of hierarchical multilabel ensembles, cost-sensitive, and data fusion techniques significantly improve the performance of AFP. Moreover, these enhancements allow obtaining better and more homogeneous results at each level of the hierarchy. This is of paramount importance, because more specific annotations are more informative and can provide more biological insight into the functions of genes.

9.3. Different Strategies to Select "Negative" Genes

In both GO and FunCat, usually only positive annotations are available, while negative annotations are much rarer. More precisely, in the GO only about 2500 negative annotations are available, and this amount surely does not allow a sufficient coverage of negative examples.

Moreover, some seminal works in functional genomics pointed out that the strategy used to choose negative training examples does affect the classifier performance [8, 163, 179, 180].

In [123] two strategies for choosing negative examples have been compared: the basic (B) and the parent only (PO) strategy.

According to the B strategy, the set of negative examples simply consists of those genes $g$ that are not annotated for class $c_i$; that is,

$$N_B = \{ g \mid g \notin c_i \}. \tag{33}$$

The PO selection strategy chooses as negatives for the class $c_i$ only those examples that are not annotated to $c_i$ but are annotated for a parent class. More precisely, for a given class $c_i$, corresponding to node $i$ in the taxonomy, the set of negative examples is

$$N_{PO} = \{ g \mid g \notin c_i \wedge g \in \mathrm{par}(i) \}. \tag{34}$$
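A minimal sketch of the two selection strategies of (33) and (34), assuming a hypothetical annotation map `ann` from each class to its annotated genes and a `parent` map encoding the taxonomy (the identifiers are toy FunCat-like labels):

```python
def negatives_basic(genes, ann, c):
    """B strategy: every gene not annotated to class c is negative."""
    return {g for g in genes if g not in ann[c]}

def negatives_parent_only(genes, ann, parent, c):
    """PO strategy: negatives are genes not annotated to c but
    annotated to the parent of c, i.e., 'close' to the positives."""
    return {g for g in genes if g not in ann[c] and g in ann[parent[c]]}

# Toy example (hypothetical annotation table and taxonomy).
genes = {"g1", "g2", "g3", "g4"}
ann = {"01": {"g1", "g2", "g3"}, "01.01": {"g1"}}
parent = {"01.01": "01"}

print(negatives_basic(genes, ann, "01.01"))                 # {'g2', 'g3', 'g4'}
print(negatives_parent_only(genes, ann, parent, "01.01"))   # {'g2', 'g3'}
```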


Figure 12: FunCat trees comparing the F-scores achieved with data integration (KF) to the best single-source classifiers trained on BIOGRID data. Black nodes depict the functional classes for which KF achieves better F-scores: (a) FLAT and (b) TPR-W ensembles.

Figure 13: Comparison of hierarchical precision, recall, and F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BIOGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, this strategy selects negative examples for training that are, in a certain sense, "close" to the positives. It is easy to see that $N_{PO} \subseteq N_B$; hence the B strategy selects for training a large set of generic negative examples, possibly annotated with classes that are associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the B (abscissa) and PO (ordinate) strategy for the selection of negative examples. In Figure 15(a) most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods gain in performance when using the PO strategy as opposed to the B strategy ($P$-value $= 2.2 \times 10^{-16}$ according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

Figure 14: Comparison of per-level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion (KF) techniques. Performance of "single" is computed by averaging across all the single data sources.

These results can be explained by considering that the PO strategy takes into account the hierarchy to select negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of the classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons we try to delineate some of the open problems and research trends in this research area.


Figure 15: Comparison of average per-class F-score between the basic (B) and PO strategies: (a) hierarchical cost-sensitive strategies, HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles); (b) FLAT ensembles. Abscissa: per-class F-score with base learners trained according to the basic strategy; ordinate: per-class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this issue exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of the different strategies should be performed in a systematic way, to assess their impact on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works also showed that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies should be explored to improve the accuracy and the reliability of the base learners and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied (see the sketch below): the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly takes away some unannotated examples [182]. Other approaches could also be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185–187].
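A toy sketch of the two random rebalancing techniques just mentioned; the protein identifiers are hypothetical, and in a real experiment the rebalancing would be applied per functional class within the training folds only.

```python
import random

rng = random.Random(42)

def oversample(positives, n):
    """Random oversampling: augment the annotations by exactly
    duplicating randomly chosen annotated proteins until size n."""
    out = list(positives)
    while len(out) < n:
        out.append(rng.choice(positives))
    return out

def undersample(negatives, n):
    """Random undersampling: randomly take away unannotated examples,
    keeping only n of them."""
    return rng.sample(negatives, n)

# Toy usage for one unbalanced class: 5 positives vs. 100 negatives.
pos = [f"p{i}" for i in range(5)]
neg = [f"n{i}" for i in range(100)]
balanced_pos = oversample(pos, 50)
balanced_neg = undersample(neg, 50)
print(len(balanced_pos), len(balanced_neg))  # 50 50
```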

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems and often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta^-_i / \theta^+_i$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$ we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed in order to design ensemble methods with "tunable" precision/recall characteristics.

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; but unfortunately the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and that, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these issues by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are partially annotated. More precisely, two new algorithms have been proposed: ProWL, protein function prediction with weak-label learning, which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein; and ProWL-IF, protein function prediction with weak-label learning and knowledge of irrelevant functions, by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these issues should be considered in future works on the hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected issue is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation between protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy with the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (see Appendix C), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of deep nodes of the hierarchy. Other enhancements could be introduced in the TPR-W algorithm (Section 7.2): indeed, we can note that the positive children of a node at level $i$ of the hierarchy have the same weight, independently of the size of their hanging subtrees. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree; indeed this witnesses a positive annotation of the node at level $i$.

More generally, in the spirit of a recently proposed work [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

First, an important issue could be represented by the design and development of multitask learning strategies [194] able to exploit the relationships between functional classes directly during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, already during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performance.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the predictions for a particular species by using predictions or data available for other species. This is common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary-memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work. More generally, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and other related issues is the key to improving AFP methods and to driving experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., mitochondrion or rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began as a collaboration between three model organism databases, FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD), in 1998. Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function; examples are lipid metabolic process and tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; examples of MF are glycine dehydrogenase activity and glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles, or physical places or compartments in which a specific gene product is located; examples are the endoplasmic reticulum and the ribosome.

Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{ t \mid t \text{ is a term of the GO} \}$ and $E = \{ (t, u) \mid t, u \in V \}$. Relations between GO terms are also categorized into the following three main groups:

(i) is-a (subtype relations): if we say the term/node A is-a B, we mean that node A is a subtype of node B. For example, mitotic cell cycle is-a cell cycle, and lyase activity is-a catalytic activity.

(ii) part-of (part-whole relations): A is part-of B means that, whenever A exists, it is part of B, and the presence of A implies the presence of B. For instance, mitochondrion is part-of cytoplasm.

(iii) regulates (control relations): if we say that A regulates B, we mean that A directly affects the manifestation of B; that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting functional similarity. For these reasons, regulates relations (which, however, are a minority of the existing relations) are usually not used in AFP.
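As a small illustration of how the transitivity of is-a and part-of is exploited in AFP, the sketch below propagates an annotation to all ancestor terms (the so-called true path rule); the edge list is a toy example, not real GO data.

```python
from collections import defaultdict

# Toy (child, parent) pairs restricted to is-a / part-of relations.
edges = [("mitotic cell cycle", "cell cycle"),
         ("cell cycle", "biological_process"),
         ("mitochondrion", "cytoplasm")]

parents = defaultdict(list)
for child, parent in edges:
    parents[child].append(parent)

def propagate(term):
    """All terms implied by an annotation to `term` (term + ancestors)."""
    implied, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in implied:
            implied.add(t)
            stack.extend(parents.get(t, []))
    return implied

print(propagate("mitotic cell cycle"))
# {'mitotic cell cycle', 'cell cycle', 'biological_process'}
```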

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI); to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference; to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de). At the beginning FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content has been extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and finally animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and in part on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge amount of terms is very difficult for annotators to handle. Moreover, we may have a very large number of possible assignments that may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads to both simpler procedures for annotation and less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, but without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to change, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more generally, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess the classification performance. For this reason the classical F-score is used instead, in order to take into account the unbalance of the functional classes. If TP represents the number of positive examples correctly predicted as positive, FN the number of positive examples incorrectly predicted as negative, and FP the number of negatives incorrectly predicted as positive, then the precision $P$ and the recall $R$ are

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{B.1}$$

The F-score $F$ is the harmonic mean between precision and recall:

$$F = \frac{2 \cdot P \cdot R}{P + R}. \tag{B.2}$$

If we need to evaluate the correct ranking of the annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to a near-optimal ranking; that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be considered "partially correct", in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$ consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $l(P)$ be the set of the leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let $\uparrow\! p$ be the set of ancestors of the node $p$ that belong to $P(x)$, and, given a leaf $c \in C(x)$, let $\uparrow\! c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|\uparrow\! c \,\cap \uparrow\! p|}{|\uparrow\! p|},$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|\uparrow\! c \,\cap \uparrow\! p|}{|\uparrow\! c|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.3}$$

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap \uparrow\! p|}{|\uparrow\! p|},$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|\uparrow\! c \,\cap P(x)|}{|\uparrow\! c|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.4}$$

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

$$\mathrm{HP}_K = \sum_x \frac{|P(x) \cap C(x)|}{|P(x)|}, \qquad \mathrm{HR}_K = \sum_x \frac{|P(x) \cap C(x)|}{|C(x)|}. \tag{B.5}$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true ($\mathrm{HR}_K$) classes. The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \tag{B.6}$$
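As an illustration, the following Python sketch computes the tree-based hierarchical measures of (B.4) on a toy FunCat-like taxonomy. The `parent` map and the convention that a node belongs to its own ancestor set are compactness assumptions made here, not prescribed by [201].

```python
def ancestors(node, parent):
    """Set of ancestors of `node`, here including the node itself."""
    out = set()
    while node is not None:
        out.add(node)
        node = parent.get(node)
    return out

def leaves(nodes, parent):
    """Nodes of the subgraph `nodes` that are not parents of other nodes in it."""
    parents_in = {parent.get(n) for n in nodes}
    return [n for n in nodes if n not in parents_in]

def hierarchical_f(pred, true, parent):
    """Hierarchical precision, recall, and F-score as in eq. (B.4)."""
    P, C = set(pred), set(true)
    hp = sum(len(C & ancestors(p, parent)) / len(ancestors(p, parent))
             for p in leaves(P, parent)) / len(leaves(P, parent))
    hr = sum(len(ancestors(c, parent) & P) / len(ancestors(c, parent))
             for c in leaves(C, parent)) / len(leaves(C, parent))
    return hp, hr, 2 * hp * hr / (hp + hr)

# Toy FunCat-like tree: 01 -> 01.01 -> 01.01.03.
parent = {"01": None, "01.01": "01", "01.01.03": "01.01"}
print(hierarchical_f(pred={"01", "01.01"},
                     true={"01", "01.01", "01.01.03"},
                     parent=parent))
# (1.0, 0.666..., 0.8): the most specific class was missed, so the
# prediction is only "partially correct", as discussed above.
```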

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $q_k$. More precisely, $q_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $q^j_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \geq 1$ refers to different children of a node at level $k$. From (30) we can derive the following expression for the probability $q_k$ computed for a generic node at level $k$ of the hierarchy [31]:

$$q_k(g) = w \cdot \hat{q}_k(g) + \frac{1-w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} q^j_{k+1}(g). \tag{C.1}$$

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children nodes of a generic node at level $k$:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} q^j_{k+1}, \tag{C.2}$$

and we can introduce similar notations for $a_{k+2}$ (average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (descendants at level $j$ of a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \leq w \leq 1$, balancing the weight between parent and children predictors, and having a variable number (larger than or equal to 1) of positive descendants for each of the $m$ lower levels below, the following equality holds for each $m \geq 1$:

$$q_k = w \hat{q}_k + \sum_{j=1}^{m-1} w (1-w)^j a_{k+j} + (1-w)^m a_{k+m}. \tag{C.3}$$

For the full proof see [31].

Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter. To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$, Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of the nodes above the leaves.

Figure 18: (a) plot of $f(w) = (1-w)^m$ while varying $m$ from 1 to 10; (b) plot of $g(w) = w(1-w)^j$ while varying $j$ from 1 to 10. The integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.
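To make the bottom-up combination of (C.1) concrete, the following minimal Python sketch computes the TPR-W ensemble probability recursively over a toy hierarchy. The `children` map, the local estimates `q_hat`, the 0.5 threshold for deciding which children are "positive", and the convention of keeping the local estimate when no child is positive are illustrative assumptions, not the exact implementation of [31].

```python
def tpr_w(node, q_hat, children, w=0.5, t=0.5):
    """Ensemble probability q for `node`, per eq. (C.1), computed recursively."""
    q_children = [tpr_w(c, q_hat, children, w, t)
                  for c in children.get(node, [])]
    # Only "positive" children (ensemble probability above threshold t)
    # propagate their evidence upward.
    positive = [q for q in q_children if q > t]
    if not positive:
        return q_hat[node]  # assumed convention: keep the local estimate
    return w * q_hat[node] + (1 - w) * sum(positive) / len(positive)

# Toy three-level hierarchy with local (base learner) estimates.
children = {"root": ["a", "b"], "a": ["a1"]}
q_hat = {"root": 0.4, "a": 0.6, "a1": 0.9, "b": 0.2}
print(tpr_w("root", q_hat, children))  # 0.575: boosted by the positive path
```

Rerunning this with a larger `w` shifts the balance toward the local prediction, illustrating the precision/recall tuning role of `w` discussed in Section 10.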

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi", funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction – the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.
[2] L. Pena-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.
[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.
[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.
[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.
[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.
[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.
[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.
[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.
[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.
[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.
[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.
[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.
[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.
[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.
[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.
[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.
[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.
[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.
[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.
[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.
[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.
[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.
[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.
[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.
[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.
[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multilabel decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.
[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.
[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.
[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.
[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.
[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.
[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.
[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.
[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.
[38] K. Trohidis, G. Tsoumahas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.
[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.
[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.
[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.
[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.
[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.
[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.
[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.
[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.
[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.

[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.
[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.
[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.
[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.
[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.
[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.
[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.
[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.
[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
[58] A. Conesa, S. Gotz, J. M. Garcia-Gomez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.
[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.
[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.
[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.
[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.
[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.
[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, Z. Du, R. Payattakool, P. S. Yu, and C.-F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.

[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.

[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.

[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.

[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.

[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.

[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes in Artificial Intelligence, pp. 219–234, Springer, 2011.

[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.

[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.

[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.

[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.

[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.

[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.

[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.

[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.

[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Schölkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.

[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.

[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.

[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.

[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.

[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.

[88] G. Bakir, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.

[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the Gene Ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.

[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.

[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.

[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.

[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.

[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.

[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.

[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting, and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.

[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.

[100] M. Dettling and P. Bühlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.

[101] V. Robles, P. Larrañaga, J. M. Peña et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.

[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.

[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.

[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.

[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.

[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets: WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.

[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.

[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.

[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.

[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.

[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hüllermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.

[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.

[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.

[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.

[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.

[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.

[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using Top-Down label combination and Artificial Neural Networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastino, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems: Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Schölkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n²) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.


[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics: From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.

[160] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7–9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. Von Mering, R. Krause, B. Snel et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.


[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Rätsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013, ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bähr et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.



Figure 12: FunCat trees to compare the F-scores achieved with data integration (KF) to those of the best single-source classifiers trained on BioGRID data. Black nodes depict functional classes for which KF achieves better F-scores. (a) FLAT and (b) TPR-W ensembles.


Figure 13: Comparison of hierarchical precision, recall, and F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BioGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, this strategy selects negative examples for training that are, in a certain sense, "close" to positives. It is easy to see that $N_{PO} \subseteq N_B$; hence the basic (B) strategy selects for training a large set of generic negative examples, possibly annotated with classes that are associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.
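As a concrete illustration, the two selection strategies can be sketched in a few lines of code. This is a minimal sketch under stated assumptions, not the implementation used in the reviewed works: annotations are assumed to be given as a protein-to-classes map already closed upward under the true path rule, the taxonomy as a child-to-parent map, and all names are illustrative.

```python
# Minimal sketch of the two negative selection strategies for a class c
# in a tree-structured taxonomy (c is assumed not to be a root class).
# ann[p]: set of classes annotated to protein p, closed upward per the
# true path rule; parent[c]: the parent class of c.

def negatives_basic(proteins, ann, c):
    """B strategy: any protein not annotated to c is a negative for c."""
    return {p for p in proteins if c not in ann[p]}

def negatives_parent_only(proteins, ann, c, parent):
    """PO strategy: negatives for c must be annotated to the parent of c,
    so they lie 'close' to the positives of c in the taxonomy."""
    return {p for p in proteins
            if c not in ann[p] and parent[c] in ann[p]}
```

Since the PO condition simply adds a constraint to the B condition, $N_{PO} \subseteq N_B$ follows immediately.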

The B strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with B to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the B (abscissa) and PO (ordinate) strategy for the selection of negative examples. In Figure 15(a) most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods


Figure 14: Comparison of per level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF). Performance of "single" is computed by averaging across all the single data sources.

gain in performance when using the PO strategy as opposed to the B strategy ($P$-value $= 2.2 \times 10^{-16}$ according to the Wilcoxon signed-ranks test). This is not the case for FLAT methods (Figure 15(b)).

These results can be explained by considering that the PO strategy takes into account the hierarchy to select negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research work is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in the context of this research area.



Figure 15: Comparison of average per class F-score between basic (B) and PO strategies: (a) FLAT ensembles; (b) hierarchical cost-sensitive strategies: HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles). Abscissa: per class F-score with base learners trained according to the basic strategy; ordinate: per class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparison of different strategies should be performed in a systematic way to assess the impact of the different strategies on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works also showed that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies could be explored to improve the accuracy and the reliability of the base learners and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied: the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly takes away some unannotated examples [182]; both are sketched below. Other approaches could also be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185–187].
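The following sketch illustrates the two random balancing strategies just mentioned; it is a generic illustration, not code from the cited works, and the function names are invented for the example (inputs are assumed to be nonempty lists of protein identifiers).

```python
import random

def random_oversample(positives, negatives, rng=random.Random(0)):
    """Duplicate annotated (positive) proteins until the class is balanced."""
    pos = list(positives)
    ovs = list(pos)
    while len(ovs) < len(negatives):
        ovs.append(rng.choice(pos))
    return ovs, list(negatives)

def random_undersample(positives, negatives, rng=random.Random(0)):
    """Randomly discard unannotated (negative) proteins down to the
    number of positives."""
    return list(positives), rng.sample(list(negatives), len(positives))
```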

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems and often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta_i^- / \theta_i^+$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$ we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed in order to design ensemble methods with "tunable" precision/recall characteristics.
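The effect of such a single global parameter can be illustrated with the generic Bayes-optimal decision rule for asymmetric costs. This is a didactic sketch, not the actual HBAYES-CS decision rule; here alpha simply denotes the ratio between false negative and false positive costs.

```python
def cost_sensitive_decision(p_positive: float, alpha: float) -> bool:
    """Predict positive when the posterior exceeds 1 / (1 + alpha),
    where alpha = cost(false negative) / cost(false positive).
    Raising alpha lowers the threshold, trading precision for recall."""
    return p_positive >= 1.0 / (1.0 + alpha)

# alpha = 1 gives the usual 0.5 threshold; alpha = 4 accepts any
# posterior above 0.2, increasing recall on rare positive classes.
```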

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; but unfortunately the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are partially annotated. More precisely, two new algorithms have been proposed: ProWL, protein function prediction with weak-label learning, which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein; and ProWL-IF, protein function prediction with weak-label learning and knowledge of irrelevant functions, by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works for hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy with the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7.1) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth, but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of deep nodes of the hierarchy; a sketch of such alternative weighting rules is given below. Other enhancements could be introduced with the TPR-W algorithm (Section 7.2): indeed, we can note that positive children of a node at level $i$ of the hierarchy have the same weight, independently of the size of their hanging subtree. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree: indeed, this witnesses for a positive annotation of the node at level $i$.
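The following sketch makes the point concrete. The combination step is written in the spirit of the weighted scheme of Section 7.2, while the two weighting functions are illustrative alternatives (exponential versus linear decay), not algorithms from the literature; all names are hypothetical.

```python
def combine_node(p_node, p_positive_children, w=0.5):
    """TPR-W-style consensus at a node: a convex combination of the flat
    prediction and the average of the children that voted positive."""
    if not p_positive_children:
        return p_node
    child_avg = sum(p_positive_children) / len(p_positive_children)
    return w * p_node + (1.0 - w) * child_avg

def exponential_weight(depth_gap, d=2.0):
    """TPR-like damping: evidence coming from depth_gap levels below
    decays exponentially."""
    return d ** (-depth_gap)

def linear_weight(depth_gap, max_gap):
    """A possible alternative: linear decay, keeping more weight on the
    decisions of deep nodes."""
    return max(0.0, 1.0 - depth_gap / max_gap)
```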

More in general, in the spirit of a work recently proposed [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194] able to exploit the relationships between functional classes just during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, just during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performances.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the prediction for a particular species by using predictions or data available for other species. This is a common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step different base learners can be trained to individually learn the functional classes, and in the second step different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work.


Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and other related issues is the key to improve AFP methods and to drive experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process," "molecular function," and "cellular component." Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., mitochondrion or rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began as a collaboration between three model organism databases, FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD), in 1998. Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner (Figure 16).


Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function; examples are lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities. An example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles or physical places or compartments in which a specific gene product is located. An example is the endoplasmic reticulum or the ribosome.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a term of the GO}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are also categorized in the following three main groups:

(i) is-a (subtype relations): if we say that term/node A is-a B, we mean that node A is a subtype of node B. For example, mitotic cell cycle is-a cell cycle, or lyase activity is-a catalytic activity.

(ii) part-of (part-whole relations): A is part-of B means that whenever A exists, it is part of B, and the presence of A implies the presence of B. For instance, mitochondrion is part-of cytoplasm.

(iii) regulates (control relations): if we say that A regulates B, we mean that A directly affects the manifestation of B, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting function similarity. For these reasons, in AFP the regulates relations (which are, however, a minority of the existing relations) are usually not used.
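Transitivity of is-a and part-of is exactly what allows annotations to be propagated upward along the DAG (the true path rule). A minimal sketch, assuming the ontology is given as a node-to-parents map restricted to is-a/part-of edges:

```python
def ancestors(term, parents):
    """Transitive closure of the is-a/part-of parent relation:
    every term reachable upward from 'term'."""
    seen, stack = set(), [term]
    while stack:
        for par in parents.get(stack.pop(), ()):
            if par not in seen:
                seen.add(par)
                stack.append(par)
    return seen

# Annotating a gene to "mitotic cell cycle" then implies annotations
# to "cell cycle" and to every further ancestor in the BP ontology.
```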

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided in several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI); to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference; to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de): at the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content has been extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and finally animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.
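FunCat identifies each category with a dot-separated numeric code (e.g., 01.03.16.01), so the whole path to the root can be read directly off the code. A small illustrative sketch (not MIPS code):

```python
def funcat_ancestors(code: str):
    """All ancestor categories of a FunCat code, from the root branch
    down to the direct parent."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]

# funcat_ancestors("01.03.16.01") -> ["01", "01.03", "01.03.16"]
```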

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge amount of terms is very difficult to handle for annotators. Moreover, we may have a very large number of possible assignments that may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads to both simpler procedures for annotation and less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, without being too granular and specific.

It is worth noting that both FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to changes, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more in general, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess classification performance. For this reason the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the positive examples correctly predicted as positive, FN the positive examples incorrectly predicted as negative, and FP the negatives incorrectly predicted as positives, then the precision $P$ and the recall $R$ are

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{B1}$$

The F-score $F$ is the harmonic mean between precision and recall:

$$F = \frac{2 \cdot P \cdot R}{P + R}. \tag{B2}$$

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to a near optimal ranking; that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be "partially correct," in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$ consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $l(P)$ be the set of the leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let ${\uparrow}p$ be the set of ancestors of the node $p$ that belong to $P(x)$, and, given a leaf $c \in C(x)$, let ${\uparrow}c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical $F$-score (HF) are defined as follows [201]:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}p|},$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow}c \cap {\uparrow}p|}{|{\uparrow}c|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B3}$$

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow}p|}{|{\uparrow}p|},$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|{\uparrow}c \cap P(x)|}{|{\uparrow}c|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B4}$$

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.
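To make the tree-case definitions in (B4) concrete, here is a minimal sketch (Python; the FunCat-like toy taxonomy is invented, ${\uparrow}p$ is assumed to include the node itself, and the predicted/true class sets are assumed to be closed under ancestors) that computes HP, HR, and HF for a single gene:

```python
# Toy FunCat-like tree: child -> parent (None marks a root). Class ids invented.
PARENT = {"01": None, "01.01": "01", "01.01.03": "01.01", "02": None}

def up(n):
    """Path from n to its root, including n itself (an assumption here)."""
    path = set()
    while n is not None:
        path.add(n)
        n = PARENT[n]
    return path

def leaves(nodes):
    """Members of `nodes` that are not the parent of any other member."""
    return {n for n in nodes if all(PARENT[m] != n for m in nodes)}

def hier_prf(P, C):
    """HP, HR, HF of eq. (B4) for one gene; P, C are ancestor-closed sets."""
    hp = sum(len(C & up(p)) / len(up(p)) for p in leaves(P)) / len(leaves(P))
    hr = sum(len(up(c) & P) / len(up(c)) for c in leaves(C)) / len(leaves(C))
    hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
    return hp, hr, hf

# A "near miss": the parent class is predicted, the most specific one is not.
print(hier_prf(P={"01", "01.01"}, C={"01", "01.01", "01.01.03"}))
# -> (1.0, 0.666..., 0.8): fully precise, but penalized on recall
```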

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

$$\mathrm{HP}_K = \frac{\sum_x |P(x) \cap C(x)|}{\sum_x |P(x)|}, \qquad \mathrm{HR}_K = \frac{\sum_x |P(x) \cap C(x)|}{\sum_x |C(x)|}. \tag{B5}$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true classes ($\mathrm{HR}_K$). The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \tag{B6}$$
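Under the micro-averaged reading of (B5) assumed above, a few lines suffice to compute the Kiritchenko measures over a collection of genes; in this sketch `pairs` is a hypothetical list of (predicted, true) ancestor-closed class sets:

```python
def kiritchenko_hf(pairs):
    """HP_K, HR_K, HF_K of eqs. (B5)-(B6) over (P, C) class-set pairs."""
    inter = sum(len(P & C) for P, C in pairs)
    hp_k = inter / sum(len(P) for P, _ in pairs)
    hr_k = inter / sum(len(C) for _, C in pairs)
    hf_k = 2 * hp_k * hr_k / (hp_k + hr_k) if hp_k + hr_k > 0 else 0.0
    return hp_k, hr_k, hf_k

# Same toy gene as before: two of three true classes are predicted.
print(kiritchenko_hf([({"01", "01.01"}, {"01", "01.01", "01.01.03"})]))
# -> (1.0, 0.666..., 0.8)
```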

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $\bar{q}_k$. More precisely, $\bar{q}_k$ denotes the probability computed by the ensemble, and $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $\bar{q}^{\,j}_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \geq 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level $k$ of the hierarchy [31]:

$$\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g). \tag{C1}$$

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children nodes of a generic node at level $k$:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1}, \tag{C2}$$

and we can introduce similar notations for $a_{k+2}$ (average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (descendants at level $j$ below a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \leq w \leq 1$, balancing the weight between parent and children predictors, and having a variable number (larger than or equal to 1) of positive descendants for each of the $m$ lower levels below, the following equality holds for each $m \geq 1$:

$$\bar{q}_k = w \hat{q}_k + \sum_{j=1}^{m-1} w (1 - w)^j a_{k+j} + (1 - w)^m a_{k+m}. \tag{C3}$$

For the full proof, see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter.

Figure 18: (a) Plot of $f(w) = (1-w)^m$ while varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1-w)^j$ while varying $j$ from 1 to 10. The integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.

To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$ (Theorem 1), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of the nodes above the leaves.
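A quick numeric check (a sketch, not from the paper) reproduces the behaviour plotted in Figure 18: in (C3) the leaf term weighs $(1-w)^m$ and an internal level at distance $j$ weighs $w(1-w)^j$, so both vanish geometrically as the descendants get deeper:

```python
# Weights of the terms in eq. (C3) for a few representative values of w.
for w in (0.1, 0.5, 0.8):
    leaf = [(1 - w) ** m for m in range(1, 6)]          # f(w) = (1-w)^m
    internal = [w * (1 - w) ** j for j in range(1, 6)]  # g(w) = w(1-w)^j
    print(f"w={w}: leaf {[round(v, 3) for v in leaf]}")
    print(f"       internal {[round(v, 3) for v in internal]}")
```

For large $w$ the ensemble essentially trusts the local base learner, while small $w$ lets positive predictions deep in the subtree propagate upward.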

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction—the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.
[2] L. Pena-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.
[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.
[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.
[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.
[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.
[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.
[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.
[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.
[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.
[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.
[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.
[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.
[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.
[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.
[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.
[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.
[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.
[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.
[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.
[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.
[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.
[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.
[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.
[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.
[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.
[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.
[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.
[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.
[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.
[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.
[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.
[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.
[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.
[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.
[38] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.
[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.
[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.
[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.
[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.
[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.
[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.
[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.
[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.
[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.

[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.
[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.
[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.
[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.
[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.
[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.
[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.
[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.
[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
[58] A. Conesa, S. Gotz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.
[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.
[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.
[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.
[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.
[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.
[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.
[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.
[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.
[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.
[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.
[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.
[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.
[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.
[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.
[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.
[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.
[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.
[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.
[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.
[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.
[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.
[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.
[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.
[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.
[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.
[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.
[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.
[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.
[88] G. Bakir, T. Hofmann, B. Scholkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.
[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.
[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.
[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.
[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.
[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.
[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.
[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.
[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.
[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.
[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.
[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.
[100] M. Dettling and P. Buhlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.
[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.
[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.
[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.
[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.
[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.
[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.
[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.
[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.
[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.
[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.
[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.
[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.
[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.
[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.
[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.
[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.
[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.
[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.
[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.
[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.
[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.
[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.
[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using top-down label combination and artificial neural networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.
[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.
[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.
[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.
[128] B. Paes, A. Plastion, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.
[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.
[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.
[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.
[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.
[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.
[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.
[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems, Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.
[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.
[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.
[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.
[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.
[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n²) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.
[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.
[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.
[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.
[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.
[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.
[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.
[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.
[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.
[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.
[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics—From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.
[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.
[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.
[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.
[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.
[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.
[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.
[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.
[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.
[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.
[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.
[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.
[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.
[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.
[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.
[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.
[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.
[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.
[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.
[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7-9, pp. 1533–1537, 2010.
[174] R. D. Finn, J. Tate, J. Mistry et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.
[175] A. P. Gasch, P. T. Spellman, C. M. Kao et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.
[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.
[177] C. Von Mering, R. Krause, B. Snel et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.
[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.
[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.
[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.
[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.
[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.
[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.
[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.
[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.
[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.
[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.
[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.
[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.
[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.
[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.
[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research, W&P, vol. 27, pp. 207–216, 2012.
[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.
[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013—ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.
[197] H. W. Mewes, K. Albermann, M. Bahr et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.
[198] H. W. Mewes, D. Frishman, C. Gruber et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.
[199] K. R. Christie, S. Weng, R. Balakrishnan et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.
[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.
[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.
[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


Figure 13: Comparison of hierarchical precision, recall, and F-score among different hierarchical ensemble methods, using the best source of biomolecular data (BioGRID), kernel fusion (KF), and weighted voting (WVOTE) data integration techniques. HB stands for HBAYES.

Hence, this strategy selects negative examples for training that are, in a certain sense, "close" to positives. It is easy to see that $N_{\mathrm{PO}} \subseteq N_B$; hence the $B$ strategy selects for training a large set of generic negative examples, possibly annotated with classes that are associated with faraway nodes in the taxonomy. Of course, the set of positive examples is the same for both strategies.
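As a concrete illustration, the following sketch (Python; my own rendering of the two strategies under the assumption that the basic strategy $B$ takes as negatives all genes not annotated to a class $c$, while PO restricts negatives to genes annotated to the parent of $c$ but not to $c$ itself; data structures are invented) contrasts the two selections:

```python
def negatives_basic(c, genes, ann):
    """B strategy: every gene not annotated to class c is a negative."""
    return {g for g in genes if c not in ann[g]}

def negatives_parent_only(c, genes, ann, parent):
    """PO strategy: negatives annotated to the parent of c but not to c."""
    return {g for g in genes if parent[c] in ann[g] and c not in ann[g]}

# Toy example with a two-level taxonomy "01" -> "01.01"; ann maps each gene
# to its ancestor-closed set of annotated classes.
parent = {"01.01": "01"}
ann = {"g1": {"01", "01.01"}, "g2": {"01"}, "g3": set()}
genes = set(ann)
print(negatives_basic("01.01", genes, ann))                # {'g2', 'g3'}
print(negatives_parent_only("01.01", genes, ann, parent))  # {'g2'}
```

By construction the PO set is a subset of the B set, which matches the relation $N_{\mathrm{PO}} \subseteq N_B$ stated above.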

The $B$ strategy worsens the performance of hierarchical multilabel methods, while for FLAT ensembles there is no clear trend. Indeed, in Figure 15 we compare the F-scores obtained with $B$ to those obtained with PO, using both hierarchical cost-sensitive (Figure 15(a)) and FLAT (Figure 15(b)) methods. Each point represents the F-score for a specific FunCat class achieved by a specific method with the $B$ (abscissa) and PO (ordinate) strategy for the selection of negative examples. In Figure 15(a) most points lie above the bisector, independently of the hierarchical cost-sensitive method being used. This shows that hierarchical methods

Figure 14: Comparison of per-level average precision, recall, and F-score across the five levels of the FunCat taxonomy in HBAYES-CS, using single data sets (single) and kernel fusion techniques (KF). Performance of "single" is computed by averaging across all the single data sources.

This shows that hierarchical methods gain in performance when using the PO strategy as opposed to the B strategy ($P$ value $= 2.2 \times 10^{-16}$, according to the Wilcoxon signed-ranks test); this is not the case for FLAT methods (Figure 15(b)).

These results can be explained by considering that the PO strategy takes into account the hierarchy to select negatives, while the B strategy does not. More precisely, FLAT methods, having no information about the hierarchical structure of the classes, may fail to distinguish negative examples belonging to very distant classes, thus resulting in a high false positive rate, while hierarchical methods, which know the taxonomy, can use the information coming from the other base classifiers to prevent a local base learner from incorrectly classifying "distant" negative examples.

In conclusion, these seminal works show that the strategy used to choose negative examples exerts a significant impact on the accuracy of the predictions of hierarchical ensemble methods, and more research is needed to explore this topic.

10. Open Problems and Future Trends

In the previous section we showed that different learning issues should be considered to improve the effectiveness and the reliability of hierarchical ensemble methods. Most of these issues, and others related to hierarchical ensemble methods and to AFP, represent challenging problems that have been only partially considered by previous work. For these reasons, we try to delineate some of the open problems and research trends in this research area.


Figure 15: Comparison of average per-class F-score between the Basic and PO strategies: (a) FLAT ensembles; (b) hierarchical cost-sensitive strategies HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles). Abscissa: per-class F-score with base learners trained according to the Basic strategy; ordinate: per-class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this choice exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way, to assess the impact of each strategy on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works also showed that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies could be explored to improve the accuracy and the reliability of the base learners, and hence of the overall hierarchical classification process. For instance, random oversampling or undersampling techniques could be applied: the former augments the annotations by exactly duplicating the annotated proteins, whereas the latter randomly removes some unannotated examples [182]. Other approaches could be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185-187].

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems and often drives the "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta_i^- / \theta_i^+$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$ we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed in order to design ensemble methods with "tunable" precision/recall characteristics.

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; unfortunately, the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and that, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these issues by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are only partially annotated. More precisely, two new algorithms have been proposed: ProWL (protein function prediction with weak-label learning), which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein, and ProWL-IF (protein function prediction with weak-label learning and knowledge of irrelevant functions), by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these issues should be considered in future works on hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected issue is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy to the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Appendix C), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of deep nodes of the hierarchy. Other enhancements could be introduced in the TPR-W algorithm (Section 7.2): indeed, we can note that the positive children of a node at level i of the hierarchy have the same weight, independently of the size of their hanging subtree. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree; indeed, this witnesses a positive annotation of the node at level i.

More in general, in the spirit of a recently proposed work [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194], able to exploit the relationships between functional classes during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, just during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performances.

A last issue is represented by multispecies predictions in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the prediction for a particular species by using predictions or data available for other species. This is a common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach to hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of the existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work. More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and the other related issues is the key to improving AFP methods and to driving experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., mitochondrion or rough endoplasmic reticulum).

Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function; examples are lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; an example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents parts or components of a cell, such as organelles, or physical places or compartments in which a specific gene product is located; an example is the endoplasmic reticulum or the ribosome.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a term of the GO}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are categorized in the following three main groups.

(i) is-a (subtype relation): if we say that the term/node A is-a B, we mean that node A is a subtype of node B. For example, mitotic cell cycle is-a cell cycle, or lyase activity is-a catalytic activity.

(ii) part-of (part-whole relation): A is part-of B means that whenever A exists, it is part of B, and the presence of A implies the presence of B. For instance, mitochondrion is part-of cytoplasm.

(iii) regulates (control relation): if we say that A regulates B, we mean that A directly affects the manifestation of B, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting function similarity. For these reasons, regulates relations (which, however, constitute a minority of the existing relations) are usually not used in AFP.

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI); to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference; to computational analysis evidence codes, based on in silico analyses that are manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de). At the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content was extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and, finally, animals too [5].

FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge number of terms is very difficult for annotators to handle. Moreover, we may have a very large number of possible assignments that may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads both to simpler procedures for annotation and to less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases and, at the same time, the annotations are also subject to changes, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more in general, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess classification performance. For this reason, the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the positive examples correctly predicted as positive, FN the positive examples incorrectly predicted as negative, and FP the negatives incorrectly predicted as positive, then the precision $P$ and the recall $R$ are

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{B.1}$$

The F-score $F$ is the harmonic mean between precision and recall:

$$F = \frac{2 \cdot P \cdot R}{P + R}. \tag{B.2}$$
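As a concrete illustration of (B.1) and (B.2), a minimal Python sketch (function name is ours) computing these measures from raw counts:

def precision_recall_fscore(tp, fp, fn):
    # Precision, recall, and F-score from raw counts, as in (B.1)-(B.2);
    # degenerate denominators are mapped to 0.
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Example: 30 true positives, 10 false positives, and 20 false negatives
# yield P = 0.75, R = 0.6, and F = 0.9 / 1.35, which is approximately 0.667.
print(precision_recall_fscore(30, 10, 20))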

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≈ 0.5, while values close to 1 correspond to near-optimal rankings; that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be considered "partially correct", in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$ consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $l(P)$ be the set of the leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let $\uparrow p$ be the set of ancestors of the node $p$ that belong to $P(x)$, and, given a leaf $c \in C(x)$, let $\uparrow c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|\uparrow c \cap \uparrow p|}{|\uparrow p|},$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|\uparrow c \cap \uparrow p|}{|\uparrow c|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.3}$$

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap \uparrow p|}{|\uparrow p|},$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|\uparrow c \cap P(x)|}{|\uparrow c|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.4}$$
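A minimal Python sketch of the tree-structured variant (B.4), under our own assumptions: classes are identified by strings, parent maps each class to its parent (None at the root), the up-arrow ancestor sets include the node itself, and both class sets are closed under the true path rule. All names are ours:

def ancestors(node, parent):
    # Set containing `node` and all its ancestors up to the root.
    out = set()
    while node is not None:
        out.add(node)
        node = parent[node]
    return out

def leaves(nodes, parent):
    # Leaves of the subtree induced by `nodes`: members without children in `nodes`.
    has_child = {n: False for n in nodes}
    for n in nodes:
        if parent[n] in has_child:
            has_child[parent[n]] = True
    return [n for n in nodes if not has_child[n]]

def hierarchical_prf(pred, true, parent):
    # Hierarchical precision, recall, and F-score for the tree case (B.4);
    # `pred` and `true` are ancestor-closed sets of predicted and correct classes.
    lp, lc = leaves(pred, parent), leaves(true, parent)
    hp = sum(len(true & ancestors(p, parent)) / len(ancestors(p, parent)) for p in lp) / len(lp)
    hr = sum(len(ancestors(c, parent) & pred) / len(ancestors(c, parent)) for c in lc) / len(lc)
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf

# Toy example with hypothetical classes rooted at 'root': the prediction hits the
# correct path root -> A but picks the wrong leaf (A1 instead of A2), so the
# prediction is "partially correct" and HP = HR = HF = 2/3.
parent = {'root': None, 'A': 'root', 'A1': 'A', 'A2': 'A'}
print(hierarchical_prf({'root', 'A', 'A1'}, {'root', 'A', 'A2'}, parent))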

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

$$\mathrm{HP}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|P(x)|}, \qquad \mathrm{HR}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|C(x)|}. \tag{B.5}$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true classes ($\mathrm{HR}_K$). The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \tag{B.6}$$
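A corresponding sketch for (B.5) and (B.6), again with hypothetical names, aggregating over a collection of (predicted, true) class-set pairs, one pair per gene/gene product:

def kiritchenko_hprf(pairs):
    # (B.5)-(B.6) taken literally as printed: plain sums of per-gene ratios
    # (no normalization by the number of gene products). Both sets in each
    # pair are assumed already extended with all their ancestors.
    pairs = list(pairs)
    hp = sum(len(p & c) / len(p) for p, c in pairs if p)
    hr = sum(len(p & c) / len(c) for p, c in pairs if c)
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf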

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $q_k$; more precisely, $q_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $q^j_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \ge 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $q_k$ computed for a generic node at level $k$ of the hierarchy [31]:

$$q_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} q^j_{k+1}(g). \tag{C.1}$$

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children nodes of a generic node at level $k$:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} q^j_{k+1}, \tag{C.2}$$

and we can introduce similar notations for $a_{k+2}$ (average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (descendants at level $j$ below a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \le w \le 1$, balancing the weight between parent and children predictors, and with a variable number (larger than or equal to 1) of positive descendants for each of the $m$ lower levels below, the following equality holds for each $m \ge 1$:

$$q_k = w\,\hat{q}_k + \sum_{j=1}^{m-1} w(1-w)^j a_{k+j} + (1-w)^m a_{k+m}. \tag{C.3}$$

For the full proof, see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter. To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$ (Theorem 2), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of nodes above the leaves.


Figure 18: (a) Plot of $f(w) = (1-w)^m$ while varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1-w)^j$ while varying $j$ from 1 to 10; the integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi", funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction – the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.

[2] L. Pena-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.

[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.

[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.

[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.

[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.

[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.

[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.

[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.

[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.

[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.

[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.

[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.

[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.

[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.

[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.

[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.

[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.

[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.

[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.

[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.

[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.

[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.

[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.

[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.

[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.

[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.

[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.

[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.

[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.

[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.

[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.

[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.

[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.

[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.

[38] K. Trohidis, G. Tsoumahas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.

[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.

[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.

[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.

[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.

[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.

[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.

[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.

[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.

[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.

[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.

[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.

[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.

[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.

[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.

[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.

[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.

[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.

[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.

[58] A. Conesa, S. Gotz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.

[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.

[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.

[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.

[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.

[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.

[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.

[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.

[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.

[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.

[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.

[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.

[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.

[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.

[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.

[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.

[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.

[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.

[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.

[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.

[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.

[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.

[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.

[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.

[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.

[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.

[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.

[88] G. Bakir, T. Hoffman, B. Scholkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.

[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.

[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.

[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.

[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.

[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.

[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.

[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.

[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.

[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.

[100] M. Dettling and P. Buhlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.

[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.

[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.

[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.

[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.

[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.

[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.

[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.

[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.

[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.

[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.

[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.

[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.

[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.

[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R Bi Y Zhou F Lu and W Wang ldquoPredicting Gene Ontologyfunctions based on support vector machines and statisticalsignificance estimationrdquo Neurocomputing vol 70 no 4ndash6 pp718ndash725 2007

[117] P Bogdanov and A K Singh ldquoMolecular function predictionusing neighborhood featuresrdquo IEEEACMTransactions onCom-putational Biology and Bioinformatics vol 7 no 2 pp 208ndash2172010

[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.

[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using Top-Down label combination and Artificial Neural Networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastion, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems: Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n²) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.


[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics—From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz, et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, no. 12, article S7, 2009.

[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray, et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7–9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry, et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao, et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. von Mering, R. Krause, B. Snel, et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.


[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the Databoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multilabel classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013—ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bahr, et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber, et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan, et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.



Figure 15: Comparison of average per class F-score between basic and PO strategies. (a) FLAT ensembles; (b) hierarchical cost-sensitive strategies: HTD-CS (squares), TPR-W (triangles), and HBAYES-CS (filled circles). Abscissa: per class F-score with base learners trained according to the basic strategy; ordinate: per class F-score with base learners trained according to the PO strategy.

For instance, the selection strategies for negative examples have been only partially explored, even if some seminal works show that this item exerts a significant impact on the accuracy of the predictions [123, 163, 179, 180]. Theoretical and experimental comparisons of different strategies should be performed in a systematic way to assess their impact on different hierarchical methods, considering also the characteristics of the learning machines used as base learners.

Some works also showed that cost-sensitive strategies are needed to significantly improve predictions, especially in a hierarchical context [123], but new research could be devoted both to applying and designing cost-sensitive base learners and to developing novel unbalance-aware hierarchical ensembles. Cost-sensitive methods have been applied both to the single base learners and to the overall hierarchical ensemble strategy [31, 123], and recently a hierarchical variant of SMOTE (synthetic minority oversampling technique) [181] has been applied to hierarchical protein function prediction, showing very promising results [145]. In principle, classical "balancing" strategies could be explored to improve the accuracy and the reliability of the base learners and hence of the overall hierarchical classification process. For instance, random undersampling or oversampling techniques could be applied: the former randomly takes away some unannotated examples, whereas the latter augments the annotations by exactly duplicating the annotated proteins [182]. Other approaches could be considered, such as heuristic resampling methods [183], embedding resampling methods into data mining algorithms [184], or ensemble methods tailored to imbalanced classification problems [185–187].
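As a concrete sketch of these balancing strategies (an illustration, not code from the cited works; the helper name and the data encoding are hypothetical), the following Python fragment balances the examples of a single functional term by duplicating annotated proteins and randomly removing unannotated ones; a hierarchy-aware variant in the spirit of [145] would apply such a step per term, before training each base learner.

```python
import random

def balance_term(proteins, labels, ratio=1.0, seed=42):
    """Balance positives/negatives for one functional term by random
    over/undersampling. labels[i] is 1 if proteins[i] is annotated
    for the term. Returns a resampled list of (protein, label) pairs."""
    rng = random.Random(seed)
    pos = [p for p, y in zip(proteins, labels) if y == 1]
    neg = [p for p, y in zip(proteins, labels) if y == 0]
    target_neg = int(len(pos) * ratio) if pos else len(neg)
    # undersampling: randomly take away some unannotated examples
    neg_sample = rng.sample(neg, min(target_neg, len(neg)))
    # oversampling: duplicate the annotated proteins
    pos_sample = pos * max(1, target_neg // max(1, len(pos)))
    return [(p, 1) for p in pos_sample] + [(p, 0) for p in neg_sample]
```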

Since functional classes are unbalanced, precision/recall analysis plays a central role in AFP problems and often drives "in vitro" experiments that provide biological insights into specific functional genomics problems [1]. Only a few hierarchical ensemble methods, such as HBAYES-CS [27] and TPR-W [31], can tune their precision/recall characteristics through a single global parameter. In HBAYES-CS, by incrementing the cost factor $\alpha = \theta^-_i / \theta^+_i$ we introduce progressively lower costs for positive predictions, thus resulting in an increment of the recall (at the expense of a possibly lower precision). In TPR-W, by incrementing $w$ we can reduce the recall and enhance the precision. Parametric versions of other hierarchical ensemble methods could be developed in order to design ensembles with "tunable" precision/recall characteristics.
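To see how a single cost parameter shifts the precision/recall trade-off, consider the standard cost-sensitive Bayes decision rule (a generic sketch, not the actual HBAYES-CS or TPR-W update): if false negatives cost $\alpha$ times more than false positives, the optimal decision threshold on the posterior probability drops from $0.5$ to $1/(1+\alpha)$, so a larger $\alpha$ yields a higher recall.

```python
def precision_recall(probs, labels, alpha):
    """Cost-sensitive thresholding: predict positive when
    p >= 1 / (1 + alpha), where alpha is the cost ratio
    (false negative cost / false positive cost)."""
    thr = 1.0 / (1.0 + alpha)
    preds = [int(p >= thr) for p in probs]
    tp = sum(1 for y, yh in zip(labels, preds) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(labels, preds) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(labels, preds) if y == 1 and yh == 0)
    prec = tp / (tp + fp) if tp + fp else 1.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

# Increasing alpha lowers the threshold: recall never decreases,
# while precision tends to drop (toy scores and labels).
probs  = [0.9, 0.7, 0.4, 0.35, 0.2, 0.1]
labels = [1,   1,   0,   1,    0,   0]
for alpha in (1.0, 2.0, 4.0):
    print(alpha, precision_recall(probs, labels, alpha))
```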

Another important issue that should be considered in the design of novel hierarchical ensemble methods is the incompleteness of the available annotations and its impact on the performance of computational methods for AFP. Indeed, the successful application of supervised and semisupervised machine learning methods to these tasks requires a gold standard for protein function, that is, a trusted set of correct examples; unfortunately, the annotations are incomplete and undergo frequent updates, and the GO itself is frequently updated. Some seminal works showed that, on the one hand, current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and that, on the other hand, incomplete functional annotations adversely affect the evaluation of machine learning performance [188]. A very recent work addressed these items by proposing methods based on weak-label learning, specifically designed to replenish the functions of proteins under the assumption that proteins are only partially annotated. More precisely, two new algorithms have been proposed: ProWL, protein function prediction with weak-label learning, which can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein, and ProWL-IF, protein function prediction with weak-label learning and knowledge of irrelevant functions, by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works on hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, together with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues depend on the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy with the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Section 11), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of deep nodes of the hierarchy. Other enhancements could be introduced with the TPR-W algorithm (Section 7.2): indeed, we can note that positive children of a node at level $i$ of the hierarchy have the same weight, independently of the size of their hanging subtree. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree: indeed, this witnesses for a positive annotation of the node at level $i$.

More in general, in the spirit of a work recently proposed [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194], able to exploit the relationships between functional classes during the learning phase itself, in order to establish a functional connection between learning processes associated with hierarchically related classes of the functional taxonomy. In this way, already during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the metaintegration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a metacombination of the hierarchical predictions could be explored to enhance the overall performance.

A last issue is represented by multispecies predictions in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the predictions for a particular species by using predictions or data available for other species. This is common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach to hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and the incompleteness of the available annotations, the balance between local and global learning strategies, and the metaintegration of hierarchical predictions, have been only partially addressed in previous work. More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and other related issues is the key to improve AFP methods and to drive experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter the functional classes are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes, and it is set out in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., mitochondrion or rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began as a collaboration between three model organism databases, FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD), in 1998. Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner (Figure 16).

Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.

Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function, for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities. An example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles or physical places or compartments in which a specific gene product is located. An example is the endoplasmic reticulum or the ribosome.

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a term of the GO}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are also categorized in the following three main groups:

(i) is-a (subtype relations): if we say the term/node A is-a B, we mean that node A is a subtype of node B. For example, mitotic cell cycle is-a cell cycle, or lyase activity is-a catalytic activity.

(ii) part-of (part-whole relations): A is part-of B means that whenever A exists, it is part of B, and the presence of A implies the presence of B. For instance, mitochondrion is part-of cytoplasm.

(iii) regulates (control relations): if we say that A regulates B, we mean that A directly affects the manifestation of B, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting function similarity. For these reasons, regulates relations (which, however, are a minority of the existing relations) are usually not used in AFP.
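As a toy illustration of this convention (the terms and edges below are hypothetical examples, not a real GO extract), annotations can be propagated upward by taking the transitive closure of a term restricted to is-a and part-of edges:

```python
# Toy GO-like DAG: child -> list of (parent, relation) edges.
EDGES = {
    "mitotic cell cycle": [("cell cycle", "is_a")],
    "cell cycle": [("biological_process", "is_a")],
    "mitochondrion": [("cytoplasm", "part_of")],
    "cytoplasm": [("cellular_component", "is_a")],
    "A": [("B", "regulates")],  # regulates edges are not followed
}

def ancestors(term, relations=("is_a", "part_of")):
    """Transitive closure of `term` restricted to the given relations."""
    found, stack = set(), [term]
    while stack:
        t = stack.pop()
        for parent, rel in EDGES.get(t, []):
            if rel in relations and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

print(ancestors("mitotic cell cycle"))
# -> {'cell cycle', 'biological_process'} (set order may vary)
```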

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided in several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI); to author statement codes, such as traceable author statement (TAS), indicating that the annotation was made on the basis of a statement made by the author(s) in the cited reference; to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de): at the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content has been extended to plants, to annotate genes from the Arabidopsis thaliana genome project, and furthermore to cover prokaryotic organisms and finally animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and, in part, on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge amount of terms is very difficult for annotators to handle. Moreover, we may have a very large number of possible assignments, which may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads to both simpler procedures for annotation and less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, but without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to changes, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more in general, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess the classification performance. For this reason the classical F-score is used instead, to take into account the unbalance of functional classes. If TP represents the positive examples correctly predicted as positive, FN the positive examples incorrectly predicted as negative, and FP the negatives incorrectly predicted as positives, then the precision $P$ and the recall $R$ are

\[ P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}. \tag{B1} \]

The F-score $F$ is the harmonic mean between precision and recall:

\[ F = \frac{2 \cdot P \cdot R}{P + R}. \tag{B2} \]

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to near-optimal rankings, with AUC = 1 if all the annotated genes are ranked before the unannotated ones.
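Both measures can be computed directly from their definitions; the sketch below follows (B1)-(B2) and obtains the AUC as the fraction of positive-negative pairs ranked correctly (ties counted as one half), which is equivalent to the area under the ROC curve.

```python
def f_score(tp, fp, fn):
    """Precision, recall, and F-score as in (B1)-(B2)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def auc(scores, labels):
    """AUC as the probability that a random positive is scored
    above a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))
```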

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures to accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be considered "partially correct", in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$ consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $l(P)$ be the set of the leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let $\uparrow p$ be the set of ancestors of the node $p$ that belong to $P(x)$, and, given a leaf $c \in C(x)$, let $\uparrow c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

\[
\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|\uparrow c \cap \uparrow p|}{|\uparrow p|},
\qquad
\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|\uparrow c \cap \uparrow p|}{|\uparrow c|},
\qquad
\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}.
\tag{B3}
\]

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

\[
\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap \uparrow p|}{|\uparrow p|},
\qquad
\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|\uparrow c \cap P(x)|}{|\uparrow c|},
\qquad
\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}.
\tag{B4}
\]

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

\[
\mathrm{HP}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|P(x)|},
\qquad
\mathrm{HR}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|C(x)|}.
\tag{B5}
\]

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true classes ($\mathrm{HR}_K$). The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

\[
\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}.
\tag{B6}
\]
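The Kiritchenko variant reduces to set overlaps. Below is a minimal sketch, assuming pred[x] and true[x] are the full sets of predicted and true classes in the hierarchy for each gene/gene product x, and following (B5) literally as an unnormalized sum over x:

```python
def hf_kiritchenko(pred, true):
    """Hierarchical precision/recall/F per (B5)-(B6); pred and true map
    each gene x to its set of predicted / true classes in the hierarchy."""
    hp = sum(len(pred[x] & true[x]) / len(pred[x]) for x in pred if pred[x])
    hr = sum(len(pred[x] & true[x]) / len(true[x]) for x in pred if true[x])
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf
```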

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $\bar{q}_k$. More precisely, $\bar{q}_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $\bar{q}^{\,j}_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \ge 1$ refers to different children of a node at level $k$. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level $k$ of the hierarchy [31]:

\[
\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g).
\tag{C1}
\]

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children nodes of a generic node at level $k$:

\[
a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1},
\tag{C2}
\]

and we can introduce similar notations for $a_{k+2}$ (average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (descendants at level $j$ of a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \le w \le 1$, balancing the weight between parent and children predictors, and having a variable number, larger than or equal to 1, of positive descendants for each of the $m$ lower levels below, the following equality holds for each $m \ge 1$:

\[
\bar{q}_k = w \hat{q}_k + \sum_{j=1}^{m-1} w (1 - w)^j a_{k+j} + (1 - w)^m a_{k+m}.
\tag{C3}
\]

For the full proof see [31].

Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter. To get more insights into the relationships between $w$ and its effect on the influence of positive decisions on a generic node at level $k$ (Theorem 1), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of nodes above the leaves.

Figure 18: (a) Plot of $f(w) = (1 - w)^m$ while varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1 - w)^j$ while varying $j$ from 1 to 10. The integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.
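The recursion (C1) is straightforward to state in code. The sketch below uses a hypothetical encoding (children lists plus base-learner outputs q_hat per node) and treats a child as "positive" when its combined probability exceeds a threshold t = 0.5, which is an assumption of this illustration rather than a detail fixed by (C1):

```python
def tpr_w(node, q_hat, children, w=0.5, t=0.5):
    """Bottom-up TPR-W combination following (C1): the consensus
    probability of `node` mixes its own base-learner output q_hat[node]
    (weight w) with the average consensus of its positive children
    (weight 1 - w). A child counts as positive here when its own
    combined probability exceeds t (an assumption of this sketch)."""
    pos = []
    for child in children.get(node, []):
        q_child = tpr_w(child, q_hat, children, w, t)
        if q_child > t:
            pos.append(q_child)
    if not pos:  # leaves, or nodes without positive children
        return q_hat[node]
    return w * q_hat[node] + (1 - w) * sum(pos) / len(pos)

# Toy example: root -> {a, b}, a -> {c}
children = {"root": ["a", "b"], "a": ["c"]}
q_hat = {"root": 0.4, "a": 0.7, "b": 0.3, "c": 0.9}
print(round(tpr_w("root", q_hat, children), 3))  # 0.6
```

With $w$ close to 1 the ensemble trusts the local predictor, while smaller values of $w$ let positive evidence from the descendants propagate upward, decaying as $(1 - w)^m$ with depth, consistently with Theorem 2.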

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction—the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.

[2] L. Pena-Castillo, M. Tasan, C. L. Myers, et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.

[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.

[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.

[5] A. Ruepp, A. Zollner, D. Maier, et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.

[6] M. Ashburner, C. A. Ball, J. A. Blake, et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.

[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.

[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.

[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.

[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.

[12] W. Tian, L. V. Zhang, M. Tasan, et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.

[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.

[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.

[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron, et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.

[17] A. S. Juncker, L. J. Jensen, A. Pierleoni, et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.

[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.

[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.

[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.

[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.

[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.

[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.

[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.

[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.

[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.

[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.

[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.

[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.

[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.

[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.

[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.

[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.

[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.

[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.

[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.

[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.

[38] K. Trohidis, G. Tsoumahas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.

[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.

[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.

[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.

[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.

[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.

[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.

[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.

[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.

[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.


[48] U. Karaoz, T. M. Murali, S. Letovsky, et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.

[50] K. Astikainen, L. Holm, E. Pitkänen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.

[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.

[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets: WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.

[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.

[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.

[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.

[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.

[57] S. F. Altschul, T. L. Madden, A. A. Schaffer, et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.

[58] A. Conesa, S. Götz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.

[59] Y. Loewenstein, D. Raimondo, O. C. Redfern, et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.

[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.

[61] M. Falda, S. Toppo, A. Pescarolo, et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.

[62] T. Hamp, R. Kassner, S. Seemayer, et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.

[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.

[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, Z. Du, R. Payattakool, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.

[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.

[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.

[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.

[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.

[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.

[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes in Artificial Intelligence, pp. 219–234, Springer, 2011.

[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.

[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.

[75] Y. A. I. Kourmpetis, A. D. J. van Dijk, M. C. A. M. Bink, R. C. H. J. van Ham, and C. J. F. ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.

[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.

[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.

[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.

[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.

[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.

[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Schölkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.

[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.

[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.

[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.

[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.

[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.

[88] G. Bakir, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.

[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the Gene Ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.

[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.

[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.

[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.

[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.

[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.

[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.

[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting, and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.

[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.

[100] M. Dettling and P. Bühlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.

[101] V. Robles, P. Larrañaga, J. M. Peña, et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.

[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.

[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.

[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.

[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.

[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.

[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.

[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.

[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.

[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.

[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hüllermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.

[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.

[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.

[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.

[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.

[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.

[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. de Carvalho, "Hierarchical multilabel classification using top-down label combination and artificial neural networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastion, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems: Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Schölkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n²) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.


[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics: From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz, et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.

[160] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray, et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7–9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry, et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao, et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. von Mering, R. Krause, B. Snel, et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.


[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Rätsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013, ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bahr, et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber, et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan, et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


can recover missing annotations by using the available relevant annotations, that is, a set of trusted annotations for a given protein, and ProWL-IF (protein function prediction with weak-label learning and knowledge of irrelevant function), by which also irrelevant functions, that is, functions that cannot be associated with the protein of interest, are exploited to replenish the missing functions [189, 190]. The results show that these items should be considered in future works for hierarchical multilabel prediction of protein functions in model organisms.

Another issue is represented by the reliability of the annotations. Usually only experimental evidence is used to annotate the proteins for training AFP methods, but most of the available annotations are computationally predicted annotations without any experimental validation [191]. To at least partially exploit this huge amount of information, computational methods able to take into account the different reliability of the available annotations should be developed and integrated into hierarchical ensemble algorithms.

A quite neglected item is the interpretability of the hierarchical models. Nevertheless, the generation of comprehensible classification models is of paramount importance for biologists, in order to provide new insights into the correlation of protein features and their functions [192]. A first step in this direction is represented by the work of Cerri et al., which exploits the advantages of grammar-based evolutionary algorithms to incorporate prior knowledge, with the simplicity of genetic algorithms for optimization problems, in order to produce interpretable rules for hierarchical multilabel classification [193].

Other issues concern the "strength" of the general rule that relates the predictions made by the base learner at a given term/node of the hierarchy to the predictions made by the other base learners of the hierarchical ensemble. For instance, the TPR algorithm (Section 7) weights the positive predictions of deeper nodes with an exponential decrement with respect to their depth (Section 11), but other rules (e.g., linear or polynomial) could be considered as the basis for the development of new algorithms that put more weight on the decisions of deep nodes of the hierarchy. Other enhancements could be introduced with the TPR-W algorithm (Section 7.2): indeed, we can note that positive children of a node at level i of the hierarchy have the same weight, independently of the size of their hanging subtree. In some cases this could be useful, but in other cases it could be desirable to directly take into account the fact that a positive prediction is maintained along a path of the tree; indeed, this witnesses for a positive annotation of the node at level i.

More in general, in the spirit of a recently proposed work [123], the analysis of the synergy between the issues introduced above could be of great interest to better understand the behaviour of hierarchical ensemble methods.

Finally, we introduce some problems that could open new and interesting research lines in the context of hierarchical ensemble methods.

At first, an important issue could be represented by the design and development of multitask learning strategies [194] able to exploit the relationships between functional classes just during the learning phase, in order to establish a functional connection between the learning processes associated with hierarchically related classes of the functional taxonomy. In this way, just during the training of the base learners, the learning processes will be dependent on each other (at least for nearby nodes/classes), enabling a "mutual learning" of related classes in the taxonomy.

A second, to my knowledge not yet explored, learning issue is represented by the meta-integration of hierarchical predictions. Considering that there is no "killer" hierarchical ensemble method, a meta-combination of the hierarchical predictions could be explored to enhance the overall performance.

A last issue is represented by multispecies prediction in a hierarchical context. By exploiting homology relationships between proteins of different species, we could enhance the prediction for a particular species by using predictions or data available for other species. This is common practice with, for example, sequence-based methods, but novel research is needed to extend this homology-based approach in the context of hierarchical ensemble methods for multispecies prediction. It is worth noting that this multispecies approach leads to big-data analysis, with the associated problems of scalability of existing algorithms. A possible solution to this last problem could be represented by distributed parallel computation [195] or by the adoption of secondary memory-based computational techniques [196].

11. Conclusions

Hierarchical ensemble methods represent one of the main research lines for AFP. Their two-step learning strategy introduces a high modularity in the prediction system: in the first step, different base learners can be trained to individually learn the functional classes, and in the second step, different algorithms can be chosen to hierarchically combine the predictions provided by the base classifiers. The best results can be obtained when the global topology of the ontology is exploited and when both top-down and bottom-up learning strategies are applied [24, 27, 30, 31].

Nevertheless, a hierarchical learning strategy alone is not enough to achieve state-of-the-art results for AFP. Indeed, we need to design hierarchical ensemble methods in the context of the learning issues strictly related to the AFP problem.

The first one is represented by data fusion: since each source of biomolecular data may provide different and often complementary information about a protein, an integration of data fusion methods with hierarchical ensembles is mandatory to improve AFP results.

The second one is represented by cost-sensitive techniques, needed to take into account the usually small number of positive annotations: data unbalance-aware methods should be embedded in hierarchical methods to avoid solutions biased toward low-sensitivity predictions.

Other issues, ranging from the proper choice of negative examples to the reliability and incompleteness of the available annotations, the balance between local and global learning strategies, and the meta-integration of hierarchical predictions, have been only partially addressed in previous work. More in general, the synergy between hierarchical ensemble methods, data integration algorithms, cost-sensitive techniques, and the other related issues discussed above is the key to improving AFP methods and to driving experiments aimed at discovering previously unannotated or partially annotated protein functions [123].

[Figure 16: GO BP DAG for the yeast model organism (realized through the HCGene software [178]), involving more than 1000 terms and more than 2000 edges.]

Indeed, despite their successful application to protein function prediction in different model organisms, as outlined in Section 9, there is large room for future research in this challenging area of computational biology.

In particular, the development of multitask learning methods to jointly learn related GO terms in a hierarchical context and the design of multispecies hierarchical algorithms able to scale with millions of proteins represent a compelling challenge for the computational biology and bioinformatics community.

Appendices

A. Gene Ontology and FunCat

The two main taxonomies of gene functional classes are represented by the Gene Ontology (GO) [6] and the Functional Catalogue (FunCat) [5]. In the former, the functional classes are structured according to a directed acyclic graph, that is, a DAG (Figure 16), while in the latter they are structured through a forest of trees (Figure 17). The GO is composed of thousands of functional classes and is set out in three separate ontologies: "biological process", "molecular function", and "cellular component". Indeed, a gene can participate in specific biological processes (e.g., cell cycle, metabolism, and nucleotide biosynthesis) and at the same time can perform specific molecular functions (e.g., catalytic or binding activities that occur at the molecular level) in specific cellular components (e.g., mitochondrion or rough endoplasmic reticulum).

A.1. GO: The Gene Ontology. The Gene Ontology (GO) project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD). Now it includes several of the world's major repositories for plant, animal, and microbial genomes. The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner (Figure 16). Biological process (BP) represents series of events accomplished by one or more ordered assemblies of molecular functions that exploit a specific biological function, for instance, lipid metabolic process or tricarboxylic acid cycle. Molecular function (MF) describes activities that occur at the molecular level, such as catalytic or binding activities; an example of MF is glycine dehydrogenase activity or glucose transporter activity. Cellular component (CC) represents just parts or components of a cell, such as organelles or physical places or compartments in which a specific gene product is located; an example is the endoplasmic reticulum or the ribosome.

[Figure 17: FunCat tree for the yeast model organism (realized through the HCGene software [178]). A "dummy" root node has been added to obtain a single tree from the tree forest.]

The ontologies of the GO are structured as a directed acyclic graph (DAG) $G = \langle V, E \rangle$, where $V = \{t \mid t \text{ is a GO term}\}$ and $E = \{(t, u) \mid t, u \in V\}$. Relations between GO terms are categorized in the following three main groups:

(i) is-a (subtype relations): if we say the term/node A is a B, we mean that node A is a subtype of node B. For example, mitotic cell cycle is a cell cycle, or lyase activity is a catalytic activity.

(ii) part-of (part-whole relations): A is part of B means that whenever A exists, it is part of B, and the presence of A implies the presence of B. For instance, mitochondrion is part of cytoplasm.

(iii) regulates (control relations): if we say that A regulates B, we mean that A directly affects the manifestation of B, that is, the former regulates the latter.

While is-a and part-of are transitive, "regulates" is not. Moreover, in some cases regulatory proteins are not expected to have the same properties as the proteins they regulate, and hence predicting regulation may require other data and assumptions than predicting function similarity. For these reasons, regulates relations (which, however, are a minority of the existing relations) are usually not used in AFP.

Each annotation is labeled with an evidence code that indicates how the annotation to a particular term is supported. Evidence codes are subdivided into several categories, ranging from experimental evidence codes, used when experimental assays have been applied for the annotation, for example, inferred from physical interaction (IPI), inferred from mutant phenotype (IMP), or inferred from genetic interaction (IGI), to author statement codes, such as traceable author statement (TAS), which indicate that the annotation was made on the basis of a statement made by the author(s) in the cited reference, to computational analysis evidence codes, based on in silico analyses manually reviewed (e.g., inferred from sequence or structural similarity (ISS)). For the full set of available evidence codes, please see the GO website (http://www.geneontology.org).
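Because is-a and part-of are transitive, an annotation to a term implies annotations to all of its ancestors (the so-called true path rule [143]). As a minimal illustration, the following Python sketch, based on a hypothetical toy DAG, propagates explicit annotations to the full ancestor closure; the `parents` mapping from each term to its direct is-a/part-of parents is an assumption of this sketch, not a prescribed data format.

```python
def ancestors(term, parents):
    """Collect all ancestors of a term in a GO-like DAG by iteratively
    following the transitive is-a/part-of edges toward the roots."""
    found = set()
    stack = [term]
    while stack:
        t = stack.pop()
        for p in parents.get(t, ()):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found

def propagate_annotations(annotations, parents):
    """Extend each gene's explicit annotations to the ancestor closure,
    as required by the true path rule."""
    return {gene: terms | {a for t in terms for a in ancestors(t, parents)}
            for gene, terms in annotations.items()}

# Toy example (hypothetical terms): t3 is-a t2, t2 is-a t1 (root).
parents = {"t3": ["t2"], "t2": ["t1"]}
annotations = {"geneA": {"t3"}}
print(propagate_annotations(annotations, parents))
# {'geneA': {'t1', 't2', 't3'}} (set ordering may vary)
```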


A GO graph for the yeast model organism is represented in Figure 16. It is worth noting that, despite the complexity of the represented graph, Figure 16 does not show all the available terms and relationships involved in the GO BP ontology for the yeast model organism.

A.2. FunCat: The Functional Catalogue. The FunCat taxonomy started with the Saccharomyces cerevisiae genome project at MIPS (http://mips.gsf.de). At the beginning, FunCat contained only those categories required to describe yeast biology [197, 198], but successively its content was extended to plants, to annotate genes from the Arabidopsis thaliana genome project, then to cover prokaryotic organisms, and finally animals too [5].

The FunCat represents a relatively simple and concise set of gene functional classes: it consists of 28 main functional categories (or branches) that cover general fields like cellular transport, metabolism, and cellular communication/signal transduction. These main functional classes are divided into a set of subclasses with up to six levels of increasing specificity, according to a tree-like structure that accounts for different functional characteristics of genes and gene products. Genes may belong at the same time to multiple functional classes, since several classes are subclasses of more general ones and because a gene may participate in different biological processes and may perform different biological functions.

Taking into account the broad and highly diverse spectrum of known protein functions, the FunCat annotation scheme covers general features like cellular transport, metabolism, and protein activity regulation. Each of its 28 main functional branches is organized as a hierarchical tree-like structure, thus leading to a tree forest with hundreds of functional categories.

Differently from the GO, the FunCat is more compact and does not intend to classify protein functions down to the most specific level. From a general standpoint, it can be compared to parts of the molecular function and biological process terms of the GO system.

One of the main advantages of FunCat is its intuitive category structure. For instance, the annotation of yeast uses only 18 of the main categories and less than 300 distinct categories (Figure 17), while the Saccharomyces Genome Database (SGD) [199] uses more than 1500 GO terms in its yeast annotation. Indeed, FunCat focuses on the functional process and in part on the molecular function, while GO aims at representing a fine-granular description of proteins that provides annotations with a wealth of detailed information. However, to achieve this goal, the detailed description offered by GO leads to a large number of terms (e.g., the ontology for biological processes alone contains more than 10000 terms), and such a huge amount of terms is very difficult to handle for annotators. Moreover, we may have a very large number of possible assignments that may lead to erroneous or inconsistent annotations. FunCat is simpler: its tree structure, compared with the DAG structure of the GO, leads to both simpler procedures for annotation and less difficult computational classification tasks. In other words, it represents a well-balanced compromise between extensive depth, breadth, and resolution, but without being too granular and specific.

It is worth noting that both the FunCat and GO ontologies undergo modifications between different releases, and at the same time the annotations are also subject to changes, since they represent the knowledge of the scientific community at a given time. As a consequence, predictions resulting, for example, in false positives for a given release of the GO may become true positives in future releases; more in general, we should keep in mind that the available annotations are always partial and incomplete and depend on the knowledge available for the species under study. Nevertheless, even if some works pointed out the inconsistency of current GO taxonomies through the analysis of violations of term univocality [200], GO and FunCat are considered the ground truth to evaluate AFP methods, since they represent the main effort of the scientific community to organize a commonly accepted taxonomy of protein functions [191].

B. AFP Performance Assessment in a Hierarchical Context

In the context of ontology-wide protein function prediction problems, where negative examples usually largely outnumber positive ones, accuracy is not a reliable measure to assess the classification performance. For this reason, the classical F-score is used instead, to take into account the unbalance of the functional classes. If TP represents the number of positive examples correctly predicted as positive, FN the number of positive examples incorrectly predicted as negative, and FP the number of negatives incorrectly predicted as positive, then the precision $P$ and the recall $R$ are

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \quad (B1)$$

The F-score $F$ is the harmonic mean between precision and recall:

$$F = \frac{2 \cdot P \cdot R}{P + R}. \quad (B2)$$

If we need to evaluate the correct ranking of annotated proteins with respect to a specific functional class, a valuable measure is represented by the area under the receiver operating characteristic curve (AUC). A random ranking corresponds to AUC ≃ 0.5, while values close to 1 correspond to near-optimal rankings; that is, AUC = 1 if all the annotated genes are ranked before the unannotated ones.

In order to better capture the hierarchical and sparse nature of the protein function prediction problem, we also need specific measures that estimate how far a predicted structured annotation is from the correct one. Indeed, functional classes are structured according to a directed acyclic graph (Gene Ontology) or to a tree (FunCat), and we need measures that accommodate not just "exact matches" but also "near misses" of different sorts.

For instance, correctly predicting a parent or ancestor annotation while failing to predict the most specific available annotation should be "partially correct", in the sense that we can gain information about the more general functional characteristics of a gene, missing only its most specific functions.

More precisely, given a general taxonomy $G$ representing the graph of the functional classes, for a given gene/gene product $x$ consider the graph $P(x) \subset G$ of the predicted classes and the graph $C(x)$ of the correct classes associated with $x$, and let $l(P)$ be the set of the leaves (nodes without children) of the graph $P$. Given a leaf $p \in P(x)$, let $\uparrow p$ be the set of ancestors of the node $p$ that belong to $P(x)$, and, given a leaf $c \in C(x)$, let $\uparrow c$ be the set of ancestors of the node $c$ that belong to $C(x)$. The hierarchical precision (HP), hierarchical recall (HR), and hierarchical F-score (HF) are defined as follows [201]:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|\uparrow c \,\cap\, \uparrow p|}{|\uparrow p|},$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|\uparrow c \,\cap\, \uparrow p|}{|\uparrow c|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \quad (B3)$$

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \,\cap\, \uparrow p|}{|\uparrow p|},$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|\uparrow c \,\cap\, P(x)|}{|\uparrow c|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \quad (B4)$$

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus providing in a synthetic way the effectiveness of the structured hierarchical prediction.
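To make the semantics of (B4) concrete, here is a small Python sketch that computes HP, HR, and HF for a single gene in the tree-structured FunCat case. The toy classes are purely illustrative, and two conventions are assumptions of this sketch: the ancestor set of a class contains the class itself, and both $P(x)$ and $C(x)$ are closed under the true path rule (they contain all ancestors up to a dummy root).

```python
def up(t, parents):
    """Ancestor set of class t in a tree; by the convention assumed
    here, the set contains t itself together with all its ancestors."""
    anc = {t}
    while t in parents:
        t = parents[t]
        anc.add(t)
    return anc

def leaves(classes, parents):
    """Leaves of an annotation subgraph: classes in the set that are
    not the parent of any other class in the set."""
    inner = {parents[c] for c in classes if c in parents}
    return {c for c in classes if c not in inner}

def hier_prf(P, C, parents):
    """HP, HR, and HF of (B4) for one gene; P and C are the sets of
    predicted and correct classes, both closed under the true path rule."""
    lP, lC = leaves(P, parents), leaves(C, parents)
    hp = sum(len(C & up(p, parents)) / len(up(p, parents)) for p in lP) / len(lP)
    hr = sum(len(up(c, parents) & P) / len(up(c, parents)) for c in lC) / len(lC)
    hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
    return hp, hr, hf

# Toy FunCat-like tree: classes 1 and 2 hang from a dummy root; 1.1 and 1.2 from 1.
parents = {"1": "root", "2": "root", "1.1": "1", "1.2": "1"}
P = {"root", "1", "1.1"}  # predicted path: root -> 1 -> 1.1
C = {"root", "1", "1.2"}  # correct path:   root -> 1 -> 1.2
print(hier_prf(P, C, parents))  # approx (0.667, 0.667, 0.667): a "near miss"
```

The example shows the intended "partial credit": the predicted and correct paths share the segment root → 1 and differ only in the most specific class, so all three measures are 2/3 rather than 0.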

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

$$\mathrm{HP}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|P(x)|}, \qquad \mathrm{HR}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|C(x)|}. \quad (B5)$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true classes ($\mathrm{HR}_K$). The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \quad (B6)$$

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $\bar{q}_k$: more precisely, $\bar{q}_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $\bar{q}^{\,j}_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \geq 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level $k$ of the hierarchy [31]:

$$\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g). \quad (C1)$$

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children nodes of a generic node at level $k$:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1}, \quad (C2)$$

and we can introduce similar notations for $a_{k+2}$ (average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (descendants at level $j$ of a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \leq w \leq 1$, balancing the weight between parent and children predictors, and having a variable number, larger than or equal to 1, of positive descendants for each of the $m$ lower levels below, the following equality holds for each $m \geq 1$:

$$\bar{q}_k = w\hat{q}_k + \sum_{j=1}^{m-1} w(1 - w)^j a_{k+j} + (1 - w)^m a_{k+m}. \quad (C3)$$

For the full proof see [31].

Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter. To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$ (Theorem 1), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of the nodes above the leaves.

[Figure 18: (a) plot of $f(w) = (1 - w)^m$ while varying $m$ from 1 to 10; (b) plot of $g(w) = w(1 - w)^j$ while varying $j$ from 1 to 10. The integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.]
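As an illustration of how (C1) operates, the following Python sketch implements only the bottom-up weighted combination step of a TPR-W ensemble over a tree of classes. The data layout (dictionaries of per-node base-learner probabilities and children), the 0.5 threshold defining "positive" children, and the convention that a node without positive children keeps its local prediction are all assumptions of this sketch, not details prescribed by the original algorithm.

```python
def tpr_w(node, q_hat, children, w=0.8, thr=0.5):
    """Bottom-up TPR-W combination of (C1): the consensus probability of
    a node mixes its local base-learner prediction q_hat[node] (weight w)
    with the average consensus probability of its *positive* children,
    that is, children whose combined probability exceeds thr (weight 1 - w)."""
    pos = [p for c in children.get(node, [])
           if (p := tpr_w(c, q_hat, children, w, thr)) > thr]
    if not pos:                # leaf, or no positive children:
        return q_hat[node]     # keep the local prediction
    return w * q_hat[node] + (1 - w) * sum(pos) / len(pos)

# Hypothetical three-level hierarchy with per-node base-learner outputs.
children = {"root": ["a", "b"], "a": ["a1", "a2"]}
q_hat = {"root": 0.4, "a": 0.6, "b": 0.2, "a1": 0.9, "a2": 0.3}
print(round(tpr_w("root", q_hat, children), 3))  # 0.452
```

In the example, the positive leaf a1 raises the consensus of node a from 0.6 to 0.66, which in turn raises the consensus of the root from 0.4 to 0.452, while the negative predictions of a2 and b are not propagated upwards; with smaller $w$, the influence of the positive descendants grows, in agreement with Theorem 2.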

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction–the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.
[2] L. Peña-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.
[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.
[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.
[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.
[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.
[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.
[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.
[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.
[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.
[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.
[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.
[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.
[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.
[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.
[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.
[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.
[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.
[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.
[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.
[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.
[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.
[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.
[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.
[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.
[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.
[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.
[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Džeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.
[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.
[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.
[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.
[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.
[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.
[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.
[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.
[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.
[38] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.
[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.
[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.
[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.
[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.
[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.
[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.
[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.
[46] K. Tsuda, H. Shin, and B. Schölkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.
[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.
[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.
[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.
[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.
[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.
[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.
[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.
[54] C. Vens, J. Struyf, L. Schietgat, S. Džeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.
[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.
[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
[58] A. Conesa, S. Gotz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.
[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.
[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.
[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.
[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.
[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.
[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.
[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.
[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.
[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.
[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.
[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.
[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.
[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.
[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.
[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.
[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.
[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.
[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.
[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.
[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.
[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.
[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.
[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.
[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Schölkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.
[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.
[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.
[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.
[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.
[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.
[88] G. Bakir, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.
[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.
[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.
[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.
[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.
[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.
[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.
[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.
[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.
[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.
[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.
[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.
[100] M. Dettling and P. Bühlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.
[101] V. Robles, P. Larrañaga, J. M. Peña et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.
[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.
[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.
[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.
[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.
[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.
[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.
[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.
[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.
[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.
[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.
[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hüllermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.
[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.
[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.
[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.
[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.
[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.
[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.
[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.
[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.
[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.
[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.
[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.
[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using top-down label combination and artificial neural networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.
[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.
[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.
[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.
[128] B. Paes, A. Plastino, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.
[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.
[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.
[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.
[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.
[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.
[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.
[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems: Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.
[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Schölkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.
[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.
[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.
[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.
[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An $O(n^2)$ algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.
[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.
[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.
[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.
[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.
[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.
[148] H. Blockeel, M. Bruynooghe, S. Džeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.
[149] H. Blockeel, L. Schietgat, S. Džeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.
[150] D. Stojanova, M. Ceci, D. Malerba, and S. Džeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.
[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.
[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[153] D. Kocev, C. Vens, J. Struyf, and S. Džeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.
[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics – From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.
[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.
[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.
[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.
[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.
[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.
[160] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.
[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.
[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.
[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.
[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.
[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.
[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.
[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.
[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.
[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.
[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.
[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.
[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.
[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7-9, pp. 1533–1537, 2010.
[174] R. D. Finn, J. Tate, J. Mistry et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.
[175] A. P. Gasch, P. T. Spellman, C. M. Kao et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.
[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.
[177] C. Von Mering, R. Krause, B. Snel et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.
[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.
[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.
[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.
[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.
[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.
[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.
[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.
[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.
[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.
[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.
[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.
[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.
[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.
[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.
[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.
[194] C. Widmer and G. Rätsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&CP, vol. 27, pp. 207–216, 2012.
[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.
[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013 – ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.
[197] H. W. Mewes, K. Albermann, M. Bahr et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.
[198] H. W. Mewes, D. Frishman, C. Gruber et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.
[199] K. R. Christie, S. Weng, R. Balakrishnan et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.
[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.
[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.
[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


Page 24: ReviewArticle Hierarchical Ensemble Methods for Protein ... · ReviewArticle Hierarchical Ensemble Methods for Protein Function Prediction GiorgioValentini AnacletoLab-DipartimentodiInformatica,Universit

24 ISRN Bioinformatics

Figure 16 GO BP DAG for the yeast model organism (realized through the HCGene software [178]) involving more than 1000 terms andmore than 2000 edges

work More in general the synergy between hierarchi-cal ensemble methods data integration algorithms cost-sensitive techniques and other related issues is the key toimprove AFP methods and to drive experiments aimed atdiscovering previously unannotated or partially annotatedprotein functions [123]

Indeed despite their successful application to proteinfunction prediction in different model organisms as outlinedin Section 9 there is large room for future research in thischallenging area of computational biology

In particular the development of multitask learningmethods to jointly learn related GO terms in a hierarchicalcontext and the design of multispecies hierarchical algo-rithms able to scale with millions of proteins representa compelling challenge for the computational biology andbioinformatics community

Appendices

A Gene Ontology and FunCat

The two main taxonomies of gene functional classes are rep-resented by the Gene Ontology (GO) [6] and the functional

catalogue (FunCat) [5] In the former the functional classesare structured according to a directed acyclic graph that isa DAG (Figure 16) while in the latter the functional classesare structured through a forest of trees (Figure 17) The GOis composed of thousands of functional classes and it isset out in three separated ontologies ldquobiological processesrdquoldquomolecular functionrdquo and ldquocellular componentrdquo Indeed agene can participate to specific biological processes (eg cellcycle metabolism and nucleotide biosynthesis) and at thesame time can perform specific molecular functions (egcatalytic or binding activities that occur at the molecularlevel) in specific cellular components (eg mitochondrion orrough endoplasmic reticulum)

A1 GO The Gene Ontology The Gene Ontology (GO)project began as collaboration between threemodel organismdatabases FlyBase (Drosophila) the Saccharomyces genomedatabase (SGD) and the mouse genome database (MGD)in 1998 Now it includes several of the worldrsquos major repos-itories for plant animal and microbial genomes The GOproject has developed three structured controlled vocabu-laries (ontologies) that describe gene products in terms oftheir associated biological processes cellular components

ISRN Bioinformatics 25

Figure 17 FunCat tree for the yeast model organism (realized through the HCGene software) [178] A ldquodummyrdquo root node has been addedto obtain a single tree from the tree forest

and molecular functions in a species-independent manner(Figure 16) Biological process (BP) represents series of eventsaccomplished by one or more ordered assemblies of molec-ular functions that exploit a specific biological function forinstance lipid metabolic process or tricarboxylic acid cycleMolecular function (MF) describes activities that occur atthe molecular level such as catalytic or binding activities Anexample of MF is glycine dehydrogenase activity or glucosetransporter activity Cellular component (CC) represents justparts or components of a cell such as organelles or physicalplaces or compartments in which a specific gene productis located An example is the endoplasmic reticulum or theribosome

The ontologies of the GO are structured as a directedacyclic graph (DAG) 119866 = ⟨119881 119864⟩ where 119881 = 119905 | termsof the GO and 119864 = (119905 119906) | 119905 119906 isin 119881 Relations between GOterms are also categorized in the following threemain groups

(i) is-a (subtype relations) if we say the termnode 119860 isa 119861 we mean that node 119860 is a subtype of node 119861For example mitotic cell cycle is a cell cycle or lyaseactivity is a catalytic activity

(ii) part-of (part-whole relations) 119860 is part of 119861 whichmeans that whenever 119860 exists it is part of 119861 and thepresence of 119860 implies the presence of 119861 For instancemitochondrion is part of cytoplasm

(iii) regulates (control relations) if we say that119860 regulates119861wemean that119860 directly affects the manifestation of119861 that is the former regulates the latter

While is-a and part-of are transitive ldquoregulatesrdquo is notMoreover in some cases regulatory proteins are not expectedto have the same properties as the proteins they regulateand hence predicting regulation may require other data andassumptions than predicting function similarity For thesereasons usually in AFP regulates relations (that howeverare a minority of the existing relations) are usually notused

Each annotation is labeled with an evidence code thatindicates how the annotation to a particular term is sup-ported They are subdivided in several categories rangingfrom experimental evidence codes used when experimentalassays have been applied for the annotation for exampleinferred from physical interaction (IPI) or inferred frommutant phenotype (IMP) or inferred from genetic interaction(IGI) to author statement codes such as traceable authorstatement (TAS) that indicate that the annotation was madeon the basis of a statement made by the author(s) in the citedreference to computational analysis evidence codes basedon an in silico analyses manually reviewed (eg inferredfrom sequence or structural similarity (ISS)) For the fullset of available evidence codes please see the GO website(httpwwwgeneontologyorg)

26 ISRN Bioinformatics

A GO graph for the yeast model organism is representedin Figure 16 It is worth noting that despite the complexityof the represented graph Figure 16 does not show all theavailable terms and the relationships involved in the GO BPontology with the yeast model organism

A2 FunCat The Functional Catalogue The FunCat taxon-omy started with Saccharomyces cerevisiae genome project atMIPS (httpmipsgsfde) at the beginning FunCat con-tained only those categories required to describe yeast biol-ogy [197 198] but successively its content has been extendedto plants to annotate genes from the Arabidopsis thalianagenome project and furthermore to cover prokaryotic organ-isms and finally animals too [5]

The FunCat represents a relatively simple and concise setof gene functional classes it consists of 28 main functionalcategories (or branches) that cover general fields like cellulartransport metabolism and cellular communicationsignaltransductionThesemain functional classes are divided into aset of subclasses with up to six levels of increasing specificityaccording to a tree-like structure that accounts for differentfunctional characteristics of genes and gene products Genesmay belong at the same time to multiple functional classessince several classes are subclasses of more general onesand because a gene may participate in different biologicalprocesses and may perform different biological functions

Taking into account the broad and highly diverse spec-trum of known protein functions the FunCat annota-tion scheme covers general features like cellular transportmetabolism and protein activity regulation Each of its main28 functional branches is organized as a hierarchical tree-like structure thus leading to a tree forest with hundreds offunctional categories

Differently from the GO the FunCat is more compactand does not intend to classify protein functions down tothe most specific level From a general standpoint it can becompared to parts of the molecular function and biologicalprocess terms of the GO system

One of the main advantages of FunCat is its intuitivecategory structure For instance the annotation of yeastuses only 18 of the main categories and less than 300 dis-tinct categories (Figure 17) while the Saccharomyces genomedatabase (SGD) [199] uses more than 1500 GO terms in itsyeast annotation Indeed FunCat focuses on the functionalprocess and in part on themolecular function while GO aimsat representing a fine granular description of proteins thatprovides annotations with a wealth of detailed informationHowever to achieve this goal the detailed description offeredby GO leads to a large number of terms (eg the ontologyfor biological processes alone contains more than 10000terms) and such a huge amount of terms is very difficultto be handled for annotators Moreover we may have avery large number of possible assignments that may lead toerroneous or inconsistent annotations FunCat is simplerits tree structure compared with the DAG structure of theGO leads to both simple procedures for annotation and lessdifficult computational-based classification tasks In otherwords it represents a well-balanced compromise between

extensive depth breadth and resolution but without beingtoo granular and specific

It is worth noting that both FunCat and GO ontologiesundergo modifications between different releases and at thesame time the annotations are also subjected to changes sincethey represent the results of the knowledge of the scientificcommunity at a given time As a consequence predictionsresulting for example in false positives for a given releaseof the GO may become true positive in future releasesand more in general we should keep in mind that theavailable annotations are always partial and incomplete anddepend on the knowledge available for the species understudy Nevertheless even if some works pointed out theinconsistency of current GO taxonomies through the analysisof violations of terms univocality [200] GO and FunCat areconsidered the ground truth to evaluate AFP methods sincethey represent the main effort of the scientific community toorganize commonly accepted taxonomy of protein functions[191]

B AFP Performance Assessment ina Hierarchical Context

In the context of ontology-wide protein function predictionproblems where negative examples are usually a lot morethan positive ones accuracy is not a reliablemeasure to assessthe classification performance For this reason the classicalF-score is used instead to take into account the unbalanceof functional classes If TP represents the positive examplescorrectly predicted as positive FN the positive examplesincorrectly predicted as negative and FP the negativesincorrectly predicted as positives then the precision 119875 andthe recall 119877 are

119875 =TP

TP + FP119877 =

TPTP + FN

(B1)

The F-score 119865 is the harmonic mean between precision andrecall

119865 =2 sdot 119875 sdot 119877

119875 + 119877 (B2)

If we need to evaluate the correct ranking of annotatedproteins with respect to a specific functional class a valuablemeasure is represented by the area under the receivingoperating characteristic curve (AUC) A random rankingcorresponds toAUC ≃ 05 while values close to 1 correspondto near optimal ranking that is AUC = 1 if all the annotatedgenes are ranked before the unannotated ones

In order to better capture the hierarchical and sparsenature of the protein function prediction problem we alsoneed specific measures that estimate how far a predictedstructured annotation is from the correct one Indeed func-tional classes are structured according to a direct acyclicgraph (Gene Ontology) or to a tree (FunCat) and we needmeasures to accommodate not just ldquoexact matchesrdquo but alsoldquonear missesrdquo of different sorts

For instance correctly predicting a parent or ancestorannotation while failing to predict themost specific available

ISRN Bioinformatics 27

annotation should be ldquopartially correctrdquo in the sense thatwe can gain information about the more general functionalcharacteristics of a gene missing only its most specificfunctions

More precisely given a general taxonomy 119866 representingthe graph of the functional classes for a given genegeneproduct 119909 consider the graph 119875(119909) sub 119866 of the predictedclasses and the graph 119862(119909) of the correct classes associatedwith 119909 and let 119897(119875) be the set of the leaves (nodes withoutchildren) of the graph 119875 Given a leaf 119901 isin 119875(119909) let uarr 119901 bethe set of ancestors of the node 119901 that belong to 119875(119909) andgiven a leaf 119888 isin 119862(119909) let uarr 119888 be the set of ancestors of thenode 119888 that belong to 119862(119909) the hierarchical precision (HP)hierarchical recall (HR) and hierarchical 119865-score (HF) aredefined as follows [201]

HP =1

|119897 (119875 (119909))|sum

119901isin119897(119875(119909))

max119888isin119897(119862(119909))

1003816100381610038161003816uarr 119888cap uarr 1199011003816100381610038161003816

1003816100381610038161003816uarr 1199011003816100381610038161003816

HR =1

|119897 (119862 (119909))|sum

119888isin119897(119862(119909))

max119901isin119897(119875(119909))

1003816100381610038161003816uarr 119888cap uarr 1199011003816100381610038161003816

|uarr 119888|

HF =2 sdotHP sdotHRHP +HR

(B3)

In the case of the FunCat taxonomy since it is structured as atree we can simplify HP HR and HF as follows

HP =1

|119897 (119875 (119909))|sum

119901isin119897(119875(119909))

1003816100381610038161003816119862 (119909) cap uarr 1199011003816100381610038161003816

1003816100381610038161003816uarr 1199011003816100381610038161003816

HR =1

|119897 (119862 (119909))|sum

119888isin119897(119862(119909))

|uarr 119888 cap 119875 (119909)|

|uarr 119888|

HF =2 sdotHP sdotHRHP +HR

(B4)

An overall high hierarchical precision is indicating thatthe predictor is able to detect the most general functionsof genesgene products On the other hand a high averagehierarchical recall indicates that the predictors are able todetect the most specific functions of the genes The hierar-chical F-measure expresses the correctness of the structuredprediction of the functional classes taking into account alsopartially correct paths in the overall hierarchical taxonomythus providing in a synthetic way the effectiveness of thestructured hierarchical prediction

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

$$\mathrm{HP}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|P(x)|}, \qquad \mathrm{HR}_K = \sum_{x} \frac{|P(x) \cap C(x)|}{|C(x)|}. \quad \text{(B5)}$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true classes ($\mathrm{HR}_K$). The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \quad \text{(B6)}$$
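In code, the Kiritchenko-style measures reduce to plain set overlaps, as the following sketch illustrates; it follows (B5)-(B6) as printed (the sums are not normalized by the number of genes), and the names preds and truth are illustrative.

```python
# Kiritchenko-style hierarchical precision, recall, and F-measure (B5)-(B6).
def hierarchical_prf_k(preds, truth):
    hp = sum(len(preds[x] & truth[x]) / len(preds[x]) for x in preds)
    hr = sum(len(preds[x] & truth[x]) / len(truth[x]) for x in preds)
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf

preds = {"g1": {"root", "a"}, "g2": {"root", "c"}}
truth = {"g1": {"root", "a", "b"}, "g2": {"root", "c"}}
print(hierarchical_prf_k(preds, truth))  # approx (2.0, 1.667, 1.818)
```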

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $\bar{q}_k$. More precisely, $\bar{q}_k$ denotes the probability computed by the ensemble, and $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $\bar{q}^{\,j}_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \ge 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level $k$ of the hierarchy [31]:

$$\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1 - w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g). \quad \text{(C1)}$$
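The recursion (C1) lends itself to a direct bottom-up implementation. The sketch below is mine, not the reference code of [31]: the child-map encoding, the 0.5 threshold used to decide which children are "positive", and the fall-back to the local prediction when no child is positive are all assumptions.

```python
# Bottom-up TPR-W combination following (C1): a node's ensemble score mixes its
# local prediction q_hat with the average ensemble score of its positive children.

def tpr_w(node, q_hat, children, w, thr=0.5):
    kids = [tpr_w(c, q_hat, children, w, thr) for c in children.get(node, ())]
    positive = [q for q in kids if q > thr]  # phi_k(g): the positive children
    if not positive:
        return q_hat[node]  # assumption: no positive children -> local score only
    return w * q_hat[node] + (1 - w) * sum(positive) / len(positive)

children = {"root": ["a"], "a": ["b", "c"]}
q_hat = {"root": 0.6, "a": 0.55, "b": 0.9, "c": 0.2}
print(tpr_w("root", q_hat, children, w=0.7))
# 0.7*0.6 + 0.3*(0.7*0.55 + 0.3*0.9) = 0.6165: the positive leaf 'b' raises
# the scores of its ancestors, while the negative child 'c' is ignored.
```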

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children nodes of a generic node at level $k$:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1}, \quad \text{(C2)}$$

and we can introduce similar notations for $a_{k+2}$ (the average of the probabilities of the grandchildren) and, more generally, for $a_{k+j}$ (descendants at level $j$ below a generic node). By extending these definitions across levels, we obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \le w \le 1$, balancing the weight between parent and children predictors, and having a variable number (larger than or equal to 1) of positive descendants for each of the $m$ lower levels below, the following equality holds for each $m \ge 1$:

$$\bar{q}_k = w\hat{q}_k + \sum_{j=1}^{m-1} w(1 - w)^j a_{k+j} + (1 - w)^m a_{k+m}. \quad \text{(C3)}$$

For the full proof see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter. To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$ (Theorem 1), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of nodes above the leaves.

Figure 18: (a) Plot of $f(w) = (1 - w)^m$ while varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1 - w)^j$ while varying $j$ from 1 to 10. The integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.
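A few lines of Python make the decay tangible: with $w = 0.5$, a positive leaf ten levels down weighs less than 0.1% of the ensemble score (the values below are illustrative).

```python
# Decay terms of Theorem 2, the functions plotted in Figure 18:
# (1 - w)**m weights the deepest positive descendants, w*(1 - w)**j the internal ones.
w = 0.5
for depth in (1, 2, 3, 5, 10):
    print(f"depth {depth}: f(w) = {(1 - w) ** depth:.5f}, "
          f"g(w) = {w * (1 - w) ** depth:.5f}")
```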

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction—the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.
[2] L. Peña-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.
[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.
[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.
[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.
[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.
[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.
[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.
[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.
[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.
[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.
[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.
[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.
[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.
[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.
[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.
[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.
[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.
[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.
[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.
[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.
[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.
[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.
[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.
[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.
[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.
[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Džeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.
[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.
[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.
[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.
[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.
[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.
[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.
[36] J. J. Burred and A. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.
[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.
[38] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.
[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.
[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.
[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.
[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.
[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.
[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.
[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.
[46] K. Tsuda, H. Shin, and B. Schölkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.
[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.
[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.
[50] K. Astikainen, L. Holm, E. Pitkänen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.
[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.
[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.
[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.
[54] C. Vens, J. Struyf, L. Schietgat, S. Džeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.
[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.
[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
[57] S. F. Altschul, T. L. Madden, A. A. Schäffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
[58] A. Conesa, S. Götz, J. M. García-Gómez, J. Terol, M. Talón, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.
[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.
[60] A. Prlić, T. A. Down, E. Kulesha, R. D. Finn, A. Kähäri, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.
[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.
[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.
[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.
[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, R. Du, R. S. Payattakool, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.
[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.
[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.
[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.
[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.
[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.
[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.
[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.
[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.
[75] Y. A. I. Kourmpetis, A. D. J. van Dijk, M. C. A. M. Bink, R. C. H. J. van Ham, and C. J. F. ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.
[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.
[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.
[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.
[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.
[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.
[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Schölkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.
[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.
[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.
[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.
[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.
[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.
[88] G. Bakir, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.
[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.
[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.
[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.
[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.
[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.
[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.
[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.
[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.
[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.
[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.
[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.
[100] M. Dettling and P. Bühlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.
[101] V. Robles, P. Larrañaga, J. M. Peña et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.
[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.
[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.
[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.
[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.
[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.
[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.
[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.
[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.
[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.
[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.
[112] K. Dembczyński, W. Waegeman, W. Cheng, and E. Hüllermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.
[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.
[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.
[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.
[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.
[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.
[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.
[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.
[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.
[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.
[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.
[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using Top-Down label combination and Artificial Neural Networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.
[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.
[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.
[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.
[128] B. Paes, A. Plastino, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.
[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.
[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.
[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.
[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.
[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.
[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.
[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems, Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.
[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Schölkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.
[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.
[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.
[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.
[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n²) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.
[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.
[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.
[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.
[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.
[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.
[148] H. Blockeel, M. Bruynooghe, S. Džeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.
[149] H. Blockeel, L. Schietgat, S. Džeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Džeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.
[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.
[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[153] D. Kocev, C. Vens, J. Struyf, and S. Džeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.
[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics—From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.
[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.
[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.
[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.
[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.
[159] M. Mesiti, E. Jiménez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.
[160] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.
[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.
[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.
[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.
[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.
[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.
[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.
[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.
[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.
[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.
[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.
[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.
[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.
[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7–9, pp. 1533–1537, 2010.
[174] R. D. Finn, J. Tate, J. Mistry et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.
[175] A. P. Gasch, P. T. Spellman, C. M. Kao et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.
[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.
[177] C. von Mering, R. Krause, B. Snel et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.
[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.
[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.
[180] N. Škunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.
[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.

[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.
[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.
[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.
[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.
[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.
[187] M. Galar, A. Fernández, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.
[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.
[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.
[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.
[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.
[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.
[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.
[194] C. Widmer and G. Rätsch, "Multitask learning in computational biology," Journal of Machine Learning Research, W&P, vol. 27, pp. 207–216, 2012.
[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.
[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013—ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.
[197] H. W. Mewes, K. Albermann, M. Bähr et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.
[198] H. W. Mewes, D. Frishman, C. Gruber et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.
[199] K. R. Christie, S. Weng, R. Balakrishnan et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.
[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.
[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.
[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


Page 25: ReviewArticle Hierarchical Ensemble Methods for Protein ... · ReviewArticle Hierarchical Ensemble Methods for Protein Function Prediction GiorgioValentini AnacletoLab-DipartimentodiInformatica,Universit

ISRN Bioinformatics 25

Figure 17 FunCat tree for the yeast model organism (realized through the HCGene software) [178] A ldquodummyrdquo root node has been addedto obtain a single tree from the tree forest

and molecular functions in a species-independent manner(Figure 16) Biological process (BP) represents series of eventsaccomplished by one or more ordered assemblies of molec-ular functions that exploit a specific biological function forinstance lipid metabolic process or tricarboxylic acid cycleMolecular function (MF) describes activities that occur atthe molecular level such as catalytic or binding activities Anexample of MF is glycine dehydrogenase activity or glucosetransporter activity Cellular component (CC) represents justparts or components of a cell such as organelles or physicalplaces or compartments in which a specific gene productis located An example is the endoplasmic reticulum or theribosome

The ontologies of the GO are structured as a directedacyclic graph (DAG) 119866 = ⟨119881 119864⟩ where 119881 = 119905 | termsof the GO and 119864 = (119905 119906) | 119905 119906 isin 119881 Relations between GOterms are also categorized in the following threemain groups

(i) is-a (subtype relations) if we say the termnode 119860 isa 119861 we mean that node 119860 is a subtype of node 119861For example mitotic cell cycle is a cell cycle or lyaseactivity is a catalytic activity

(ii) part-of (part-whole relations) 119860 is part of 119861 whichmeans that whenever 119860 exists it is part of 119861 and thepresence of 119860 implies the presence of 119861 For instancemitochondrion is part of cytoplasm

(iii) regulates (control relations) if we say that119860 regulates119861wemean that119860 directly affects the manifestation of119861 that is the former regulates the latter

While is-a and part-of are transitive ldquoregulatesrdquo is notMoreover in some cases regulatory proteins are not expectedto have the same properties as the proteins they regulateand hence predicting regulation may require other data andassumptions than predicting function similarity For thesereasons usually in AFP regulates relations (that howeverare a minority of the existing relations) are usually notused

Each annotation is labeled with an evidence code thatindicates how the annotation to a particular term is sup-ported They are subdivided in several categories rangingfrom experimental evidence codes used when experimentalassays have been applied for the annotation for exampleinferred from physical interaction (IPI) or inferred frommutant phenotype (IMP) or inferred from genetic interaction(IGI) to author statement codes such as traceable authorstatement (TAS) that indicate that the annotation was madeon the basis of a statement made by the author(s) in the citedreference to computational analysis evidence codes basedon an in silico analyses manually reviewed (eg inferredfrom sequence or structural similarity (ISS)) For the fullset of available evidence codes please see the GO website(httpwwwgeneontologyorg)

26 ISRN Bioinformatics

A GO graph for the yeast model organism is representedin Figure 16 It is worth noting that despite the complexityof the represented graph Figure 16 does not show all theavailable terms and the relationships involved in the GO BPontology with the yeast model organism

A2 FunCat The Functional Catalogue The FunCat taxon-omy started with Saccharomyces cerevisiae genome project atMIPS (httpmipsgsfde) at the beginning FunCat con-tained only those categories required to describe yeast biol-ogy [197 198] but successively its content has been extendedto plants to annotate genes from the Arabidopsis thalianagenome project and furthermore to cover prokaryotic organ-isms and finally animals too [5]

The FunCat represents a relatively simple and concise setof gene functional classes it consists of 28 main functionalcategories (or branches) that cover general fields like cellulartransport metabolism and cellular communicationsignaltransductionThesemain functional classes are divided into aset of subclasses with up to six levels of increasing specificityaccording to a tree-like structure that accounts for differentfunctional characteristics of genes and gene products Genesmay belong at the same time to multiple functional classessince several classes are subclasses of more general onesand because a gene may participate in different biologicalprocesses and may perform different biological functions

Taking into account the broad and highly diverse spec-trum of known protein functions the FunCat annota-tion scheme covers general features like cellular transportmetabolism and protein activity regulation Each of its main28 functional branches is organized as a hierarchical tree-like structure thus leading to a tree forest with hundreds offunctional categories

Differently from the GO the FunCat is more compactand does not intend to classify protein functions down tothe most specific level From a general standpoint it can becompared to parts of the molecular function and biologicalprocess terms of the GO system

One of the main advantages of FunCat is its intuitivecategory structure For instance the annotation of yeastuses only 18 of the main categories and less than 300 dis-tinct categories (Figure 17) while the Saccharomyces genomedatabase (SGD) [199] uses more than 1500 GO terms in itsyeast annotation Indeed FunCat focuses on the functionalprocess and in part on themolecular function while GO aimsat representing a fine granular description of proteins thatprovides annotations with a wealth of detailed informationHowever to achieve this goal the detailed description offeredby GO leads to a large number of terms (eg the ontologyfor biological processes alone contains more than 10000terms) and such a huge amount of terms is very difficultto be handled for annotators Moreover we may have avery large number of possible assignments that may lead toerroneous or inconsistent annotations FunCat is simplerits tree structure compared with the DAG structure of theGO leads to both simple procedures for annotation and lessdifficult computational-based classification tasks In otherwords it represents a well-balanced compromise between

extensive depth breadth and resolution but without beingtoo granular and specific

It is worth noting that both FunCat and GO ontologiesundergo modifications between different releases and at thesame time the annotations are also subjected to changes sincethey represent the results of the knowledge of the scientificcommunity at a given time As a consequence predictionsresulting for example in false positives for a given releaseof the GO may become true positive in future releasesand more in general we should keep in mind that theavailable annotations are always partial and incomplete anddepend on the knowledge available for the species understudy Nevertheless even if some works pointed out theinconsistency of current GO taxonomies through the analysisof violations of terms univocality [200] GO and FunCat areconsidered the ground truth to evaluate AFP methods sincethey represent the main effort of the scientific community toorganize commonly accepted taxonomy of protein functions[191]

B AFP Performance Assessment ina Hierarchical Context

In the context of ontology-wide protein function predictionproblems where negative examples are usually a lot morethan positive ones accuracy is not a reliablemeasure to assessthe classification performance For this reason the classicalF-score is used instead to take into account the unbalanceof functional classes If TP represents the positive examplescorrectly predicted as positive FN the positive examplesincorrectly predicted as negative and FP the negativesincorrectly predicted as positives then the precision 119875 andthe recall 119877 are

119875 =TP

TP + FP119877 =

TPTP + FN

(B1)

The F-score 119865 is the harmonic mean between precision andrecall

119865 =2 sdot 119875 sdot 119877

119875 + 119877 (B2)

If we need to evaluate the correct ranking of annotatedproteins with respect to a specific functional class a valuablemeasure is represented by the area under the receivingoperating characteristic curve (AUC) A random rankingcorresponds toAUC ≃ 05 while values close to 1 correspondto near optimal ranking that is AUC = 1 if all the annotatedgenes are ranked before the unannotated ones

In order to better capture the hierarchical and sparsenature of the protein function prediction problem we alsoneed specific measures that estimate how far a predictedstructured annotation is from the correct one Indeed func-tional classes are structured according to a direct acyclicgraph (Gene Ontology) or to a tree (FunCat) and we needmeasures to accommodate not just ldquoexact matchesrdquo but alsoldquonear missesrdquo of different sorts

For instance correctly predicting a parent or ancestorannotation while failing to predict themost specific available

ISRN Bioinformatics 27

annotation should be ldquopartially correctrdquo in the sense thatwe can gain information about the more general functionalcharacteristics of a gene missing only its most specificfunctions

More precisely given a general taxonomy 119866 representingthe graph of the functional classes for a given genegeneproduct 119909 consider the graph 119875(119909) sub 119866 of the predictedclasses and the graph 119862(119909) of the correct classes associatedwith 119909 and let 119897(119875) be the set of the leaves (nodes withoutchildren) of the graph 119875 Given a leaf 119901 isin 119875(119909) let uarr 119901 bethe set of ancestors of the node 119901 that belong to 119875(119909) andgiven a leaf 119888 isin 119862(119909) let uarr 119888 be the set of ancestors of thenode 119888 that belong to 119862(119909) the hierarchical precision (HP)hierarchical recall (HR) and hierarchical 119865-score (HF) aredefined as follows [201]

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \max_{c \in l(C(x))} \frac{|{\uparrow} c \cap {\uparrow} p|}{|{\uparrow} p|},$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \max_{p \in l(P(x))} \frac{|{\uparrow} c \cap {\uparrow} p|}{|{\uparrow} c|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \quad \text{(B3)}$$
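A minimal sketch of (B3) in Python (all names are hypothetical; following the convention of [201], the ancestor sets are taken to include the leaf itself, otherwise a leaf attached to the root would have an empty ancestor set):

```python
def hier_prf(pred_leaves, true_leaves, anc_pred, anc_true):
    """Hierarchical precision/recall/F-score, following (B3).

    pred_leaves / true_leaves: leaves of the predicted / correct subgraphs;
    anc_pred[p] / anc_true[c]: ancestors of a leaf within the predicted /
    correct subgraph (here taken to include the leaf itself).
    """
    hp = sum(
        max(len(anc_true[c] & anc_pred[p]) for c in true_leaves) / len(anc_pred[p])
        for p in pred_leaves
    ) / len(pred_leaves)
    hr = sum(
        max(len(anc_true[c] & anc_pred[p]) for p in pred_leaves) / len(anc_true[c])
        for c in true_leaves
    ) / len(true_leaves)
    hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
    return hp, hr, hf

# Toy DAG: root -> a -> b and root -> c; since each subgraph contains the
# full path from the root, one ancestor table serves both roles here.
anc = {'b': {'root', 'a', 'b'}, 'c': {'root', 'c'}}
print(hier_prf(['b'], ['c'], anc_pred=anc, anc_true=anc))  # (0.333, 0.5, 0.4)
```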

In the case of the FunCat taxonomy, since it is structured as a tree, we can simplify HP, HR, and HF as follows:

$$\mathrm{HP} = \frac{1}{|l(P(x))|} \sum_{p \in l(P(x))} \frac{|C(x) \cap {\uparrow} p|}{|{\uparrow} p|},$$

$$\mathrm{HR} = \frac{1}{|l(C(x))|} \sum_{c \in l(C(x))} \frac{|{\uparrow} c \cap P(x)|}{|{\uparrow} c|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \quad \text{(B4)}$$
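As a concrete toy example (with hypothetical FunCat-like codes, and taking ${\uparrow} p$ to include the leaf itself), suppose the correct annotation is the path $C(x) = \{01,\, 01.01,\, 01.01.03\}$ and the predictor returns only $P(x) = \{01,\, 01.01\}$. The only leaf of $P(x)$ is $01.01$ with ${\uparrow} p = \{01,\, 01.01\}$, so $\mathrm{HP} = 2/2 = 1$; the only leaf of $C(x)$ is $01.01.03$ with ${\uparrow} c = \{01,\, 01.01,\, 01.01.03\}$, of which two terms belong to $P(x)$, so $\mathrm{HR} = 2/3$ and $\mathrm{HF} = 2 \cdot 1 \cdot (2/3)/(1 + 2/3) = 4/5$. The "near miss" thus receives substantial partial credit instead of being counted as a plain error.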

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus summarizing in a synthetic way the effectiveness of the structured hierarchical prediction.

Another variant of hierarchical classification measure is represented by the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as

$$\mathrm{HP}_K = \frac{\sum_x |P(x) \cap C(x)|}{\sum_x |P(x)|}, \qquad \mathrm{HR}_K = \frac{\sum_x |P(x) \cap C(x)|}{\sum_x |C(x)|}. \quad \text{(B5)}$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true classes ($\mathrm{HR}_K$). The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \quad \text{(B6)}$$
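A compact sketch of these measures (Python; all names are hypothetical, the formula follows the micro-averaged form of the original Kiritchenko et al. definition as reconstructed in (B5), and both class sets are assumed to be up-propagated so that each already contains all ancestors of its terms):

```python
def kiritchenko_prf(pred, true):
    """Micro-averaged hierarchical precision/recall/F, as in (B5)-(B6).

    pred[x] / true[x]: sets of predicted / true classes for gene x,
    each assumed to include all ancestor terms of its classes.
    """
    common = sum(len(pred[x] & true[x]) for x in pred)
    hp = common / sum(len(pred[x]) for x in pred)
    hr = common / sum(len(true[x]) for x in true)
    hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
    return hp, hr, hf

# Toy example with two genes
pred = {'g1': {'root', 'a'}, 'g2': {'root', 'b'}}
true = {'g1': {'root', 'a'}, 'g2': {'root', 'c'}}
print(kiritchenko_prf(pred, true))  # (0.75, 0.75, 0.75)
```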

C. Effect of the Propagation of the Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $q_k$; more precisely, $q_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $q_{k+1}^{j}$ as the probability of a child $j$ of a node at level $k$, where the index $j \ge 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $q_k$ computed for a generic node at level $k$ of the hierarchy [31]:

$$q_k(g) = w \cdot \hat{q}_k(g) + \frac{1-w}{|\phi_k(g)|} \sum_{j \in \phi_k(g)} q_{k+1}^{j}(g), \quad \text{(C1)}$$

where, as in (30), $\phi_k(g)$ denotes the set of children of the node for which a positive decision has been made for gene $g$.
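The recursion in (C1) can be sketched directly (Python; all names are hypothetical, and the selection of the positive children $\phi$ is assumed to have been performed beforehand by thresholding the child outputs):

```python
def tpr_w(node, w, qhat, children):
    """Bottom-up TPR-W combination, a sketch of (C1).

    qhat[node]: probability from the base learner local to the node;
    children[node]: list of the *positive* children of the node
    (the set phi in (C1), already selected).
    """
    kids = children.get(node, [])
    if not kids:
        # No positive children: keep the local prediction
        # (one common convention; (C1) covers the case |phi| > 0).
        return qhat[node]
    avg = sum(tpr_w(c, w, qhat, children) for c in kids) / len(kids)
    return w * qhat[node] + (1 - w) * avg

# Toy three-level chain: root -> a -> b, with w = 0.8
qhat = {'root': 0.9, 'a': 0.7, 'b': 0.6}
children = {'root': ['a'], 'a': ['b']}
print(tpr_w('root', 0.8, qhat, children))
# 0.8*0.9 + 0.2*(0.8*0.7 + 0.2*0.6) = 0.856
```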

To simplify the notation, we can introduce the following expression for the average of the probabilities computed by the positive children of a generic node at level $k$:

$$a_{k+1} = \frac{1}{|\phi_k|} \sum_{j \in \phi_k} q_{k+1}^{j}, \quad \text{(C2)}$$

and we can introduce similar notations for $a_{k+2}$ (the average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (the average over the positive descendants $j$ levels below a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$, $0 \le w \le 1$, balancing the weight between parent and children predictors, and having a variable number (larger than or equal to 1) of positive descendants for each of the $m$ lower levels below, the following equality holds for each $m \ge 1$:

$$q_k = w\hat{q}_k + \sum_{j=1}^{m-1} w(1-w)^{j} a_{k+j} + (1-w)^{m} a_{k+m}. \quad \text{(C3)}$$

For the full proof, see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter.


[Figure 18: (a) Plot of $f(w) = (1-w)^m$, varying $m$ from 1 to 10. (b) Plot of $g(w) = w(1-w)^{j}$, varying $j$ from 1 to 10, with the values $w = 0.1$, $w = 0.5$, and $w = 0.8$ highlighted; the integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.]

To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$ (Theorem 1), Figure 18(a) shows the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) shows the function responsible for the influence of the nodes above the leaves.
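The decay stated by Theorem 2 is also easy to check numerically: the coefficient $w(1-w)^{j}$ that multiplies $a_{k+j}$ in (C3) shrinks geometrically with the distance $j$ (a quick sketch, with an arbitrary choice of $w$):

```python
w = 0.5
for j in range(1, 6):
    print(j, w * (1 - w) ** j)  # 0.25, 0.125, 0.0625, 0.03125, 0.015625
```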

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicativi," funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction–the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.
[2] L. Pena-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.
[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.
[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.
[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.
[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.
[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.
[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.
[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.
[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.
[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.
[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.
[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.
[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.
[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.
[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.
[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.
[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.
[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.
[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.
[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.
[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.
[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.
[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.
[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.
[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.
[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.
[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.
[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.
[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.
[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.
[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.
[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.
[36] J. J. Burred and A. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.
[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.
[38] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.
[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.
[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.
[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.
[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.
[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.
[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.
[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.
[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.
[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.

[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.
[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.
[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.
[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.
[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.
[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.
[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.
[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.
[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
[58] A. Conesa, S. Gotz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.
[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.
[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.
[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.
[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.
[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.
[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.
[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "A new method to measure the semantic similarity of GO terms," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.
[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.
[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.
[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.
[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.
[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.
[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.
[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.
[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.
[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.
[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.
[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.
[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.
[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.
[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.
[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.
[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.
[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.
[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.
[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.
[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.
[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.
[88] G. Bakir, T. Hofmann, B. Scholkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.
[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.
[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.
[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.
[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.
[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.
[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.
[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.
[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.
[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.
[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.
[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.
[100] M. Dettling and P. Buhlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.
[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.
[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.
[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.
[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.
[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.
[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.
[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.
[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.
[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.
[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.
[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.
[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.
[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.
[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.
[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.
[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.
[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.
[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.
[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.
[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.
[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.
[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.
[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using top-down label combination and artificial neural networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.
[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.
[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.
[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.
[128] B. Paes, A. Plastion, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.
[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.
[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.
[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.
[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.
[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.
[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.
[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems, Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.
[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.
[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.
[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.
[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.
[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n²) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.
[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.
[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.
[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.
[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.
[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.
[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.
[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.
[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.
[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.
[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics – From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.
[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.
[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.
[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.
[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.
[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.
[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.
[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.
[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.
[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.
[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.
[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.
[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.
[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.
[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.
[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.
[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.
[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.
[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.
[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7-9, pp. 1533–1537, 2010.
[174] R. D. Finn, J. Tate, J. Mistry et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.
[175] A. P. Gasch, P. T. Spellman, C. M. Kao et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.
[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.
[177] C. Von Mering, R. Krause, B. Snel et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.
[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.
[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.
[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.
[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.
[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.
[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.
[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.
[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.
[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.
[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.
[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.
[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.
[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.
[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.
[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research, W&P, vol. 27, pp. 207–216, 2012.
[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.
[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013 – ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.
[197] H. W. Mewes, K. Albermann, M. Bahr et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.
[198] H. W. Mewes, D. Frishman, C. Gruber et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.
[199] K. R. Christie, S. Weng, R. Balakrishnan et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.
[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.
[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.
[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.


Page 26: ReviewArticle Hierarchical Ensemble Methods for Protein ... · ReviewArticle Hierarchical Ensemble Methods for Protein Function Prediction GiorgioValentini AnacletoLab-DipartimentodiInformatica,Universit

26 ISRN Bioinformatics

A GO graph for the yeast model organism is representedin Figure 16 It is worth noting that despite the complexityof the represented graph Figure 16 does not show all theavailable terms and the relationships involved in the GO BPontology with the yeast model organism

A2 FunCat The Functional Catalogue The FunCat taxon-omy started with Saccharomyces cerevisiae genome project atMIPS (httpmipsgsfde) at the beginning FunCat con-tained only those categories required to describe yeast biol-ogy [197 198] but successively its content has been extendedto plants to annotate genes from the Arabidopsis thalianagenome project and furthermore to cover prokaryotic organ-isms and finally animals too [5]

The FunCat represents a relatively simple and concise setof gene functional classes it consists of 28 main functionalcategories (or branches) that cover general fields like cellulartransport metabolism and cellular communicationsignaltransductionThesemain functional classes are divided into aset of subclasses with up to six levels of increasing specificityaccording to a tree-like structure that accounts for differentfunctional characteristics of genes and gene products Genesmay belong at the same time to multiple functional classessince several classes are subclasses of more general onesand because a gene may participate in different biologicalprocesses and may perform different biological functions

Taking into account the broad and highly diverse spec-trum of known protein functions the FunCat annota-tion scheme covers general features like cellular transportmetabolism and protein activity regulation Each of its main28 functional branches is organized as a hierarchical tree-like structure thus leading to a tree forest with hundreds offunctional categories

Differently from the GO the FunCat is more compactand does not intend to classify protein functions down tothe most specific level From a general standpoint it can becompared to parts of the molecular function and biologicalprocess terms of the GO system

One of the main advantages of FunCat is its intuitivecategory structure For instance the annotation of yeastuses only 18 of the main categories and less than 300 dis-tinct categories (Figure 17) while the Saccharomyces genomedatabase (SGD) [199] uses more than 1500 GO terms in itsyeast annotation Indeed FunCat focuses on the functionalprocess and in part on themolecular function while GO aimsat representing a fine granular description of proteins thatprovides annotations with a wealth of detailed informationHowever to achieve this goal the detailed description offeredby GO leads to a large number of terms (eg the ontologyfor biological processes alone contains more than 10000terms) and such a huge amount of terms is very difficultto be handled for annotators Moreover we may have avery large number of possible assignments that may lead toerroneous or inconsistent annotations FunCat is simplerits tree structure compared with the DAG structure of theGO leads to both simple procedures for annotation and lessdifficult computational-based classification tasks In otherwords it represents a well-balanced compromise between

extensive depth breadth and resolution but without beingtoo granular and specific

It is worth noting that both FunCat and GO ontologiesundergo modifications between different releases and at thesame time the annotations are also subjected to changes sincethey represent the results of the knowledge of the scientificcommunity at a given time As a consequence predictionsresulting for example in false positives for a given releaseof the GO may become true positive in future releasesand more in general we should keep in mind that theavailable annotations are always partial and incomplete anddepend on the knowledge available for the species understudy Nevertheless even if some works pointed out theinconsistency of current GO taxonomies through the analysisof violations of terms univocality [200] GO and FunCat areconsidered the ground truth to evaluate AFP methods sincethey represent the main effort of the scientific community toorganize commonly accepted taxonomy of protein functions[191]

B AFP Performance Assessment ina Hierarchical Context

In the context of ontology-wide protein function predictionproblems where negative examples are usually a lot morethan positive ones accuracy is not a reliablemeasure to assessthe classification performance For this reason the classicalF-score is used instead to take into account the unbalanceof functional classes If TP represents the positive examplescorrectly predicted as positive FN the positive examplesincorrectly predicted as negative and FP the negativesincorrectly predicted as positives then the precision 119875 andthe recall 119877 are

119875 =TP

TP + FP119877 =

TPTP + FN

(B1)

The F-score 119865 is the harmonic mean between precision andrecall

119865 =2 sdot 119875 sdot 119877

119875 + 119877 (B2)

If we need to evaluate the correct ranking of annotatedproteins with respect to a specific functional class a valuablemeasure is represented by the area under the receivingoperating characteristic curve (AUC) A random rankingcorresponds toAUC ≃ 05 while values close to 1 correspondto near optimal ranking that is AUC = 1 if all the annotatedgenes are ranked before the unannotated ones

In order to better capture the hierarchical and sparsenature of the protein function prediction problem we alsoneed specific measures that estimate how far a predictedstructured annotation is from the correct one Indeed func-tional classes are structured according to a direct acyclicgraph (Gene Ontology) or to a tree (FunCat) and we needmeasures to accommodate not just ldquoexact matchesrdquo but alsoldquonear missesrdquo of different sorts

For instance correctly predicting a parent or ancestorannotation while failing to predict themost specific available

ISRN Bioinformatics 27

annotation should be ldquopartially correctrdquo in the sense thatwe can gain information about the more general functionalcharacteristics of a gene missing only its most specificfunctions

More precisely given a general taxonomy 119866 representingthe graph of the functional classes for a given genegeneproduct 119909 consider the graph 119875(119909) sub 119866 of the predictedclasses and the graph 119862(119909) of the correct classes associatedwith 119909 and let 119897(119875) be the set of the leaves (nodes withoutchildren) of the graph 119875 Given a leaf 119901 isin 119875(119909) let uarr 119901 bethe set of ancestors of the node 119901 that belong to 119875(119909) andgiven a leaf 119888 isin 119862(119909) let uarr 119888 be the set of ancestors of thenode 119888 that belong to 119862(119909) the hierarchical precision (HP)hierarchical recall (HR) and hierarchical 119865-score (HF) aredefined as follows [201]

HP =1

|119897 (119875 (119909))|sum

119901isin119897(119875(119909))

max119888isin119897(119862(119909))

1003816100381610038161003816uarr 119888cap uarr 1199011003816100381610038161003816

1003816100381610038161003816uarr 1199011003816100381610038161003816

HR =1

|119897 (119862 (119909))|sum

119888isin119897(119862(119909))

max119901isin119897(119875(119909))

1003816100381610038161003816uarr 119888cap uarr 1199011003816100381610038161003816

|uarr 119888|

HF =2 sdotHP sdotHRHP +HR

(B3)

In the case of the FunCat taxonomy since it is structured as atree we can simplify HP HR and HF as follows

$$\mathrm{HP} = \frac{1}{\left|l(P(x))\right|} \sum_{p \in l(P(x))} \frac{\left|C(x) \cap {\uparrow}p\right|}{\left|{\uparrow}p\right|},$$

$$\mathrm{HR} = \frac{1}{\left|l(C(x))\right|} \sum_{c \in l(C(x))} \frac{\left|{\uparrow}c \cap P(x)\right|}{\left|{\uparrow}c\right|},$$

$$\mathrm{HF} = \frac{2 \cdot \mathrm{HP} \cdot \mathrm{HR}}{\mathrm{HP} + \mathrm{HR}}. \tag{B.4}$$

An overall high hierarchical precision indicates that the predictor is able to detect the most general functions of genes/gene products. On the other hand, a high average hierarchical recall indicates that the predictor is able to detect the most specific functions of the genes. The hierarchical F-measure expresses the correctness of the structured prediction of the functional classes, taking into account also partially correct paths in the overall hierarchical taxonomy, thus summarizing in a synthetic way the effectiveness of the structured hierarchical prediction.
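To make the computation concrete, here is a minimal Python sketch of the tree-structured measures (B.4); it is only an assumed illustration: the `parent` map is a hypothetical encoding of a FunCat-like tree, annotation sets are assumed non-empty and closed under ancestors, and the ${\uparrow}$ set of a node is taken here to include the node itself:

```python
def up(node, parent):
    """Ancestors of `node` in the tree, taken here to include the node itself."""
    out = {node}
    while node in parent:
        node = parent[node]
        out.add(node)
    return out

def leaves(nodes, parent):
    """Members of `nodes` that are not the parent of another member."""
    inner = {parent[n] for n in nodes if n in parent}
    return nodes - inner

def hierarchical_f(P, C, parent):
    """HP, HR, and HF of (B.4) for one gene, given predicted and true class sets."""
    lP, lC = leaves(P, parent), leaves(C, parent)
    hp = sum(len(C & up(p, parent)) / len(up(p, parent)) for p in lP) / len(lP)
    hr = sum(len(up(c, parent) & P) / len(up(c, parent)) for c in lC) / len(lC)
    hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
    return hp, hr, hf

# The predictor finds the general terms but misses the deepest annotation:
parent = {"01.01": "01", "01.01.03": "01.01", "01.02": "01"}
P = {"01", "01.01"}
C = {"01", "01.01", "01.01.03"}
print(hierarchical_f(P, C, parent))  # (1.0, 0.666..., 0.8): "partially correct"
```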

Another variant of hierarchical classification measure is the hierarchical F-measure proposed by Kiritchenko et al. [202]. Let $P(x)$ be the set of classes predicted in the overall hierarchy for a given gene/gene product $x$, and let $C(x)$ be the corresponding set of "true" classes. Then the hierarchical precision $\mathrm{HP}_K$ and the hierarchical recall $\mathrm{HR}_K$ according to Kiritchenko are defined as:

$$\mathrm{HP}_K = \sum_{x} \frac{\left|P(x) \cap C(x)\right|}{\left|P(x)\right|}, \qquad \mathrm{HR}_K = \sum_{x} \frac{\left|P(x) \cap C(x)\right|}{\left|C(x)\right|}. \tag{B.5}$$

Note that these definitions do not explicitly consider the paths included in the predicted subgraphs, but simply the ratio between the number of common classes and, respectively, the predicted ($\mathrm{HP}_K$) and the true ($\mathrm{HR}_K$) classes. The hierarchical F-measure $\mathrm{HF}_K$ is the harmonic mean between $\mathrm{HP}_K$ and $\mathrm{HR}_K$:

$$\mathrm{HF}_K = \frac{2 \cdot \mathrm{HP}_K \cdot \mathrm{HR}_K}{\mathrm{HP}_K + \mathrm{HR}_K}. \tag{B.6}$$
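A sketch of this set-based variant, again only as an assumed illustration of (B.5)-(B.6): the dictionaries `preds` and `truth` are hypothetical inputs mapping each gene to its predicted and true class sets (assumed non-empty), and the sums follow (B.5) as stated, without normalization over the number of genes:

```python
def kiritchenko_f(preds: dict, truth: dict):
    """HP_K, HR_K, and HF_K of (B.5)-(B.6), summed over all genes."""
    hp = sum(len(preds[x] & truth[x]) / len(preds[x]) for x in preds)
    hr = sum(len(preds[x] & truth[x]) / len(truth[x]) for x in preds)
    hf = 2 * hp * hr / (hp + hr) if hp + hr > 0 else 0.0
    return hp, hr, hf

preds = {"g1": {"01", "01.01"}, "g2": {"02"}}
truth = {"g1": {"01", "01.01", "01.01.03"}, "g2": {"02", "02.04"}}
print(kiritchenko_f(preds, truth))  # (2.0, 1.1666..., 1.4736...)
```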

C. Effect of the Propagation of Positive Decisions in TPR Ensembles

In TPR ensembles, a generic node at level $k$ is any node whose distance from the root is equal to $k$. The posterior probability computed by the ensemble for a generic node at level $k$ is denoted by $\bar{q}_k$; more precisely, $\bar{q}_k$ denotes the probability computed by the ensemble, while $\hat{q}_k$ denotes the probability computed by the base learner local to a node at level $k$. Moreover, we define $\bar{q}^{\,j}_{k+1}$ as the probability of a child $j$ of a node at level $k$, where the index $j \ge 1$ refers to the different children of a node at level $k$. From (30) we can derive the following expression for the probability $\bar{q}_k$ computed for a generic node at level $k$ of the hierarchy [31]:

$$\bar{q}_k(g) = w \cdot \hat{q}_k(g) + \frac{1-w}{\left|\phi_k(g)\right|} \sum_{j \in \phi_k(g)} \bar{q}^{\,j}_{k+1}(g). \tag{C.1}$$
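The recursion of (C.1) is easy to implement bottom-up. The following Python sketch is a simplified illustration: the data layout and the convention that a node with no positive children simply returns its local prediction are assumptions of this sketch, not details prescribed by [31]:

```python
def tpr_w(node, children, q_hat, w=0.5, threshold=0.5):
    """TPR-W combination of (C.1) for `node`, computed recursively.

    children[n]: list of child nodes of n; q_hat[n]: local base-learner
    probability. phi keeps the children whose combined probability
    exceeds the threshold (the "positive" children of (C.1)).
    """
    kids = [tpr_w(c, children, q_hat, w, threshold) for c in children.get(node, [])]
    phi = [q for q in kids if q > threshold]
    if not phi:  # assumed convention: no positive children, keep the local output
        return q_hat[node]
    return w * q_hat[node] + (1 - w) * sum(phi) / len(phi)

children = {"root": ["a"], "a": ["b"], "b": []}
q_hat = {"root": 0.6, "a": 0.4, "b": 0.9}
# The confident positive leaf "b" pulls its parent above threshold:
print(tpr_w("a", children, q_hat, w=0.7))  # 0.7*0.4 + 0.3*0.9 = 0.55
```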

To simplify the notation, we can introduce the following expression to indicate the average of the probabilities computed by the positive children nodes of a generic node at level $k$:

$$a_{k+1} = \frac{1}{\left|\phi_k\right|} \sum_{j \in \phi_k} \bar{q}^{\,j}_{k+1}, \tag{C.2}$$

and we can introduce similar notation for $a_{k+2}$ (average of the probabilities of the grandchildren) and, more in general, for $a_{k+j}$ (descendants $j$ levels below a generic node). By extending these definitions across levels, we can obtain the following theorem.

Theorem 2 (influence of positive descendant nodes). In a TPR-W ensemble, for a generic node at level $k$, with a given parameter $w$ ($0 \le w \le 1$) balancing the weight between the parent and children predictors, and with a variable number (larger than or equal to 1) of positive descendants for each of the $m$ lower levels below it, the following equality holds for each $m \ge 1$:

$$\bar{q}_k = w\,\hat{q}_k + \sum_{j=1}^{m-1} w\,(1-w)^j\, a_{k+j} + (1-w)^m\, a_{k+m}. \tag{C.3}$$

For the full proof see [31]. Theorem 2 shows that the contribution of the descendant nodes decays exponentially with their depth and depends critically on the choice of the $w$ parameter. To get more insight into the relationship between $w$ and its effect on the influence of positive decisions on a generic node at level $k$ (Theorem 1), Figure 18(a) plots the function that governs the decay of the influence of leaf nodes at different depths $m$, and Figure 18(b) plots the function responsible for the influence of nodes above the leaves.

[Figure 18: (a) plot of $f(w) = (1-w)^m$ while varying $m$ from 1 to 10; (b) plot of $g(w) = w(1-w)^j$ while varying $j$ from 1 to 10. The integers $j$ refer to internal nodes at distance $j$ from the reference node at level $k$.]
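The exponential decay is also easy to check numerically; in this small sketch the coefficients of (C.3) are computed directly for a sample value of $w$:

```python
# Coefficients of (C.3): w for the local prediction, w*(1-w)**j for
# positive descendants at distance j < m, and (1-w)**m for the deepest level m.
w, m = 0.8, 5
coeffs = [w] + [w * (1 - w) ** j for j in range(1, m)] + [(1 - w) ** m]
print([round(c, 5) for c in coeffs])
# [0.8, 0.16, 0.032, 0.0064, 0.00128, 0.00032]
print(round(sum(coeffs), 10))  # 1.0: the combination is convex, and deep
                               # levels contribute almost nothing when w is large
```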

Conflict of Interests

The author has no direct financial relations that might lead to a conflict of interests associated with this paper.

Acknowledgments

The author thanks the reviewers for their comments and suggestions and acknowledges partial support from the PRIN project "Automi e Linguaggi Formali: aspetti matematici e applicative", funded by the Italian Ministry of University.

References

[1] I. Friedberg, "Automated protein function prediction - the genomic challenge," Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.

[2] L. Pena-Castillo, M. Tasan, C. L. Myers et al., "A critical assessment of Mus musculus gene function prediction using integrated genomic evidence," Genome Biology, vol. 9, supplement 1, article S2, 2008.

[3] G. Valentini, "Mosclust: a software library for discovering significant structures in bio-molecular data," Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.

[4] A. Bertoni and G. Valentini, "Discovering multi-level structures in bio-molecular data through the Bernstein inequality," BMC Bioinformatics, vol. 9, no. 2, article S4, 2008.

[5] A. Ruepp, A. Zollner, D. Maier et al., "The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes," Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.

[6] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[7] A. Valencia, "Automatic annotation of protein function," Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.

[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, "Parametric Bayesian priors and better choice of negative examples improve protein function prediction," Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.

[9] A. Clare and R. D. King, "Predicting gene function in Saccharomyces cerevisiae," Bioinformatics, vol. 19, no. 2, pp. II42–II49, 2003.

[10] Y. Bilu and M. Linial, "The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications," Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.

[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and S. Noble, "A statistical framework for genomic data fusion," Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.

[12] W. Tian, L. V. Zhang, M. Tasan et al., "Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function," Genome Biology, vol. 9, no. 1, article S7, 2008.

[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, "Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy," Genome Biology, vol. 9, no. 1, article S5, 2008.

[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, "GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function," Genome Biology, vol. 9, no. 1, article S4, 2008.

[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, "Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana," Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., "A large-scale evaluation of computational protein function prediction," Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.

[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., "Sequence-based feature prediction and annotation of proteins," Genome Biology, vol. 10, no. 2, p. 206, 2009.

[18] R. Sharan, I. Ulitsky, and R. Shamir, "Network-based prediction of protein function," Molecular Systems Biology, vol. 3, p. 88, 2007.

[19] A. Sokolov and A. Ben-Hur, "Hierarchical classification of gene ontology terms using the GOstruct method," Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.

[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, "Hierarchical multi-label prediction of gene function," Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.

[21] H. N. Chua, W.-K. Sung, and L. Wong, "An efficient strategy for extensive integration of diverse biological data for protein function prediction," Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.

[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, "Learning gene functional classifications from multiple data types," Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.

[23] M. Re and G. Valentini, "Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.

[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, "Consistent probabilistic outputs for protein function prediction," Genome Biology, vol. 9, no. 1, article S6, 2008.

[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, "Protein function prediction by massive integration of evolutionary analyses and multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.

[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, "Integration of relational and hierarchical network information for protein function prediction," BMC Bioinformatics, vol. 9, article 350, 2008.

[27] N. Cesa-Bianchi and G. Valentini, "Hierarchical cost-sensitive algorithms for genome-wide gene function prediction," Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.

[28] R. Cerri and A. C. P. L. F. De Carvalho, "New top-down methods using SVMs for hierarchical multilabel classification problems," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.

[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Dzeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles," BMC Bioinformatics, vol. 11, article 2, 2010.

[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, "Predicting gene function in a hierarchical context with an ensemble of classifiers," Genome Biology, vol. 9, no. 1, article S3, 2008.

[31] G. Valentini, "True path rule hierarchical ensembles for genome-wide gene function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.

[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Exploiting label dependency for hierarchical multi-label classification," in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.

[33] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.

[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, "Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies," VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.

[35] M.-L. Zhang and Z.-H. Zhou, "Multilabel neural networks with applications to functional genomics and text categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.

[36] A. Burred and J. Lerch, "A hierarchical approach to automatic musical genre classification," in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.

[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.

[38] K. Trohidis, G. Tsoumahas, G. Kalliris, and I. P. Vlahavas, "Multilabel classification of music into emotions," in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.

[39] A. Binder, M. Kawanabe, and U. Brefeld, "Bayesian aggregation for hierarchical genre classification," in Proceedings of the 9th Asian Conference on Computer Vision, 2009.

[40] Z. Barutcuoglu and C. DeCoro, "Hierarchical shape classification using Bayesian aggregation," in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.

[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, "An empirical study of multi-label learning methods for video annotation," in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.

[42] K. Punera and J. Ghosh, "Enhanced hierarchical classification via isotonic smoothing," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.

[43] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: a comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.

[44] C. N. Silla Jr. and A. A. Freitas, "A survey of hierarchical classification across different application domains," Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.

[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, "A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.

[46] K. Tsuda, H. Shin, and B. Scholkopf, "Fast protein classification with multiple networks," Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.

[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, "Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration," BMC Bioinformatics, vol. 7, article 268, 2006.


[48] U. Karaoz, T. M. Murali, S. Letovsky et al., "Whole-genome annotation by using evidence integration in functional-linkage networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, "A structured-outputs method for prediction of protein function," in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.

[50] K. Astikainen, L. Holm, E. Pitkanen, S. Szedmak, and J. Rousu, "Towards structured output prediction of enzyme function," BMC Proceedings, vol. 2, supplement 4, article S2, 2008.

[51] B. Done, P. Khatri, A. Done, and S. Draghici, "Predicting novel human gene ontology annotations using semantic analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.

[52] G. Valentini and F. Masulli, "Ensembles of learning machines," in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.

[53] B. Shahbaba and R. M. Neal, "Gene function classification using Bayesian models with hierarchy-based priors," BMC Bioinformatics, vol. 7, article 448, 2006.

[54] C. Vens, J. Struyf, L. Schietgat, S. Dzeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.

[55] G. Valentini, "True path rule hierarchical ensembles," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.

[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.

[57] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.

[58] A. Conesa, S. Gotz, J. M. García-Gómez, J. Terol, M. Talon, and M. Robles, "Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research," Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.

[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., "Protein function annotation by homology-based inference," Genome Biology, vol. 10, no. 2, p. 207, 2009.

[60] A. Prlic, T. A. Down, E. Kulesha, R. D. Finn, A. Kahari, and T. J. P. Hubbard, "Integrating sequence and structural biology with DAS," BMC Bioinformatics, vol. 8, article 333, 2007.

[61] M. Falda, S. Toppo, A. Pescarolo et al., "Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms," BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.

[62] T. Hamp, R. Kassner, S. Seemayer et al., "Homology-based inference sets the bar high for protein function prediction," BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.

[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, "A combined algorithm for genome-wide prediction of protein function," Nature, vol. 402, no. 6757, pp. 83–86, 1999.

[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, "Global protein function prediction from protein-protein interaction networks," Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, R. Du, R. S. Payattakil, P. S. Yu, and C. F. Chen, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.

[66] H. Yang, T. Nepusz, and A. Paccanaro, "Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty," Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, "Broadly predicting specific gene functions with expression similarity and taxonomy similarity," Gene, vol. 352, no. 1-2, pp. 75–81, 2005.

[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, "Information theory applied to the sparse gene ontology annotation network to predict novel gene function," Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.

[69] G. Pandey, C. L. Myers, and V. Kumar, "Incorporating functional inter-relationships into protein function prediction algorithms," BMC Bioinformatics, vol. 10, article 142, 2009.

[70] X.-F. Zhang and D.-Q. Dai, "A framework for incorporating functional interrelationships into protein function prediction algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.

[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, "Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps," Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.

[72] A. Bertoni, M. Frasca, and G. Valentini, "COSNet: a cost sensitive neural network for semi-supervised learning in graphs," in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.

[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, "A neural network algorithm for semi-supervised node label learning from unbalanced data," Neural Networks, vol. 43, pp. 84–98, 2013.

[74] M. Deng, T. Chen, and F. Sun, "An integrated probabilistic model for functional prediction of proteins," Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.

[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, "Bayesian Markov random field analysis for protein function prediction based on network data," PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.

[76] S. Oliver, "Guilt-by-association goes global," Nature, vol. 403, no. 6770, pp. 601–603, 2000.

[77] J. McDermott, R. Bumgarner, and R. Samudrala, "Functional annotation from predicted protein interaction networks," Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.

[78] M. Re, M. Mesiti, and G. Valentini, "A fast ranking algorithm for predicting gene functions in biomolecular networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.

[79] M. Re and G. Valentini, "Cancer module genes ranking using kernelized score functions," BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.

[80] M. Re and G. Valentini, "Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories," in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.

[81] M. Re and G. Valentini, "Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, "Label propagation and quadratic criterion," in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.

[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.

[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, "Random spanning trees and the prediction of weighted graphs," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.

[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.

[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.

[87] C. H. Lampert and M. B. Blaschko, "Structured prediction by joint kernel support estimation," Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.

[88] G. Bakir, T. Hoffman, B. Scholkopf, B. Smola, A. J. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.

[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, "Improving protein function prediction using the hierarchical structure of the gene ontology," in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.

[90] H. Blockeel, L. Schietgat, and A. Clare, "Hierarchical multilabel classification trees for gene function prediction," in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.

[91] O. Dekel, J. Keshet, and Y. Singer, "Large margin hierarchical classification," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.

[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, "Hierarchical classification: combining Bayes with SVM," in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.

[93] T. G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.

[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.

[95] M. Re and G. Valentini, "Ensemble methods: a review," in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.

[96] E. Bauer and R. Kohavi, "Empirical comparison of voting classification algorithms: bagging, boosting and variants," Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[97] T. G. Dietterich, "Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization," Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.

[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, "Experimental study for the comparison of classifier combination methods," Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.

[100] M. Dettling and P. Buhlmann, "Boosting for tumor classification with gene expression data," Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.

[101] V. Robles, P. Larranaga, J. M. Pena et al., "Bayesian network multi-classifiers for protein secondary structure prediction," Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.

[102] G. Valentini, M. Muselli, and F. Ruffino, "Cancer recognition with bagged ensembles of support vector machines," Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.

[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, "Robust biomarker identification for cancer diagnosis with ensemble feature selection methods," Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.

[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, "A novel ensemble technique for protein subcellular location prediction," in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.

[105] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.

[106] A. Bertoni and G. Valentini, "Ensembles based on random projections to improve the accuracy of clustering algorithms," in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.

[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: a new explanation for the effectiveness of voting methods," Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[108] E. L. Allwein, R. E. Schapire, and Y. Singer, "Reducing multiclass to binary: a unifying approach for margin classifiers," Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.

[109] E. M. Kleinberg, "On the algorithmic implementation of stochastic discrimination," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.

[110] L. Breiman, "Bias, variance and arcing classifiers," Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.

[111] J. Friedman and P. Hall, "On bagging and nonlinear estimation," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.

[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, "On label dependence in multi-label classification," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.

[113] S. Mostafavi and Q. Morris, "Using the Gene Ontology hierarchy when predicting gene function," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.

[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, "Prediction of human protein function according to Gene Ontology categories," Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.

[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, "Predicting gene ontology biological process from temporal gene expression patterns," Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, "Predicting Gene Ontology functions based on support vector machines and statistical significance estimation," Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.

[117] P. Bogdanov and A. K. Singh, "Molecular function prediction using neighborhood features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.

[118] M. Riley, "Functions of the gene products of Escherichia coli," Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.

[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using top-down label combination and artificial neural networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastion, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems: Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n²) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, "Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.


[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics - From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, no. 12, article 1471, p. S7, 2009.

[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7-9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. Von Mering, R. Krause, B. Snel et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.


[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013 - ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bahr et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 27: ReviewArticle Hierarchical Ensemble Methods for Protein ... · ReviewArticle Hierarchical Ensemble Methods for Protein Function Prediction GiorgioValentini AnacletoLab-DipartimentodiInformatica,Universit

ISRN Bioinformatics 27

annotation should be ldquopartially correctrdquo in the sense thatwe can gain information about the more general functionalcharacteristics of a gene missing only its most specificfunctions

More precisely given a general taxonomy 119866 representingthe graph of the functional classes for a given genegeneproduct 119909 consider the graph 119875(119909) sub 119866 of the predictedclasses and the graph 119862(119909) of the correct classes associatedwith 119909 and let 119897(119875) be the set of the leaves (nodes withoutchildren) of the graph 119875 Given a leaf 119901 isin 119875(119909) let uarr 119901 bethe set of ancestors of the node 119901 that belong to 119875(119909) andgiven a leaf 119888 isin 119862(119909) let uarr 119888 be the set of ancestors of thenode 119888 that belong to 119862(119909) the hierarchical precision (HP)hierarchical recall (HR) and hierarchical 119865-score (HF) aredefined as follows [201]

HP =1

|119897 (119875 (119909))|sum

119901isin119897(119875(119909))

max119888isin119897(119862(119909))

1003816100381610038161003816uarr 119888cap uarr 1199011003816100381610038161003816

1003816100381610038161003816uarr 1199011003816100381610038161003816

HR =1

|119897 (119862 (119909))|sum

119888isin119897(119862(119909))

max119901isin119897(119875(119909))

1003816100381610038161003816uarr 119888cap uarr 1199011003816100381610038161003816

|uarr 119888|

HF =2 sdotHP sdotHRHP +HR

(B3)

In the case of the FunCat taxonomy since it is structured as atree we can simplify HP HR and HF as follows

HP =1

|119897 (119875 (119909))|sum

119901isin119897(119875(119909))

1003816100381610038161003816119862 (119909) cap uarr 1199011003816100381610038161003816

1003816100381610038161003816uarr 1199011003816100381610038161003816

HR =1

|119897 (119862 (119909))|sum

119888isin119897(119862(119909))

|uarr 119888 cap 119875 (119909)|

|uarr 119888|

HF =2 sdotHP sdotHRHP +HR

(B4)

An overall high hierarchical precision is indicating thatthe predictor is able to detect the most general functionsof genesgene products On the other hand a high averagehierarchical recall indicates that the predictors are able todetect the most specific functions of the genes The hierar-chical F-measure expresses the correctness of the structuredprediction of the functional classes taking into account alsopartially correct paths in the overall hierarchical taxonomythus providing in a synthetic way the effectiveness of thestructured hierarchical prediction

Another variant of hierarchical classification measure isrepresented by the hierarchical F-measure proposed by Kir-itchenko et al [202] Let 119875(119909) be the set of classes predictedin the overall hierarchy for a given genegene product 119909 andlet 119862(119909) be the corresponding set of ldquotruerdquo classes Then thehierarchical precision HP119870 and the hierarchical recall HR119870according to Kiritchenko are defined as

HP119870 = sum

119909

|119875 (119909) cap 119862 (119909)|

|119875 (119909)|HR119870 = sum

119909

|119875 (119909) cap 119862 (119909)|

|119862 (119909)|

(B5)

Note that these definitions do not explicitly consider thepaths included in the predicted subgraphs but simply the ratiobetween the number of common classes and respectively thepredicted (HP119870) and the true classes (HR119870)The hierarchicalF-measure HF119870 is the harmonic mean between HP119870 andHR119870

HF119870 =2 sdotHP119870 sdotHR119870HP119870 +HR119870

(B6)

C Effect of the Propagation of the PositiveDecisions in TPR Ensembles

In TPR ensembles a generic node at level 119896 is any nodewhose distance from the root is equal to 119896 The posteriorprobability computed by the ensemble for a generic nodeat level 119896 is denoted by 119902119896 More precisely 119902119896 denotes theprobability computed by the ensemble and 119902119896 denotes theprobability computed by the base learner local to a node atlevel 119896 Moreover we define 119902119895

119896+1as the probability of a child

119895 of a node at level 119896 where the index 119895 ge 1 refers to differentchildren of a node at level 119896 From (30) we can derive thefollowing expression for the probability 119902119896 computed for ageneric node at level 119896 of the hierarchy [31]

119902119896 (119892) = 119908 sdot 119902119896 (119892) +1 minus 119908

1003816100381610038161003816120601119896 (119892)1003816100381610038161003816

sum

119895isin120601119896(119892)

119902119895

119896+1(119892) (C1)

To simplify the notation we can introduce the followingexpression to indicate the average of the probabilities com-puted by the positive children nodes of a generic node at level119896

119886119896+1 =1

10038161003816100381610038161206011198961003816100381610038161003816

sum

119895isin120601119896

119902119895

119896+1 (C2)

and we can introduce similar notations for 119886119896+2 (average ofthe probabilities of the grandchildren) and more in generalfor 119886119896+119895 (descendants at level 119895 of a generic node) Byextending these definitions across levels we can obtain thefollowing theorem

Theorem 2 (influence of positive descendant nodes) In aTPR-W ensemble for a generic node at level 119896 with a givenparameter 119908 0 le 119908 le 1 balancing the weight between parentand children predictors and having a variable number largerthan or equal to 1 of positive descendants for each of the119898 lowerlevels below the following equality holds for each119898 ge 1

119902119896 = 119908119902119896 +

119898minus1

sum

119895=1

119908(1 minus 119908)119895119886119896+119895 + (1 minus 119908)

119898119886119896+119898 (C3)

For the full proof see [31]Theorem 2 shows that the contribution of the descendant

nodes decays exponentially with their depth and dependscritically on the choice of the 119908 parameter To get moreinsights into the relationships between119908 and its effect on theinfluence of positive decisions on a generic node at level 119896

28 ISRN Bioinformatics

100806040200

10

08

06

04

02

00

m = 1

m = 2

m = 3

m = 4

m = 5

m = 10

w

f(w

)

(a)

100806040200

w = 01 w = 05 w = 08

j = 1

j = 2

j = 3

j = 4

j = 5

j = 10

w

g(w

)

030

025

020

015

010

005

000

(b)

Figure 18 (a) Plot of 119891(119908) = (1 minus 119908)119898 while varying119898 from 1 to 10 (b) Plot of 119892(119908) = 119908(1 minus 119908)

119895 while varying 119895 from 1 to 10 The integers119895 refer to internal nodes at distance 119895 from the reference node at level 119896

(Theorem 1) Figure 18(a) shows the function that governs thedecay of the influence of leaf nodes at different depths119898 andFigure 18(b) shows the function responsible of the influenceof nodes above the leaves

Conflict of Interests

The author has no direct financial relations that might lead toa conflict of interests associated with this paper

Acknowledgments

The author thanks the reviewers for their comments andsuggestions and acknowledges partial support from the PRINproject ldquoAutomi e Linguaggi Formali aspetti matematici eapplicativerdquo funded by the Italian Ministry of University

References

[1] I. Friedberg, “Automated protein function prediction—the genomic challenge,” Briefings in Bioinformatics, vol. 7, no. 3, pp. 225–242, 2006.

[2] L. Peña-Castillo, M. Tasan, C. L. Myers et al., “A critical assessment of Mus musculus gene function prediction using integrated genomic evidence,” Genome Biology, vol. 9, supplement 1, article S2, 2008.

[3] G. Valentini, “Mosclust: a software library for discovering significant structures in bio-molecular data,” Bioinformatics, vol. 23, no. 3, pp. 387–389, 2007.

[4] A. Bertoni and G. Valentini, “Discovering multi-level structures in bio-molecular data through the Bernstein inequality,” BMC Bioinformatics, vol. 9, supplement 2, article S4, 2008.

[5] A. Ruepp, A. Zollner, D. Maier et al., “The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes,” Nucleic Acids Research, vol. 32, no. 18, pp. 5539–5545, 2004.

[6] M. Ashburner, C. A. Ball, J. A. Blake et al., “Gene Ontology: tool for the unification of biology,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[7] A. Valencia, “Automatic annotation of protein function,” Current Opinion in Structural Biology, vol. 15, no. 3, pp. 267–274, 2005.

[8] N. Youngs, D. Penfold-Brown, K. Drew, D. Shasha, and R. Bonneau, “Parametric Bayesian priors and better choice of negative examples improve protein function prediction,” Bioinformatics, vol. 29, no. 9, pp. 1190–1198, 2013.

[9] A. Clare and R. D. King, “Predicting gene function in Saccharomyces cerevisiae,” Bioinformatics, vol. 19, supplement 2, pp. II42–II49, 2003.

[10] Y. Bilu and M. Linial, “The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications,” Journal of Computational Biology, vol. 9, no. 2, pp. 193–210, 2002.

[11] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble, “A statistical framework for genomic data fusion,” Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.

[12] W. Tian, L. V. Zhang, M. Tasan et al., “Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function,” Genome Biology, vol. 9, supplement 1, article S7, 2008.

[13] W. K. Kim, C. Krumpelman, and E. M. Marcotte, “Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy,” Genome Biology, vol. 9, supplement 1, article S5, 2008.

[14] S. Mostafavi, D. Ray, D. Warde-Farley, C. Grouios, and Q. Morris, “GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function,” Genome Biology, vol. 9, supplement 1, article S4, 2008.

[15] I. Lee, B. Ambaru, P. Thakkar, E. M. Marcotte, and S. Y. Rhee, “Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana,” Nature Biotechnology, vol. 28, no. 2, pp. 149–156, 2010.

[16] P. Radivojac, W. T. Clark, T. R. Oron et al., “A large-scale evaluation of computational protein function prediction,” Nature Methods, vol. 10, no. 3, pp. 221–227, 2013.

[17] A. S. Juncker, L. J. Jensen, A. Pierleoni et al., “Sequence-based feature prediction and annotation of proteins,” Genome Biology, vol. 10, no. 2, p. 206, 2009.

[18] R. Sharan, I. Ulitsky, and R. Shamir, “Network-based prediction of protein function,” Molecular Systems Biology, vol. 3, p. 88, 2007.

[19] A. Sokolov and A. Ben-Hur, “Hierarchical classification of gene ontology terms using the GOstruct method,” Journal of Bioinformatics and Computational Biology, vol. 8, no. 2, pp. 357–376, 2010.

[20] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, “Hierarchical multi-label prediction of gene function,” Bioinformatics, vol. 22, no. 7, pp. 830–836, 2006.

[21] H. N. Chua, W.-K. Sung, and L. Wong, “An efficient strategy for extensive integration of diverse biological data for protein function prediction,” Bioinformatics, vol. 23, no. 24, pp. 3364–3373, 2007.

[22] P. Pavlidis, J. Weston, J. Cai, and W. S. Noble, “Learning gene functional classifications from multiple data types,” Journal of Computational Biology, vol. 9, no. 2, pp. 401–411, 2002.

[23] M. Re and G. Valentini, “Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction,” Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 98–111, 2010.

[24] G. Obozinski, G. Lanckriet, C. Grant, M. I. Jordan, and W. S. Noble, “Consistent probabilistic outputs for protein function prediction,” Genome Biology, vol. 9, supplement 1, article S6, 2008.

[25] D. Cozzetto, D. Buchan, K. Bryson, and D. Jones, “Protein function prediction by massive integration of evolutionary analyses and multiple data sources,” BMC Bioinformatics, vol. 14, supplement 3, article S1, 2013.

[26] X. Jiang, N. Nariai, M. Steffen, S. Kasif, and E. D. Kolaczyk, “Integration of relational and hierarchical network information for protein function prediction,” BMC Bioinformatics, vol. 9, article 350, 2008.

[27] N. Cesa-Bianchi and G. Valentini, “Hierarchical cost-sensitive algorithms for genome-wide gene function prediction,” Journal of Machine Learning Research, W&C Proceedings, Machine Learning in Systems Biology, vol. 8, pp. 14–29, 2010.

[28] R. Cerri and A. C. P. L. F. De Carvalho, “New top-down methods using SVMs for hierarchical multilabel classification problems,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, IEEE Computer Society, July 2010.

[29] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Džeroski, “Predicting gene function using hierarchical multi-label decision tree ensembles,” BMC Bioinformatics, vol. 11, article 2, 2010.

[30] Y. Guan, C. L. Myers, D. C. Hess, Z. Barutcuoglu, A. A. Caudy, and O. G. Troyanskaya, “Predicting gene function in a hierarchical context with an ensemble of classifiers,” Genome Biology, vol. 9, supplement 1, article S3, 2008.

[31] G. Valentini, “True path rule hierarchical ensembles for genome-wide gene function prediction,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 832–847, 2011.

[32] N. Alaydie, C. K. Reddy, and F. Fotouhi, “Exploiting label dependency for hierarchical multi-label classification,” in Advances in Knowledge Discovery and Data Mining, vol. 7301 of Lecture Notes in Computer Science, pp. 294–305, Springer, 2012.

[33] D. Koller and M. Sahami, “Hierarchically classifying documents using very few words,” in Proceedings of the 14th International Conference on Machine Learning, pp. 170–178, 1997.

[34] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, “Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies,” VLDB Journal, vol. 7, no. 3, pp. 163–178, 1998.

[35] M.-L. Zhang and Z.-H. Zhou, “Multilabel neural networks with applications to functional genomics and text categorization,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.

[36] J. J. Burred and A. Lerch, “A hierarchical approach to automatic musical genre classification,” in Proceedings of the 6th International Conference on Digital Audio Effects, pp. 8–11, 2003.

[37] C. DeCoro, Z. Barutcuoglu, and R. Fiebrink, “Bayesian aggregation for hierarchical genre classification,” in Proceedings of the 8th International Conference on Music Information Retrieval, pp. 77–80, 2007.

[38] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas, “Multilabel classification of music into emotions,” in Proceedings of the 9th International Conference on Music Information Retrieval, pp. 325–330, 2008.

[39] A. Binder, M. Kawanabe, and U. Brefeld, “Efficient classification of images with taxonomies,” in Proceedings of the 9th Asian Conference on Computer Vision, 2009.

[40] Z. Barutcuoglu and C. DeCoro, “Hierarchical shape classification using Bayesian aggregation,” in Proceedings of the IEEE International Conference on Shape Modeling and Applications (SMI '06), p. 44, June 2006.

[41] A. Dimou, G. Tsoumakas, V. Mezaris, I. Kompatsiaris, and I. Vlahavas, “An empirical study of multi-label learning methods for video annotation,” in Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI '09), pp. 19–24, Chania, Greece, June 2009.

[42] K. Punera and J. Ghosh, “Enhanced hierarchical classification via isotonic smoothing,” in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 151–160, ACM, Beijing, China, April 2008.

[43] M. Ceci and D. Malerba, “Classifying web documents in a hierarchy of categories: a comprehensive study,” Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 37–78, 2007.

[44] C. N. Silla Jr. and A. A. Freitas, “A survey of hierarchical classification across different application domains,” Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.

[45] O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, “A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.

[46] K. Tsuda, H. Shin, and B. Schölkopf, “Fast protein classification with multiple networks,” Bioinformatics, vol. 21, supplement 2, pp. ii59–ii65, 2005.

[47] J. Xiong, S. Rayner, K. Luo, Y. Li, and S. Chen, “Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration,” BMC Bioinformatics, vol. 7, article 268, 2006.


[48] U. Karaoz, T. M. Murali, S. Letovsky et al., “Whole-genome annotation by using evidence integration in functional-linkage networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 9, pp. 2888–2893, 2004.

[49] A. Sokolov and A. Ben-Hur, “A structured-outputs method for prediction of protein function,” in Proceedings of the 2nd International Workshop on Machine Learning in Systems Biology (MLSB '08), 2008.

[50] K. Astikainen, L. Holm, E. Pitkänen, S. Szedmak, and J. Rousu, “Towards structured output prediction of enzyme function,” BMC Proceedings, vol. 2, supplement 4, article S2, 2008.

[51] B. Done, P. Khatri, A. Done, and S. Drăghici, “Predicting novel human gene ontology annotations using semantic analysis,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91–99, 2010.

[52] G. Valentini and F. Masulli, “Ensembles of learning machines,” in Neural Nets WIRN-02, vol. 2486 of Lecture Notes in Computer Science, pp. 3–19, Springer, 2002.

[53] B. Shahbaba and R. M. Neal, “Gene function classification using Bayesian models with hierarchy-based priors,” BMC Bioinformatics, vol. 7, article 448, 2006.

[54] C. Vens, J. Struyf, L. Schietgat, S. Džeroski, and H. Blockeel, “Decision trees for hierarchical multi-label classification,” Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.

[55] G. Valentini, “True path rule hierarchical ensembles,” in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 232–241, Springer, 2009.

[56] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.

[57] S. F. Altschul, T. L. Madden, A. A. Schäffer et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.

[58] A. Conesa, S. Götz, J. M. García-Gómez, J. Terol, M. Talón, and M. Robles, “Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research,” Bioinformatics, vol. 21, no. 18, pp. 3674–3676, 2005.

[59] Y. Loewenstein, D. Raimondo, O. C. Redfern et al., “Protein function annotation by homology-based inference,” Genome Biology, vol. 10, no. 2, p. 207, 2009.

[60] A. Prlić, T. A. Down, E. Kulesha, R. D. Finn, A. Kähäri, and T. J. P. Hubbard, “Integrating sequence and structural biology with DAS,” BMC Bioinformatics, vol. 8, article 333, 2007.

[61] M. Falda, S. Toppo, A. Pescarolo et al., “Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms,” BMC Bioinformatics, vol. 13, supplement 4, article S14, 2012.

[62] T. Hamp, R. Kassner, S. Seemayer et al., “Homology-based inference sets the bar high for protein function prediction,” BMC Bioinformatics, vol. 14, supplement 3, article S7, 2013.

[63] E. M. Marcotte, M. Pellegrini, M. J. Thompson, T. O. Yeates, and D. Eisenberg, “A combined algorithm for genome-wide prediction of protein function,” Nature, vol. 402, no. 6757, pp. 83–86, 1999.

[64] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani, “Global protein function prediction from protein-protein interaction networks,” Nature Biotechnology, vol. 21, no. 6, pp. 697–700, 2003.

[65] J. Z. Wang, R. Du, R. S. Payattakool, P. S. Yu, and C. F. Chen, “A new method to measure the semantic similarity of GO terms,” Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.

[66] H. Yang, T. Nepusz, and A. Paccanaro, “Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty,” Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, “Broadly predicting specific gene functions with expression similarity and taxonomy similarity,” Gene, vol. 352, no. 1-2, pp. 75–81, 2005.

[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, “Information theory applied to the sparse gene ontology annotation network to predict novel gene function,” Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.

[69] G. Pandey, C. L. Myers, and V. Kumar, “Incorporating functional inter-relationships into protein function prediction algorithms,” BMC Bioinformatics, vol. 10, article 142, 2009.

[70] X.-F. Zhang and D.-Q. Dai, “A framework for incorporating functional interrelationships into protein function prediction algorithms,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.

[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, “Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps,” Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.

[72] A. Bertoni, M. Frasca, and G. Valentini, “COSNet: a cost sensitive neural network for semi-supervised learning in graphs,” in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.

[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, “A neural network algorithm for semi-supervised node label learning from unbalanced data,” Neural Networks, vol. 43, pp. 84–98, 2013.

[74] M. Deng, T. Chen, and F. Sun, “An integrated probabilistic model for functional prediction of proteins,” Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.

[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, “Bayesian Markov random field analysis for protein function prediction based on network data,” PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.

[76] S. Oliver, “Guilt-by-association goes global,” Nature, vol. 403, no. 6770, pp. 601–603, 2000.

[77] J. McDermott, R. Bumgarner, and R. Samudrala, “Functional annotation from predicted protein interaction networks,” Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.

[78] M. Re, M. Mesiti, and G. Valentini, “A fast ranking algorithm for predicting gene functions in biomolecular networks,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.

[79] M. Re and G. Valentini, “Cancer module genes ranking using kernelized score functions,” BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.

[80] M. Re and G. Valentini, “Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories,” in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.

[81] M. Re and G. Valentini, “Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, “Label propagation and quadratic criterion,” in Semi-Supervised Learning, O. Chapelle, B. Schölkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.

[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.

[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, “Random spanning trees and the prediction of weighted graphs,” in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 175–182, Haifa, Israel, June 2010.

[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, “Large margin methods for structured and interdependent output variables,” Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.

[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, “Kernel-based learning of hierarchical multilabel classification models,” Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.

[87] C. H. Lampert and M. B. Blaschko, “Structured prediction by joint kernel support estimation,” Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.

[88] G. Bakir, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.

[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, “Improving protein function prediction using the hierarchical structure of the Gene Ontology,” in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), November 2005.

[90] H. Blockeel, L. Schietgat, and A. Clare, “Hierarchical multilabel classification trees for gene function prediction,” in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.

[91] O. Dekel, J. Keshet, and Y. Singer, “Large margin hierarchical classification,” in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 209–216, Omnipress, July 2004.

[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, “Hierarchical classification: combining Bayes with SVM,” in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 177–184, ACM Press, June 2006.

[93] T. G. Dietterich, “Ensemble methods in machine learning,” in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.

[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.

[95] M. Re and G. Valentini, “Ensemble methods: a review,” in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.

[96] E. Bauer and R. Kohavi, “Empirical comparison of voting classification algorithms: bagging, boosting, and variants,” Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[97] T. G. Dietterich, “Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization,” Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.

[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, “A comparison of decision tree ensemble creation techniques,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, “Experimental study for the comparison of classifier combination methods,” Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.

[100] M. Dettling and P. Bühlmann, “Boosting for tumor classification with gene expression data,” Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.

[101] V. Robles, P. Larrañaga, J. M. Peña et al., “Bayesian network multi-classifiers for protein secondary structure prediction,” Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.

[102] G. Valentini, M. Muselli, and F. Ruffino, “Cancer recognition with bagged ensembles of support vector machines,” Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.

[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, “Robust biomarker identification for cancer diagnosis with ensemble feature selection methods,” Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.

[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, “A novel ensemble technique for protein subcellular location prediction,” in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.

[105] A. Topchy, A. K. Jain, and W. Punch, “Clustering ensembles: models of consensus and weak partitions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.

[106] A. Bertoni and G. Valentini, “Ensembles based on random projections to improve the accuracy of clustering algorithms,” in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.

[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, “Boosting the margin: a new explanation for the effectiveness of voting methods,” Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[108] E. L. Allwein, R. E. Schapire, and Y. Singer, “Reducing multiclass to binary: a unifying approach for margin classifiers,” Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.

[109] E. M. Kleinberg, “On the algorithmic implementation of stochastic discrimination,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.

[110] L. Breiman, “Bias, variance, and arcing classifiers,” Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.

[111] J. Friedman and P. Hall, “On bagging and nonlinear estimation,” Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.

[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hüllermeier, “On label dependence in multi-label classification,” in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD '10), pp. 5–12, Haifa, Israel, 2010.

[113] S. Mostafavi and Q. Morris, “Using the Gene Ontology hierarchy when predicting gene function,” in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.

[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, “Prediction of human protein function according to Gene Ontology categories,” Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.

[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, “Predicting gene ontology biological process from temporal gene expression patterns,” Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, “Predicting Gene Ontology functions based on support vector machines and statistical significance estimation,” Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.

[117] P. Bogdanov and A. K. Singh, “Molecular function prediction using neighborhood features,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.

[118] M. Riley, “Functions of the gene products of Escherichia coli,” Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.

[119] R. E. Schapire and Y. Singer, “BoosTexter: a boosting-based system for text categorization,” Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, “Hierarchical multi-label boosting for gene function prediction,” in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, “The self-organizing map,” Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, “Hierarchical classification using a competitive neural network,” in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, “Synergy of multi-label hierarchical ensembles, data fusion and cost-sensitive methods for gene functional inference,” Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. De Carvalho, “Hierarchical multilabel classification using top-down label combination and artificial neural networks,” in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, “Hierarchical multilabel protein function prediction using local neural networks,” in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, “Backpropagation: the basic theory,” in Backpropagation: Theory, Architectures, and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, “A hybrid global-local approach for hierarchical classification,” in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastino, and A. Freitas, “Improving local per level hierarchical classification,” Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, “Random k-labelsets for multilabel classification,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, “Large margin hierarchical classification with mutually exclusive class membership,” Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, “Prediction of enzyme classification from protein sequence without the use of sequence similarity,” in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, “The test and select approach to ensemble combination,” in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, “Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes,” Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, “Incremental algorithms for hierarchical classification,” in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, “An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction,” in Multiple Classifier Systems: Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, “Kernels and regularization on graphs,” in Proceedings of the 16th Annual Conference on Learning Theory, B. Schölkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, “Mismatch string kernels for SVM protein classification,” in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, “Graphical models, exponential families, and variational inference,” Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, “The isotonic regression problem and its dual,” Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, “An $O(n^2)$ algorithm for isotonic regression,” in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, “Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction,” in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, “True path rule,” 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, “Composite kernel based SVM for hierarchical multilabel gene function classification,” in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, “Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction,” IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, “The utility of different representations of protein sequence for predicting functional class,” Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Džeroski, J. Ramon, and J. Struyf, “Top-down induction of clustering trees,” in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.


[149] H. Blockeel, L. Schietgat, S. Džeroski, and A. Clare, “Decision trees for hierarchical multilabel classification: a case study in functional genomics,” in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Džeroski, “Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction,” BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, “A hierarchical classification ant colony algorithm for predicting gene ontology terms,” in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Džeroski, “Tree ensembles for predicting structured outputs,” Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, “Integrating information for protein function prediction,” in Bioinformatics—From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, “MS-kNN: protein function prediction by integrating multiple data sources,” BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, “Combining heterogeneous data sources for accurate functional annotation of proteins,” BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, “Context-sensitive data integration and prediction of biological networks,” Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, “Fast integration of heterogeneous data sources for predicting gene function with limited annotation,” Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jiménez-Ruiz, I. Sanz et al., “XML-based approaches for the integration of heterogeneous bio-molecular data,” BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.

[160] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, “Large scale multiple kernel learning,” Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, “More efficiency in multiple kernel learning,” in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, “Kernel-based data fusion and its application to protein function prediction in yeast,” in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, “Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure,” Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, “Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods,” in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, “Protein function prediction by integrating multiple kernels,” in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray et al., “Comparison of discrimination techniques applied to a complex data set of head injured patients,” Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, “On combining classifiers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, “Decision templates for multiple classifier fusion: an experimental comparison,” Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, “Noise tolerance of multiple classifier systems in data integration-based gene function prediction,” Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, “Ensemble based data fusion for gene function prediction,” in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, “Prediction of gene function using ensembles of SVMs and heterogeneous data sources,” in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, “Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines,” Neurocomputing, vol. 73, no. 7–9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry et al., “The Pfam protein families database,” Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao et al., “Genomic expression programs in the response of yeast cells to environmental changes,” Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, “BioGRID: a general repository for interaction datasets,” Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. Von Mering, R. Krause, B. Snel et al., “Comparative assessment of large-scale data sets of protein-protein interactions,” Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, “HCGene: a software tool to support the hierarchical classification of genes,” Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, “Choosing negative examples for the prediction of protein-protein interactions,” BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Škunca, A. Altenhoff, and C. Dessimoz, “Quality of computationally inferred gene ontology annotations,” PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.


[182] G. Batista, R. Prati, and M. Monard, “A study of the behavior of several methods for balancing machine learning training data,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, “Addressing the curse of imbalanced training sets: one-sided selection,” in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, “Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, “Cost-sensitive boosting for classification of imbalanced data,” Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, “Cost-sensitive ensemble method for class-imbalanced datasets,” Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, “A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, “The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction,” Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, “Transductive multilabel ensemble classification for protein function prediction,” in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, “Protein function prediction with incomplete annotations,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, “Gene Ontology annotations and resources,” Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, “On the importance of comprehensible classification models for protein function prediction,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, “A grammatical evolution algorithm for generation of hierarchical multi-label classification rules,” in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Rätsch, “Multitask learning in computational biology,” Journal of Machine Learning Research W&CP, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, “PowerGraph: distributed graph-parallel computation on natural graphs,” in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, “Scalable network-based learning methods for automated function prediction based on the neo4j graph-database,” in Proceedings of the Automated Function Prediction SIG 2013—ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bähr et al., “Overview of the yeast genome,” Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber et al., “MIPS: a database for genomes and protein sequences,” Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan et al., “Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms,” Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, “Ontology quality assurance through analysis of term transformations,” Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, “A categorization approach to automated ontological function annotation,” Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, “Hierarchical text categorization as a tool of associating genes with Gene Ontology codes,” in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 28: ReviewArticle Hierarchical Ensemble Methods for Protein ... · ReviewArticle Hierarchical Ensemble Methods for Protein Function Prediction GiorgioValentini AnacletoLab-DipartimentodiInformatica,Universit

28 ISRN Bioinformatics

100806040200

10

08

06

04

02

00

m = 1

m = 2

m = 3

m = 4

m = 5

m = 10

w

f(w

)

(a)

100806040200

w = 01 w = 05 w = 08

j = 1

j = 2

j = 3

j = 4

j = 5

j = 10

w

g(w

)

030

025

020

015

010

005

000

(b)

Figure 18 (a) Plot of 119891(119908) = (1 minus 119908)119898 while varying119898 from 1 to 10 (b) Plot of 119892(119908) = 119908(1 minus 119908)

119895 while varying 119895 from 1 to 10 The integers119895 refer to internal nodes at distance 119895 from the reference node at level 119896

(Theorem 1) Figure 18(a) shows the function that governs thedecay of the influence of leaf nodes at different depths119898 andFigure 18(b) shows the function responsible of the influenceof nodes above the leaves

Conflict of Interests

The author has no direct financial relations that might lead toa conflict of interests associated with this paper

Acknowledgments

The author thanks the reviewers for their comments andsuggestions and acknowledges partial support from the PRINproject ldquoAutomi e Linguaggi Formali aspetti matematici eapplicativerdquo funded by the Italian Ministry of University

References

[1] I Friedberg ldquoAutomated protein function predictionmdashthegenomic challengerdquo Briefings in Bioinformatics vol 7 no 3 pp225ndash242 2006

[2] L Pena-Castillo M Tasan C L Myers et al ldquoA critical assess-ment of Mus musculus gene function prediction using inte-grated genomic evidencerdquo Genome Biology vol 9 supplement1 article S2 2008

[3] G Valentini ldquoMosclust a software library for discoveringsignificant structures in bio-molecular datardquo Bioinformaticsvol 23 no 3 pp 387ndash389 2007

[4] A Bertoni andG Valentini ldquoDiscoveringmulti-level structuresin bio-molecular data through the Bernstein inequalityrdquo BMCBioinformatics vol 9 no 2 article S4 2008

[5] A Ruepp A Zollner D Maier et al ldquoThe FunCat a functionalannotation scheme for systematic classification of proteins from

whole genomesrdquo Nucleic Acids Research vol 32 no 18 pp5539ndash5545 2004

[6] M Ashburner C A Ball J A Blake et al ldquoGene ontology toolfor the unification of biologyrdquoNature Genetics vol 25 no 1 pp25ndash29 2000

[7] A Valencia ldquoAutomatic annotation of protein functionrdquo Cur-rent Opinion in Structural Biology vol 15 no 3 pp 267ndash2742005

[8] N Youngs D Penfold-Brown K Drew D Shasha and R Bon-neau ldquoParametric bayesian priors and better choice of negativeexamples improve protein function predictionrdquo Bioinformaticsvol 29 no 9 pp 1190ndash1198 2013

[9] A Clare and R D King ldquoPredicting gene function in Saccha-romyces cerevisiaerdquo Bioinformatics vol 19 no 2 pp II42ndashII492003

[10] Y Bilu and M Linial ldquoThe advantage of functional predictionbased on clustering of yeast genes and its correlation withnon-sequence based classificationsrdquo Journal of ComputationalBiology vol 9 no 2 pp 193ndash210 2002

[11] G R G Lanckriet T De Bie N Cristianini M I Jordan andS Noble ldquoA statistical framework for genomic data fusionrdquoBioinformatics vol 20 no 16 pp 2626ndash2635 2004

[12] W Tian L V Zhang M Tasan et al ldquoCombining guilt-by-association and guilt-by-profiling to predict Saccharomycescerevisiae gene functionrdquo Genome Biology vol 9 no 1 articleS7 2008

[13] W K Kim C Krumpelman and E M Marcotte ldquoInfer-ring mouse gene functions from genomic-scale data using acombined functional networkclassification strategyrdquo GenomeBiology vol 9 no 1 article S5 2008

[14] S Mostafavi D Ray D Warde-Farley C Grouios and Q Mor-ris ldquoGeneMANIA a real-time multiple association networkintegration algorithm for predicting gene functionrdquo GenomeBiology vol 9 no 1 article S4 2008

[15] I Lee B Ambaru P Thakkar E M Marcotte and S Y RheeldquoRational association of genes with traits using a genome-scale

ISRN Bioinformatics 29

gene network for Arabidopsis thalianardquo Nature Biotechnologyvol 28 no 2 pp 149ndash156 2010

[16] P Radivojac W T Clark T R Oron et al ldquoA large-scale eval-uation of computational protein function predictionrdquo NatureMethods vol 10 no 3 pp 221ndash227 2013

[17] A S Juncker L J Jensen A Pierleoni et al ldquoSequence-basedfeature prediction and annotation of proteinsrdquoGenome Biologyvol 10 no 2 p 206 2009

[18] R Sharan I Ulitsky and R Shamir ldquoNetwork-based predictionof protein functionrdquo Molecular Systems Biology vol 3 p 882007

[19] A Sokolov and A Ben-Hur ldquoHierarchical classification ofgene ontology terms using the GOstruct methodrdquo Journal ofBioinformatics and Computational Biology vol 8 no 2 pp 357ndash376 2010

[20] Z Barutcuoglu R E Schapire and O G Troyanskaya ldquoHierar-chical multi-label prediction of gene functionrdquo Bioinformaticsvol 22 no 7 pp 830ndash836 2006

[21] H N Chua W-K Sung and L Wong ldquoAn efficient strategyfor extensive integration of diverse biological data for proteinfunction predictionrdquo Bioinformatics vol 23 no 24 pp 3364ndash3373 2007

[22] P Pavlidis J Weston J Cai and W S Noble ldquoLearning genefunctional classifications from multiple data typesrdquo Journal ofComputational Biology vol 9 no 2 pp 401ndash411 2002

[23] M Re and G Valentini ldquoSimple ensemble methods are com-petitive with stateof- the-art data integration methods for genefunction predictionrdquo Journal of Machine Learning ResearchWampC Proceedings Machine Learning in Systems Biology vol 8pp 98ndash111 2010

[24] G Obozinski G Lanckriet C Grant M I Jordan and W SNoble ldquoConsistent probabilistic outputs for protein functionpredictionrdquo Genome Biology vol 9 no 1 article S6 2008

[25] D Cozzetto D Buchan K Bryson and D Jones ldquoProteinfunction prediction by massive integration of evolutionaryanalyses and multiple data sourcesrdquo BMC Bioinformatics vol14 supplement 3 article S1 2013

[26] X Jiang N Nariai M Steffen S Kasif and E D KolaczykldquoIntegration of relational and hierarchical network informationfor protein function predictionrdquo BMC Bioinformatics vol 9article 350 2008

[27] N Cesa-Bianchi and G Valentini ldquoHierarchical cost-sensitivealgorithms for genome-wide gene function predictionrdquo Journalof Machine Learning Research WampC Proceedings MachineLearning in Systems Biology vol 8 pp 14ndash29 2010

[28] R Cerri andAC P L FDeCarvalho ldquoNew top-downmethodsusing SVMs for hierarchical multilabel classification problemsrdquoin Proceedings of the International Joint Conference on NeuralNetworks (IJCNN rsquo10) pp 1ndash8 IEEE Computer Society July2010

[29] L Schietgat C Vens J Struyf H Blockeel D Kocev and SDzeroski ldquoPredicting gene function using hierarchical multi-label decision tree ensemblesrdquo BMC Bioinformatics vol 11article 2 2010

[30] Y Guan C L Myers D C Hess Z Barutcuoglu A ACaudy and O G Troyanskaya ldquoPredicting gene function in ahierarchical context with an ensemble of classifiersrdquo GenomeBiology vol 9 no 1 article S3 2008

[31] G Valentini ldquoTrue path rule hierarchical ensembles forgenome-wide gene function predictionrdquo IEEEACM Transac-tions on Computational Biology and Bioinformatics vol 8 no3 pp 832ndash847 2011

[32] NAlaydie CK Reddy andF Fotouhi ldquoExploiting label depen-dency for hierarchical multi-label classificationrdquo in Advances inKnowledgeDiscovery andDataMining vol 7301 ofLectureNotesin Computer Science pp 294ndash305 Springer 2012

[33] D Koller andM Sahami ldquoHierarchically classifying documentsusing very few wordsrdquo in Proceedings of the 14th InternationalConference on Machine Learning pp 170ndash178 1997

[34] S Chakrabarti B Dom R Agrawal and P Raghavan ldquoScalablefeature selection classification and signature generation fororganizing large text databases into hierarchical topic tax-onomiesrdquo VLDB Journal vol 7 no 3 pp 163ndash178 1998

[35] M-L Zhang and Z-H Zhou ldquoMultilabel neural networks withapplications to functional genomics and text categorizationrdquoIEEE Transactions on Knowledge and Data Engineering vol 18no 10 pp 1338ndash1351 2006

[36] A Burred and J Lerch ldquoA hierarchical approach to automaticmusical genre classificationrdquo in Proceedings of the 6th Interna-tional Conference on Digital Audio Effects pp 8ndash11 2003

[37] C DeCoro Z Barutcuoglu and R Fiebrink ldquoBayesian aggre-gation for hierarchical genre classificationrdquo in Proceedings of the8th International Conference onMusic Information Retrieval pp77ndash80 2007

[38] K Trohidis G Tsoumahas G Kalliris and I P Vlahavas ldquoMul-tilabel classification of music into emotionsrdquo in Proceedings ofthe 9th International Conference onMusic Information Retrievalpp 325ndash330 2008

[39] A Binder M Kawanabe and U Brefeld ldquoBayesian aggregationfor hierarchical genre classificationrdquo in Proceedings of the 9thAsian Conference on Computer Vision 2009

[40] Z Barutcuoglu and C DeCoro ldquoHierarchical shape classifica-tion using Bayesian aggregationrdquo in Proceedings of the IEEEInternational Conference on Shape Modeling and Applications(SMI rsquo06) p 44 June 2006

[41] A Dimou G Tsoumakas V Mezaris I Kompatsiaris and IVlahavas ldquoAn empirical study of multi-label learning methodsfor video annotationrdquo in Proceedings of the 7th InternationalWorkshop on Content-Based Multimedia Indexing (CBMI rsquo09)pp 19ndash24 Chania Greece June 2009

[42] K Punera and J Ghosh ldquoEnhanced hierarchical classificationvia isotonic smoothingrdquo in Proceedings of the 17th InternationalConference onWorldWideWeb (WWW rsquo08) pp 151ndash160 ACMBeijing China April 2008

[43] M Ceci and D Malerba ldquoClassifying web documents in ahierarchy of categories a comprehensive studyrdquo Journal ofIntelligent Information Systems vol 28 no 1 pp 37ndash78 2007

[44] C N Silla Jr and A A Freitas ldquoA survey of hierarchical classi-fication across different application domainsrdquoData Mining andKnowledge Discovery vol 22 no 1-2 pp 31ndash72 2011

[45] O G Troyanskaya K Dolinski A B Owen R B Alt-man and D Botstein ldquoA Bayesian framework for combiningheterogeneous data sources for gene function prediction (inSaccharomyces cerevisiae)rdquo Proceedings of the National Academyof Sciences of the United States of America vol 100 no 14 pp8348ndash8353 2003

[46] K Tsuda H Shin and B Scholkopf ldquoFast protein classificationwith multiple networksrdquo Bioinformatics vol 21 supplement 2pp ii59ndashii65 2005

[47] J Xiong S Rayner K Luo Y Li and S Chen ldquoGenomewide prediction of protein function via a generic knowledgediscovery approach based on evidence integrationrdquo BMCBioin-formatics vol 7 article 268 2006

30 ISRN Bioinformatics

[48] U Karaoz T M Murali S Letovsky et al ldquoWhole-genomeannotation by using evidence integration in functional-linkagenetworksrdquoProceedings of theNational Academy of Sciences of theUnited States of America vol 101 no 9 pp 2888ndash2893 2004

[49] A Sokolov and A Ben-Hur ldquoA structured-outputs methodfor prediction of protein functionrdquo in Proceedings of the 2ndInternationalWorkshop onMachine Learning in Systems Biology(MLSB rsquo08) 2008

[50] K Astikainen L Holm E Pitkanen S Szedmak and J RousuldquoTowards structured output prediction of enzyme functionrdquoBMC Proceedings vol 2 supplement 4 article S2 2008

[51] B Done P Khatri A Done and S Draghici ldquoPredicting novelhuman gene ontology annotations using semantic analysisrdquoIEEEACM Transactions on Computational Biology and Bioin-formatics vol 7 no 1 pp 91ndash99 2010

[52] G Valentini and F Masulli ldquoEnsembles of learning machinesrdquoinNeural NetsWIRN-02 vol 2486 of Lecture Notes in ComputerScience pp 3ndash19 Springer 2002

[53] B Shahbaba and R M Neal ldquoGene function classificationusing Bayesian models with hierarchy-based priorsrdquo BMCBioinformatics vol 7 article 448 2006

[54] C Vens J Struyf L Schietgat S Dzeroski and H Block-eel ldquoDecision trees for hierarchical multi-label classificationrdquoMachine Learning vol 73 no 2 pp 185ndash214 2008

[55] G Valentini ldquoTrue path rule hierarchical ensemblesrdquo in Mul-tiple Classifier Systems Eighth InternationalWorkshop MCS2009 Reykjavik Iceland J Kittler J Benediktsson and F RoliEds vol 5519 of Lecture Notes in Computer Science pp 232ndash241Springer 2009

[56] S F AltschulW GishWMiller EWMyers and D J LipmanldquoBasic local alignment search toolrdquo Journal ofMolecular Biologyvol 215 no 3 pp 403ndash410 1990

[57] S F Altschul T L Madden A A Schaffer et al ldquoGappedblast and psi-blast a new generation of protein database searchprogramsrdquoNucleic Acids Research vol 25 no 17 pp 3389ndash34021997

[58] A Conesa S Gotz J M Garcıa-Gomez J Terol M Talonand M Robles ldquoBlast2GO a universal tool for annotationvisualization and analysis in functional genomics researchrdquoBioinformatics vol 21 no 18 pp 3674ndash3676 2005

[59] Y Loewenstein D Raimondo O C Redfern et al ldquoProteinfunction annotation by homology-based inferencerdquo Genomebiology vol 10 no 2 p 207 2009

[60] A Prlic T A Down E Kulesha R D Finn A Kahari and T JP Hubbard ldquoIntegrating sequence and structural biology withDASrdquo BMC Bioinformatics vol 8 article 333 2007

[61] M Falda S Toppo A Pescarolo et al ldquoArgot2 a large scalefunction prediction tool relying on semantic similarity ofweighted Gene Ontology termsrdquo BMC Bioinformatics vol 13supplement 4 article S14 2012

[62] T Hamp R Kassner S Seemayer et al ldquoHomology-basedinference sets the bar high for protein function predictionrdquoBMC Bioinformatics vol 14 supplement 3 article S7 2013

[63] E M Marcotte M Pellegrini M J Thompson T O Yeatesand D Eisenberg ldquoA combined algorithm for genome-wideprediction of protein functionrdquo Nature vol 402 no 6757 pp83ndash86 1999

[64] A Vazquez A Flammini A Maritan and A VespignanildquoGlobal protein function prediction from protein-protein inter-action networksrdquo Nature Biotechnology vol 21 no 6 pp 697ndash700 2003

[65] J Z Wang R Du R S Payattakil P S Yu and C F ChenldquoImproving GO semantic similarity measures by exploringthe ontology beneath the terms and modelling uncertaintyrdquoBioinformatics vol 23 no 10 pp 1274ndash1281 2007

[66] H. Yang, T. Nepusz, and A. Paccanaro, “Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty,” Bioinformatics, vol. 28, no. 10, pp. 1383–1389, 2012.

[67] H. Yu, L. Gao, K. Tu, and Z. Guo, “Broadly predicting specific gene functions with expression similarity and taxonomy similarity,” Gene, vol. 352, no. 1-2, pp. 75–81, 2005.

[68] Y. Tao, L. Sam, J. Li, C. Friedman, and Y. A. Lussier, “Information theory applied to the sparse gene ontology annotation network to predict novel gene function,” Bioinformatics, vol. 23, no. 13, pp. i529–i538, 2007.

[69] G. Pandey, C. L. Myers, and V. Kumar, “Incorporating functional inter-relationships into protein function prediction algorithms,” BMC Bioinformatics, vol. 10, article 142, 2009.

[70] X.-F. Zhang and D.-Q. Dai, “A framework for incorporating functional interrelationships into protein function prediction algorithms,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740–753, 2012.

[71] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh, “Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps,” Bioinformatics, vol. 21, no. 1, pp. 302–310, 2005.

[72] A. Bertoni, M. Frasca, and G. Valentini, “COSNet: a cost sensitive neural network for semi-supervised learning in graphs,” in European Conference on Machine Learning, ECML PKDD 2011, vol. 6911 of Lecture Notes on Artificial Intelligence, pp. 219–234, Springer, 2011.

[73] M. Frasca, A. Bertoni, M. Re, and G. Valentini, “A neural network algorithm for semi-supervised node label learning from unbalanced data,” Neural Networks, vol. 43, pp. 84–98, 2013.

[74] M. Deng, T. Chen, and F. Sun, “An integrated probabilistic model for functional prediction of proteins,” Journal of Computational Biology, vol. 11, no. 2-3, pp. 463–475, 2004.

[75] Y. A. I. Kourmpetis, A. D. J. Van Dijk, M. C. A. M. Bink, R. C. H. J. Van Ham, and C. J. F. Ter Braak, “Bayesian Markov random field analysis for protein function prediction based on network data,” PLoS ONE, vol. 5, no. 2, Article ID e9293, 2010.

[76] S. Oliver, “Guilt-by-association goes global,” Nature, vol. 403, no. 6770, pp. 601–603, 2000.

[77] J. McDermott, R. Bumgarner, and R. Samudrala, “Functional annotation from predicted protein interaction networks,” Bioinformatics, vol. 21, no. 15, pp. 3217–3226, 2005.

[78] M. Re, M. Mesiti, and G. Valentini, “A fast ranking algorithm for predicting gene functions in biomolecular networks,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1812–1818, 2012.

[79] M. Re and G. Valentini, “Cancer module genes ranking using kernelized score functions,” BMC Bioinformatics, vol. 13, supplement 14, article S3, 2012.

[80] M. Re and G. Valentini, “Large scale ranking and repositioning of drugs with respect to DrugBank therapeutic categories,” in International Symposium on Bioinformatics Research and Applications (ISBRA 2012), vol. 7292 of Lecture Notes in Computer Science, pp. 225–236, Springer, 2012.

[81] M. Re and G. Valentini, “Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 6, pp. 1359–1371, 2014.

[82] Y. Bengio, O. Delalleau, and N. Le Roux, “Label propagation and quadratic criterion,” in Semi-Supervised Learning, O. Chapelle, B. Scholkopf, and A. Zien, Eds., pp. 193–216, MIT Press, 2006.

[83] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, Mass, USA, 1996.

[84] N. Cesa-Bianchi, C. Gentile, F. Vitale, and G. Zappella, “Random spanning trees and the prediction of weighted graphs,” in Proceedings of the 27th International Conference on Machine Learning (ICML ’10), pp. 175–182, Haifa, Israel, June 2010.

[85] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, “Large margin methods for structured and interdependent output variables,” Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.

[86] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, “Kernel-based learning of hierarchical multilabel classification models,” Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.

[87] C. H. Lampert and M. B. Blaschko, “Structured prediction by joint kernel support estimation,” Machine Learning, vol. 77, no. 2-3, pp. 249–269, 2009.

[88] G. Bakir, T. Hofmann, B. Scholkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data, MIT Press, Cambridge, Mass, USA, 2007.

[89] R. Eisner, B. Poulin, D. Szafron, P. Lu, and R. Greiner, “Improving protein function prediction using the hierarchical structure of the Gene Ontology,” in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB ’05), November 2005.

[90] H. Blockeel, L. Schietgat, and A. Clare, “Hierarchical multilabel classification trees for gene function prediction,” in Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski, and E. Ukkonen, Eds., Helsinki University Printing House, Tuusula, Finland, 2006.

[91] O. Dekel, J. Keshet, and Y. Singer, “Large margin hierarchical classification,” in Proceedings of the 21st International Conference on Machine Learning (ICML ’04), pp. 209–216, Omnipress, July 2004.

[92] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni, “Hierarchical classification: combining Bayes with SVM,” in Proceedings of the 23rd International Conference on Machine Learning (ICML ’06), pp. 177–184, ACM Press, June 2006.

[93] T. G. Dietterich, “Ensemble methods in machine learning,” in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 1–15, Springer, 2000.

[94] O. Okun, G. Valentini, and M. Re, Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, Springer, Berlin, Germany, 2011.

[95] M. Re and G. Valentini, “Ensemble methods: a review,” in Advances in Machine Learning and Data Mining for Astronomy, Data Mining and Knowledge Discovery, pp. 563–594, Chapman & Hall, 2012.

[96] E. Bauer and R. Kohavi, “Empirical comparison of voting classification algorithms: bagging, boosting, and variants,” Machine Learning, vol. 36, no. 1, pp. 105–139, 1999.

[97] T. G. Dietterich, “Experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization,” Machine Learning, vol. 40, no. 2, pp. 139–157, 2000.

[98] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, “A comparison of decision tree ensemble creation techniques,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 173–180, 2007.

[99] S. Y. Sohn and H. W. Shin, “Experimental study for the comparison of classifier combination methods,” Pattern Recognition, vol. 40, no. 1, pp. 33–40, 2007.

[100] M. Dettling and P. Buhlmann, “Boosting for tumor classification with gene expression data,” Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.

[101] V. Robles, P. Larranaga, J. M. Pena, et al., “Bayesian network multi-classifiers for protein secondary structure prediction,” Artificial Intelligence in Medicine, vol. 31, no. 2, pp. 117–136, 2004.

[102] G. Valentini, M. Muselli, and F. Ruffino, “Cancer recognition with bagged ensembles of support vector machines,” Neurocomputing, vol. 56, no. 1–4, pp. 461–466, 2004.

[103] T. Abeel, T. Helleputte, Y. Van de Peer, P. Dupont, and Y. Saeys, “Robust biomarker identification for cancer diagnosis with ensemble feature selection methods,” Bioinformatics, vol. 26, no. 3, Article ID btp630, pp. 392–398, 2009.

[104] A. Rozza, G. Lombardi, M. Re, E. Casiraghi, G. Valentini, and P. Campadelli, “A novel ensemble technique for protein subcellular location prediction,” in Ensembles in Machine Learning Applications, vol. 373 of Studies in Computational Intelligence, pp. 151–177, Springer, Berlin, Germany, 2011.

[105] A. Topchy, A. K. Jain, and W. Punch, “Clustering ensembles: models of consensus and weak partitions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1866–1881, 2005.

[106] A. Bertoni and G. Valentini, “Ensembles based on random projections to improve the accuracy of clustering algorithms,” in Neural Nets, WIRN 2005, vol. 3931 of Lecture Notes in Computer Science, pp. 31–37, Springer, 2006.

[107] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, “Boosting the margin: a new explanation for the effectiveness of voting methods,” Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.

[108] E. L. Allwein, R. E. Schapire, and Y. Singer, “Reducing multiclass to binary: a unifying approach for margin classifiers,” Journal of Machine Learning Research, vol. 1, no. 2, pp. 113–141, 2001.

[109] E. M. Kleinberg, “On the algorithmic implementation of stochastic discrimination,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 5, pp. 473–490, 2000.

[110] L. Breiman, “Bias, variance and arcing classifiers,” Tech. Rep. TR 460, Statistics Department, University of California, Berkeley, Calif, USA, 1996.

[111] J. Friedman and P. Hall, “On bagging and nonlinear estimation,” Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2000.

[112] K. Dembczynski, W. Waegeman, W. Cheng, and E. Hullermeier, “On label dependence in multi-label classification,” in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (ICML-MLD ’10), pp. 5–12, Haifa, Israel, 2010.

[113] S. Mostafavi and Q. Morris, “Using the Gene Ontology hierarchy when predicting gene function,” in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI ’09), pp. 419–427, AUAI Press, Corvallis, Ore, USA, June 2009.

[114] L. J. Jensen, R. Gupta, H.-H. Stærfeldt, and S. Brunak, “Prediction of human protein function according to Gene Ontology categories,” Bioinformatics, vol. 19, no. 5, pp. 635–642, 2003.

[115] A. Lægreid, T. R. Hvidsten, H. Midelfart, J. Komorowski, and A. K. Sandvik, “Predicting gene ontology biological process from temporal gene expression patterns,” Genome Research, vol. 13, no. 5, pp. 965–979, 2003.

[116] R. Bi, Y. Zhou, F. Lu, and W. Wang, “Predicting Gene Ontology functions based on support vector machines and statistical significance estimation,” Neurocomputing, vol. 70, no. 4–6, pp. 718–725, 2007.

[117] P. Bogdanov and A. K. Singh, “Molecular function prediction using neighborhood features,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208–217, 2010.

[118] M. Riley, “Functions of the gene products of Escherichia coli,” Microbiological Reviews, vol. 57, no. 4, pp. 862–952, 1993.

[119] R. E. Schapire and Y. Singer, “BoosTexter: a boosting-based system for text categorization,” Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, “Hierarchical multi-label boosting for gene function prediction,” in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB ’10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, “The self-organizing map,” Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, “Hierarchical classification using a competitive neural network,” in Proceedings of the International Conference on Computing, Networking and Communications (ICNC ’12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, “Synergy of multi-label hierarchical ensembles, data fusion and cost-sensitive methods for gene functional inference,” Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. De Carvalho, “Hierarchical multilabel classification using Top-Down label combination and Artificial Neural Networks,” in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN ’10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, “Hierarchical multilabel protein function prediction using local neural networks,” in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, “Backpropagation: the basic theory,” in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, “A hybrid global-local approach for hierarchical classification,” in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastino, and A. Freitas, “Improving local per level hierarchical classification,” Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, “Random k-labelsets for multilabel classification,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, “Large margin hierarchical classification with mutually exclusive class membership,” Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, “Prediction of enzyme classification from protein sequence without the use of sequence similarity,” in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB ’97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, “The test and select approach to ensemble combination,” in Multiple Classifier Systems, First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, “Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes,” Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, “Incremental algorithms for hierarchical classification,” in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, “An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction,” in Multiple Classifier Systems, Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, “Kernels and regularization on graphs,” in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, “Mismatch string kernels for SVM protein classification,” in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, “Graphical models, exponential families, and variational inference,” Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, “The isotonic regression problem and its dual,” Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, “An O(n²) algorithm for isotonic regression,” in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, “Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction,” in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML ’09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, “True path rule,” 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, “Composite kernel based SVM for hierarchical multilabel gene function classification,” in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, “Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction,” IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, “The utility of different representations of protein sequence for predicting functional class,” Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, “Top-down induction of clustering trees,” in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.

[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, “Decision trees for hierarchical multilabel classification: a case study in functional genomics,” in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, “Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction,” BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, “A hierarchical classification ant colony algorithm for predicting gene ontology terms,” in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, “Tree ensembles for predicting structured outputs,” Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, “Integrating information for protein function prediction,” in Bioinformatics - From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, “MS-kNN: protein function prediction by integrating multiple data sources,” BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, “Combining heterogeneous data sources for accurate functional annotation of proteins,” BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, “Context-sensitive data integration and prediction of biological networks,” Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, “Fast integration of heterogeneous data sources for predicting gene function with limited annotation,” Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz, et al., “XML-based approaches for the integration of heterogeneous bio-molecular data,” BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.

[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, “Large scale multiple kernel learning,” Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, “More efficiency in multiple kernel learning,” in Proceedings of the 24th International Conference on Machine Learning (ICML ’07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, “Kernel-based data fusion and its application to protein function prediction in yeast,” in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, “Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure,” Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, “Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods,” in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD ’10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, “Protein function prediction by integrating multiple kernels,” in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI ’13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray, et al., “Comparison of discrimination techniques applied to a complex data set of head injured patients,” Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l’Application de l’Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, “On combining classifiers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, “Decision templates for multiple classifier fusion: an experimental comparison,” Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, “Noise tolerance of multiple classifier systems in data integration-based gene function prediction,” Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, “Ensemble based data fusion for gene function prediction,” in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, “Prediction of gene function using ensembles of SVMs and heterogeneous data sources,” in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, “Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines,” Neurocomputing, vol. 73, no. 7–9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry, et al., “The Pfam protein families database,” Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao, et al., “Genomic expression programs in the response of yeast cells to environmental changes,” Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, “BioGRID: a general repository for interaction datasets,” Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. Von Mering, R. Krause, B. Snel, et al., “Comparative assessment of large-scale data sets of protein-protein interactions,” Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, “HCGene: a software tool to support the hierarchical classification of genes,” Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, “Choosing negative examples for the prediction of protein-protein interactions,” BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Skunca, A. Altenhoff, and C. Dessimoz, “Quality of computationally inferred gene ontology annotations,” PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.

[182] G. Batista, R. Prati, and M. Monard, “A study of the behavior of several methods for balancing machine learning training data,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, “Addressing the curse of imbalanced training sets: one-sided selection,” in Proceedings of the 14th International Conference on Machine Learning (ICML ’97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, “Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, “Cost-sensitive boosting for classification of imbalanced data,” Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, “Cost-sensitive ensemble method for class-imbalanced datasets,” Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, “A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, “The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction,” Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, “Transductive multilabel ensemble classification for protein function prediction,” in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, “Protein function prediction with incomplete annotations,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, “Gene Ontology annotations and resources,” Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, “On the importance of comprehensible classification models for protein function prediction,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, “A grammatical evolution algorithm for generation of hierarchical multi-label classification rules,” in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Ratsch, “Multitask learning in computational biology,” Journal of Machine Learning Research W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, “PowerGraph: distributed graph-parallel computation on natural graphs,” in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI ’12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, “Scalable network-based learning methods for automated function prediction based on the neo4j graph-database,” in Proceedings of the Automated Function Prediction SIG 2013 (ISMB Special Interest Group Meeting), pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bahr, et al., “Overview of the yeast genome,” Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber, et al., “MIPS: a database for genomes and protein sequences,” Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan, et al., “Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms,” Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, “Ontology quality assurance through analysis of term transformations,” Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, “A categorization approach to automated ontological function annotation,” Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, “Hierarchical text categorization as a tool of associating genes with Gene Ontology codes,” in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

Hindawi Publishing Corporationhttpwwwhindawicom

GenomicsInternational Journal of

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 29: ReviewArticle Hierarchical Ensemble Methods for Protein ... · ReviewArticle Hierarchical Ensemble Methods for Protein Function Prediction GiorgioValentini AnacletoLab-DipartimentodiInformatica,Universit

ISRN Bioinformatics 29

gene network for Arabidopsis thalianardquo Nature Biotechnologyvol 28 no 2 pp 149ndash156 2010

[16] P Radivojac W T Clark T R Oron et al ldquoA large-scale eval-uation of computational protein function predictionrdquo NatureMethods vol 10 no 3 pp 221ndash227 2013

[17] A S Juncker L J Jensen A Pierleoni et al ldquoSequence-basedfeature prediction and annotation of proteinsrdquoGenome Biologyvol 10 no 2 p 206 2009

[18] R Sharan I Ulitsky and R Shamir ldquoNetwork-based predictionof protein functionrdquo Molecular Systems Biology vol 3 p 882007

[19] A Sokolov and A Ben-Hur ldquoHierarchical classification ofgene ontology terms using the GOstruct methodrdquo Journal ofBioinformatics and Computational Biology vol 8 no 2 pp 357ndash376 2010

[20] Z Barutcuoglu R E Schapire and O G Troyanskaya ldquoHierar-chical multi-label prediction of gene functionrdquo Bioinformaticsvol 22 no 7 pp 830ndash836 2006

[21] H N Chua W-K Sung and L Wong ldquoAn efficient strategyfor extensive integration of diverse biological data for proteinfunction predictionrdquo Bioinformatics vol 23 no 24 pp 3364ndash3373 2007

[22] P Pavlidis J Weston J Cai and W S Noble ldquoLearning genefunctional classifications from multiple data typesrdquo Journal ofComputational Biology vol 9 no 2 pp 401ndash411 2002

[23] M Re and G Valentini ldquoSimple ensemble methods are com-petitive with stateof- the-art data integration methods for genefunction predictionrdquo Journal of Machine Learning ResearchWampC Proceedings Machine Learning in Systems Biology vol 8pp 98ndash111 2010

[24] G Obozinski G Lanckriet C Grant M I Jordan and W SNoble ldquoConsistent probabilistic outputs for protein functionpredictionrdquo Genome Biology vol 9 no 1 article S6 2008

[25] D Cozzetto D Buchan K Bryson and D Jones ldquoProteinfunction prediction by massive integration of evolutionaryanalyses and multiple data sourcesrdquo BMC Bioinformatics vol14 supplement 3 article S1 2013

[26] X Jiang N Nariai M Steffen S Kasif and E D KolaczykldquoIntegration of relational and hierarchical network informationfor protein function predictionrdquo BMC Bioinformatics vol 9article 350 2008

[27] N Cesa-Bianchi and G Valentini ldquoHierarchical cost-sensitivealgorithms for genome-wide gene function predictionrdquo Journalof Machine Learning Research WampC Proceedings MachineLearning in Systems Biology vol 8 pp 14ndash29 2010

[28] R Cerri andAC P L FDeCarvalho ldquoNew top-downmethodsusing SVMs for hierarchical multilabel classification problemsrdquoin Proceedings of the International Joint Conference on NeuralNetworks (IJCNN rsquo10) pp 1ndash8 IEEE Computer Society July2010

[29] L Schietgat C Vens J Struyf H Blockeel D Kocev and SDzeroski ldquoPredicting gene function using hierarchical multi-label decision tree ensemblesrdquo BMC Bioinformatics vol 11article 2 2010

[30] Y Guan C L Myers D C Hess Z Barutcuoglu A ACaudy and O G Troyanskaya ldquoPredicting gene function in ahierarchical context with an ensemble of classifiersrdquo GenomeBiology vol 9 no 1 article S3 2008

[31] G Valentini ldquoTrue path rule hierarchical ensembles forgenome-wide gene function predictionrdquo IEEEACM Transac-tions on Computational Biology and Bioinformatics vol 8 no3 pp 832ndash847 2011

[32] NAlaydie CK Reddy andF Fotouhi ldquoExploiting label depen-dency for hierarchical multi-label classificationrdquo in Advances inKnowledgeDiscovery andDataMining vol 7301 ofLectureNotesin Computer Science pp 294ndash305 Springer 2012

[33] D Koller andM Sahami ldquoHierarchically classifying documentsusing very few wordsrdquo in Proceedings of the 14th InternationalConference on Machine Learning pp 170ndash178 1997

[34] S Chakrabarti B Dom R Agrawal and P Raghavan ldquoScalablefeature selection classification and signature generation fororganizing large text databases into hierarchical topic tax-onomiesrdquo VLDB Journal vol 7 no 3 pp 163ndash178 1998

[35] M-L Zhang and Z-H Zhou ldquoMultilabel neural networks withapplications to functional genomics and text categorizationrdquoIEEE Transactions on Knowledge and Data Engineering vol 18no 10 pp 1338ndash1351 2006

[36] A Burred and J Lerch ldquoA hierarchical approach to automaticmusical genre classificationrdquo in Proceedings of the 6th Interna-tional Conference on Digital Audio Effects pp 8ndash11 2003

[37] C DeCoro Z Barutcuoglu and R Fiebrink ldquoBayesian aggre-gation for hierarchical genre classificationrdquo in Proceedings of the8th International Conference onMusic Information Retrieval pp77ndash80 2007

[38] K Trohidis G Tsoumahas G Kalliris and I P Vlahavas ldquoMul-tilabel classification of music into emotionsrdquo in Proceedings ofthe 9th International Conference onMusic Information Retrievalpp 325ndash330 2008

[39] A Binder M Kawanabe and U Brefeld ldquoBayesian aggregationfor hierarchical genre classificationrdquo in Proceedings of the 9thAsian Conference on Computer Vision 2009

[40] Z Barutcuoglu and C DeCoro ldquoHierarchical shape classifica-tion using Bayesian aggregationrdquo in Proceedings of the IEEEInternational Conference on Shape Modeling and Applications(SMI rsquo06) p 44 June 2006

[41] A Dimou G Tsoumakas V Mezaris I Kompatsiaris and IVlahavas ldquoAn empirical study of multi-label learning methodsfor video annotationrdquo in Proceedings of the 7th InternationalWorkshop on Content-Based Multimedia Indexing (CBMI rsquo09)pp 19ndash24 Chania Greece June 2009

[42] K Punera and J Ghosh ldquoEnhanced hierarchical classificationvia isotonic smoothingrdquo in Proceedings of the 17th InternationalConference onWorldWideWeb (WWW rsquo08) pp 151ndash160 ACMBeijing China April 2008

[43] M Ceci and D Malerba ldquoClassifying web documents in ahierarchy of categories a comprehensive studyrdquo Journal ofIntelligent Information Systems vol 28 no 1 pp 37ndash78 2007

[44] C N Silla Jr and A A Freitas ldquoA survey of hierarchical classi-fication across different application domainsrdquoData Mining andKnowledge Discovery vol 22 no 1-2 pp 31ndash72 2011

[45] O G Troyanskaya K Dolinski A B Owen R B Alt-man and D Botstein ldquoA Bayesian framework for combiningheterogeneous data sources for gene function prediction (inSaccharomyces cerevisiae)rdquo Proceedings of the National Academyof Sciences of the United States of America vol 100 no 14 pp8348ndash8353 2003

[46] K Tsuda H Shin and B Scholkopf ldquoFast protein classificationwith multiple networksrdquo Bioinformatics vol 21 supplement 2pp ii59ndashii65 2005

[47] J Xiong S Rayner K Luo Y Li and S Chen ldquoGenomewide prediction of protein function via a generic knowledgediscovery approach based on evidence integrationrdquo BMCBioin-formatics vol 7 article 268 2006

30 ISRN Bioinformatics

[48] U Karaoz T M Murali S Letovsky et al ldquoWhole-genomeannotation by using evidence integration in functional-linkagenetworksrdquoProceedings of theNational Academy of Sciences of theUnited States of America vol 101 no 9 pp 2888ndash2893 2004

[49] A Sokolov and A Ben-Hur ldquoA structured-outputs methodfor prediction of protein functionrdquo in Proceedings of the 2ndInternationalWorkshop onMachine Learning in Systems Biology(MLSB rsquo08) 2008

[50] K Astikainen L Holm E Pitkanen S Szedmak and J RousuldquoTowards structured output prediction of enzyme functionrdquoBMC Proceedings vol 2 supplement 4 article S2 2008

[51] B Done P Khatri A Done and S Draghici ldquoPredicting novelhuman gene ontology annotations using semantic analysisrdquoIEEEACM Transactions on Computational Biology and Bioin-formatics vol 7 no 1 pp 91ndash99 2010

[52] G Valentini and F Masulli ldquoEnsembles of learning machinesrdquoinNeural NetsWIRN-02 vol 2486 of Lecture Notes in ComputerScience pp 3ndash19 Springer 2002

[53] B Shahbaba and R M Neal ldquoGene function classificationusing Bayesian models with hierarchy-based priorsrdquo BMCBioinformatics vol 7 article 448 2006

[54] C Vens J Struyf L Schietgat S Dzeroski and H Block-eel ldquoDecision trees for hierarchical multi-label classificationrdquoMachine Learning vol 73 no 2 pp 185ndash214 2008

[55] G Valentini ldquoTrue path rule hierarchical ensemblesrdquo in Mul-tiple Classifier Systems Eighth InternationalWorkshop MCS2009 Reykjavik Iceland J Kittler J Benediktsson and F RoliEds vol 5519 of Lecture Notes in Computer Science pp 232ndash241Springer 2009

[56] S F AltschulW GishWMiller EWMyers and D J LipmanldquoBasic local alignment search toolrdquo Journal ofMolecular Biologyvol 215 no 3 pp 403ndash410 1990

[57] S F Altschul T L Madden A A Schaffer et al ldquoGappedblast and psi-blast a new generation of protein database searchprogramsrdquoNucleic Acids Research vol 25 no 17 pp 3389ndash34021997

[58] A Conesa S Gotz J M Garcıa-Gomez J Terol M Talonand M Robles ldquoBlast2GO a universal tool for annotationvisualization and analysis in functional genomics researchrdquoBioinformatics vol 21 no 18 pp 3674ndash3676 2005

[59] Y Loewenstein D Raimondo O C Redfern et al ldquoProteinfunction annotation by homology-based inferencerdquo Genomebiology vol 10 no 2 p 207 2009

[60] A Prlic T A Down E Kulesha R D Finn A Kahari and T JP Hubbard ldquoIntegrating sequence and structural biology withDASrdquo BMC Bioinformatics vol 8 article 333 2007

[61] M Falda S Toppo A Pescarolo et al ldquoArgot2 a large scalefunction prediction tool relying on semantic similarity ofweighted Gene Ontology termsrdquo BMC Bioinformatics vol 13supplement 4 article S14 2012

[62] T Hamp R Kassner S Seemayer et al ldquoHomology-basedinference sets the bar high for protein function predictionrdquoBMC Bioinformatics vol 14 supplement 3 article S7 2013

[63] E M Marcotte M Pellegrini M J Thompson T O Yeatesand D Eisenberg ldquoA combined algorithm for genome-wideprediction of protein functionrdquo Nature vol 402 no 6757 pp83ndash86 1999

[64] A Vazquez A Flammini A Maritan and A VespignanildquoGlobal protein function prediction from protein-protein inter-action networksrdquo Nature Biotechnology vol 21 no 6 pp 697ndash700 2003

[65] J Z Wang R Du R S Payattakil P S Yu and C F ChenldquoImproving GO semantic similarity measures by exploringthe ontology beneath the terms and modelling uncertaintyrdquoBioinformatics vol 23 no 10 pp 1274ndash1281 2007

[66] H Yang TNepusz andA Paccanaro ldquoImprovingGO semanticsimilarity measures by exploring the ontology beneath theterms and modelling uncertaintyrdquo Bioinformatics vol 28 no10 pp 1383ndash1389 2012

[67] H Yu L Gao K Tu and Z Guo ldquoBroadly predicting specificgene functions with expression similarity and taxonomy simi-larityrdquo Gene vol 352 no 1-2 pp 75ndash81 2005

[68] Y Tao L Sam J Li C Friedman andYA Lussier ldquoInformationtheory applied to the sparse gene ontology annotation networkto predict novel gene functionrdquo Bioinformatics vol 23 no 13pp i529ndashi538 2007

[69] G Pandey C L Myers and V Kumar ldquoIncorporating func-tional inter-relationships into protein function prediction algo-rithmsrdquo BMC Bioinformatics vol 10 article 142 2009

[70] X-F Zhang and D-Q Dai ldquoA framework for incorporatingfunctional interrelationships into protein function predictionalgorithmsrdquo IEEEACM Transactions on Computational Biologyand Bioinformatics vol 9 no 3 pp 740ndash753 2012

[71] E Nabieva K Jim A Agarwal B Chazelle and M SinghldquoWhole-proteome prediction of protein function via graph-theoretic analysis of interaction mapsrdquo Bioinformatics vol 21no 1 pp 302ndash310 2005

[72] A Bertoni M Frasca and G Valentini ldquoCOSNet a cost sensi-tive neural network for semi-supervised learning in graphsrdquo inEuropean Conference on Machine Learning ECML PKDD 2011vol 6911 of Lecture Notes on Artificial Intelligence pp 219ndash234Springer 2011

[73] M Frasca A Bertoni M Re and G Valentini ldquoA neuralnetwork algorithm for semi-supervised node label learningfrom unbalanced datardquo Neural Networks vol 43 pp 84ndash982013

[74] M Deng T Chen and F Sun ldquoAn integrated probabilisticmodel for functional prediction of proteinsrdquo Journal of Com-putational Biology vol 11 no 2-3 pp 463ndash475 2004

[75] Y A I Kourmpetis A D J VanDijk M C AM Bink R C HJ Van Ham and C J F Ter Braak ldquoBayesian markov randomfield analysis for protein function prediction based on networkdatardquo PLoS ONE vol 5 no 2 Article ID e9293 2010

[76] S Oliver ldquoGuilt-by-association goes globalrdquo Nature vol 403no 6770 pp 601ndash603 2000

[77] J McDermott R Bumgarner and R Samudrala ldquoFunctionalannotation frompredicted protein interaction networksrdquoBioin-formatics vol 21 no 15 pp 3217ndash3226 2005

[78] M Re M Mesiti and G Valentini ldquoA fast ranking algorithmfor predicting gene functions in biomolecular networksrdquo IEEEACM Transactions on Computational Biology and Bioinformat-ics vol 9 no 6 pp 1812ndash1818 2012

[79] M Re and G Valentini ldquoCancer module genes ranking usingkernelized score functionsrdquo BMC Bioinformatics vol 13 sup-plement 14 article S3 2012

[80] M Re and G Valentini ldquoLarge scale ranking and repositioningof drugs with respect to DrugBank therapeutic categoriesrdquo inInternational Symposium on Bioinformatics Research and Appli-cations (ISBRA 2012) vol 7292 of Lecture Notes in ComputerScience pp 225ndash236 Springer 2012

[81] M Re and G Valentini ldquoNetwork-based drug ranking andrepositioning with respect to drugbank therapeutic categoriesrdquo

ISRN Bioinformatics 31

IEEEACM Transactions on Computational Biology and Bioin-formatics vol 10 no 6 pp 1359ndash1371 2014

[82] Y Bengio ODelalleau andN Le Roux ldquoLabel propagation andquadratic criterionrdquo in Semi-Supervised Learning O ChapelleB Scholkopf and A Zien Eds pp 193ndash216 MIT Press 2006

[83] Y Saad Iterative Methods For Sparse Linear Systems PWSPublishing Company Boston Mass USA 1996

[84] N Cesa-Bianchi C Gentile F Vitale and G Zappella ldquoRan-dom spanning trees and the prediction of weighted graphsrdquoin Proceedings of the 27th International Conference on MachineLearning (ICML rsquo10) pp 175ndash182 Haifa Israel June 2010

[85] I Tsochantaridis T Joachims THofmann andYAltun ldquoLargemargin methods for structured and interdependent outputvariablesrdquo Journal of Machine Learning Research vol 6 pp1453ndash1484 2005

[86] J Rousu C Saunders S Szedmak and J Shawe-Taylor ldquoKernel-based learning of hierarchical multilabel classification modelsrdquoJournal of Machine Learning Research vol 7 pp 1601ndash16262006

[87] C H Lampert and M B Blaschko ldquoStructured prediction byjoint kernel support estimationrdquoMachine Learning vol 77 no2-3 pp 249ndash269 2009

[88] G Bakir T Hoffman B Scholkopf B Smola A J Taskar andS V N Vishwanathan Predicting Structured Data MIT PressCambridge Mass USA 2007

[89] R Eisner B Poulin D Szafron P Lu and R Greiner ldquoImprov-ing protein function prediction using the hierarchical structureof the gene ontologyrdquo in Proceedings of the IEEE Symposium onComputational Intelligence in Bioinformatics andComputationalBiology (CIBCB rsquo05) November 2005

[90] H Blockeel L Schietgat and A Clare ldquoHierarchical multilabelclassification trees for gene function predictionrdquo in ProbabilisticModeling and Machine Learning in Structural and SystemsBiology J Rousu S Kaski and E Ukkonen Eds HelsinkiUniversity Printing House Tuusula Finland 2006

[91] O Dekel J Keshet and Y Singer ldquoLarge margin hierarchicalclassificationrdquo in Proceedings of the 21th International Confer-ence onMachine Learning (ICML rsquo04) pp 209ndash216 OmnipressJuly 2004

[92] N Cesa-Bianchi C Gentile and L Zaniboni ldquoHierarchicalclassifications combining bayes with SVMrdquo in Proceedings of the23rd International Conference onMachine Learning (ICML rsquo06)pp 177ndash184 ACM Press June 2006

[93] T G Dietterich ldquoEnsemble methods in machine learningrdquo inMultiple Classifier Systems First International Workshop MCS2000 Cagliari Italy J Kittler and F Roli Eds vol 1857 ofLecture Notes in Computer Science pp 1ndash15 Springer 2000

[94] O Okun G Valentini and M Re Ensembles in MachineLearning Applications vol 373 of Studies in ComputationalIntelligence Springer Berlin Germany 2011

[95] M Re and G Valentini ldquoEnsemble methods a reviewrdquo inAdvances in Machine Learning and Data Mining for AstronomyDataMining andKnowledgeDiscovery pp 563ndash594 Chapmanamp Hall 2012

[96] E Bauer and R Kohavi ldquoEmpirical comparison of voting clas-sification algorithms bagging boosting and variantsrdquoMachineLearning vol 36 no 1 pp 105ndash139 1999

[97] T G Dietterich ldquoExperimental comparison of three methodsfor constructing ensembles of decision trees bagging boostingand randomizationrdquo Machine Learning vol 40 no 2 pp 139ndash157 2000

[98] R E Banfield L O Hall O Lawrence KW Bowyer W KevinandW P Kegelmeyer ldquoA comparison of decision tree ensemblecreation techniquesrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 173ndash180 2007

[99] S Y Sohn andHW Shin ldquoExperimental study for the compari-son of classifier combinationmethodsrdquoPattern Recognition vol40 no 1 pp 33ndash40 2007

[100] M Dettling and P Buhlmann ldquoBoosting for tumor classifica-tion with gene expression datardquo Bioinformatics vol 19 no 9pp 1061ndash1069 2003

[101] V Robles P Larranaga J M Pena et al ldquoBayesian networkmulti-classifiers for protein secondary structure predictionrdquoArtificial Intelligence inMedicine vol 31 no 2 pp 117ndash136 2004

[102] G Valentini M Muselli and F Ruffino ldquoCancer recognitionwith bagged ensembles of support vectormachinesrdquoNeurocom-puting vol 56 no 1ndash4 pp 461ndash466 2004

[103] T Abeel T Helleputte Y Van de Peer P Dupont and YSaeys ldquoRobust biomarker identification for cancer diagnosiswith ensemble feature selection methodsrdquo Bioinformatics vol26 no 3 Article ID btp630 pp 392ndash398 2009

[104] A Rozza G Lombardi M Re E Casiraghi G Valentini andP Campadelli ldquoA novel ensemble technique for protein sub-cellular location predictionrdquo in Ensembles in Machine LearningApplications vol 373 of Studies in Computational Intelligencepp 151ndash177 Springer Berlin Germany 2011

[105] A Topchy A K Jain and W Punch ldquoClustering ensemblesmodels of consensus and weak partitionsrdquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 27 no 12 pp1866ndash1881 2005

[106] A Bertoni and G Valentini ldquoEnsembles based on randomprojections to improve the accuracy of clustering algorithmsrdquo inNeural Nets WIRN 2005 vol 3931 of Lecture Notes in ComputerScience pp 31ndash37 Springer 2006

[107] R E Schapire Y Freund P Bartlett and W S Lee ldquoBoostingthe margin a new explanation for the effectiveness of votingmethodsrdquoAnnals of Statistics vol 26 no 5 pp 1651ndash1686 1998

[108] E L Allwein R E Schapire andY Singer ldquoReducingmulticlassto binary a unifying approach for margin classifiersrdquo Journal ofMachine Learning Research vol 1 no 2 pp 113ndash141 2001

[109] E M Kleinberg ldquoOn the algorithmic implementation ofstochastic discriminationrdquo IEEE Transactions on Pattern Anal-ysis and Machine Intelligence vol 22 no 5 pp 473ndash490 2000

[110] L Breiman ldquoBias variance and arcing classifiersrdquo Tech Rep TR460 Statistics Department University of California BerkeleyCalif USA 1996

[111] J Friedman and P Hall ldquoOn bagging and nonlinear estimationrdquoTech Rep Statistics Department University of Stanford Stan-ford Calif USA 2000

[112] K DembczynskiWWaegemanW Cheng and E HullermeierldquoOn label dependence in multi-label classificationrdquo in Proceed-ings of the 2nd International Workshop on learning from Multi-Label Data (ICML-MLD rsquo10) pp 5ndash12 Haifa Israel 2010

[113] S Mostafavi and Q Morris ldquoUsing the Gene Ontology hierar-chy when predicting gene functionrdquo in Proceedings of the 25thConference on Uncertainty in Artificial Intelligence (UAI rsquo09) pp419ndash427 AUAI Press Corvallis Ore USA June 2009

[114] L J Jensen R Gupta H-H Staeligrfeldt and S Brunak ldquoPredic-tion of human protein function according to Gene Ontologycategoriesrdquo Bioinformatics vol 19 no 5 pp 635ndash642 2003

[115] A Laeliggreid T R Hvidsten HMidelfart J Komorowski and AK Sandvik ldquoPredicting gene ontology biological process from

32 ISRN Bioinformatics

temporal gene expression patternsrdquo Genome Research vol 13no 5 pp 965ndash979 2003

[116] R Bi Y Zhou F Lu and W Wang ldquoPredicting Gene Ontologyfunctions based on support vector machines and statisticalsignificance estimationrdquo Neurocomputing vol 70 no 4ndash6 pp718ndash725 2007

[117] P Bogdanov and A K Singh ldquoMolecular function predictionusing neighborhood featuresrdquo IEEEACMTransactions onCom-putational Biology and Bioinformatics vol 7 no 2 pp 208ndash2172010

[118] M Riley ldquoFunctions of the gene products of Escherichia colirdquoMicrobiological Reviews vol 57 no 4 pp 862ndash952 1993

[119] R E Schapire and Y Singer ldquoBoosTexter a boosting-basedsystem for text categorizationrdquo Machine Learning vol 39 no2 pp 135ndash168 2000

[120] N Alaydie C K Reddy and F Fotouhi ldquoHierarchical multi-label boosting for gene function predictionrdquo in Proceedings ofthe International Conference on Computational Systems Bioin-formatics (CSB rsquo10) Stanford Calif USA 2010

[121] T Kohonen ldquoThe self-organizing maprdquo Neurocomputing vol21 no 1ndash3 pp 1ndash6 1998

[122] H Borges and J Nievola ldquoHierarchical classification using acompetitive neural networkrdquo in Proceedings of the InternationalConference on Computing Networking and Communications(ICNC rsquo12) pp 172ndash177 2012

[123] N Cesa-Bianchi M Re and G Valentini ldquoSynergy of multi-label hierarchical ensembles data fusion and cost-sensitivemethods for gene functional inferencerdquoMachine Learning vol88 no 1 pp 209ndash241 2012

[124] R Cerri and A C P L F De Carvalho ldquoHierarchical mul-tilabel classification using Top-Down label combination andArtificial Neural Networksrdquo in Proceedings of the 11th BrazilianSymposium on Neural Networks (SBRN rsquo10) pp 253ndash258 IEEEComputer Society October 2010

[125] R Cerri and A de Carvalho ldquoHierarchical multilabel proteinfunction prediction using local neural networksrdquo inAdvances inBioinformatics and Computational Biology vol 6832 of LectureNotes in Computer Science pp 10ndash17 Springer 2011

[126] D E Rumelhart R Durbin and Y Chauvin ldquoBackpropagationthe basic theoryrdquo in Backpropagation Theory Architectures andApplications D E Rumelhart and Y Chauvin Eds pp 1ndash34Lawrence Erlbaum Hillsdale NJ USA 1995

[127] J Hernandez L E Sucar and EMorales ldquoA hybrid global-localapproach forhierarchcial classificationrdquo in Proceedings of the26th International Florida Artificial Intelligence Research SocietyConference pp 432ndash437 2013

[128] B Paes A Plastion and A Freitas ldquoImproving loacal per levelhierarchical classificationrdquo Journal of Information and DataManagement vol 3 no 3 pp 394ndash409 2012

[129] G Tsoumakas I Katakis and I Vlahavas ldquoRandom k-labelsetsfor multilabel classificationrdquo IEEE Transactions on Knowledgeand Data Engineering vol 23 no 7 pp 1079ndash1089 2011

[130] HWang X Shen andW Pan ldquoLarge margin hierarchical clas-sification with mutually exclusive class membershiprdquo Journal ofMachine Learning Research vol 12 pp 2721ndash2748 2011

[131] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996

[132] M des Jardins P Karp M Krummenacker T J Lee and CA Ouzounis ldquoPrediction of enzyme classification from proteinsequence without the use of sequence similarityrdquo in Proceedingsof the 5th International Conference on Intelligent Systems forMolecular Biology (ISMB rsquo97) pp 92ndash99 AAAI Press 1997

[133] A Sharkey N Sharkey U Gerecke and G Chandroth ldquoThetest and select approach to ensemble combinationrdquo inMultipleClassifier Systems First International Workshop MCS 2000Cagliari Italy J Kittler and F Roli Eds vol 1857 of LectureNotes in Computer Science pp 30ndash44 Springer 2000

[134] Y Kourmpetis A van Dijk and C ter Braak ldquoGene Ontologyconsistent protein function prediction the FALCON algorithmapplied to six eukaryotic genomesrdquo Algorithms for MolecularBiology vol 8 article 10 2013

[135] N Cesa-Bianchi C Gentile A Tironi and L Zaniboni ldquoIncre-mental algorithms for hierarchical classificationrdquo in Advancesin Neural Information Processing Systems vol 17 pp 233ndash240MIT Press 2005

[136] M Re and G Valentini ldquoAn experimental comparison ofHierarchical Bayes and True Path Rule ensembles for proteinfunction predictionrdquo in Multiple Classifier Systems NinethInternational Workshop MCS 2010 Cairo Egypt N El-GayarEd vol 5997 of Lecture Notes in Computer Science pp 294ndash303Springer 2010

[137] A J Smola and R Kondor ldquoKernels and regularization ongraphsrdquo in Proceedings of the 16th Annual Conference on Learn-ing Theory B Scholkopf and M K Warmuth Eds LectureNotes in Computer Science pp 144ndash158 Springer August 2003

[138] C Leslie E Eskin A Cohen J Weston and W S NobleldquoMismatch string kernels for svm protein classificationrdquo inAdvances in Neural Information Processing Systems pp 1441ndash1448 MIT Press Cambridge Mass USA 2003

[139] M J Wainwright and M I Jordan ldquoGraphical models expo-nential families and variational inferencerdquo Foundations andTrends in Machine Learning vol 1 no 1-2 pp 1ndash305 2008

[140] R E Barlow andH D Brunk ldquoThe isotonic regression problemand its dualrdquo Journal of the American Statistical Association vol67 no 337 pp 140ndash147 1972

[141] O Burdakov O Sysoev A Grimvall and M Hussian ldquoAn119874(1198992) algorithm for isotonic regressionrdquo in Large-Scale Non-

linear Optimization vol 83 of Nonconvex Optimization and ItsApplications pp 25ndash33 Springer 2006

[142] G Valentini and M Re ldquoWeighted True Path Rule a multi-label hierarchical algorithm for gene function predictionrdquo inProceedings of the 1st International Workshop on learning fromMulti-Label Data (MLD-ECML rsquo09) pp 133ndash146 Bled Slovenia2009

[143] Gene Ontology Consortium True path rule 2010 httpwwwgeneontologyorgGOusageshtmltruePathRule

[144] B Chen L Duan and J Hu ldquoComposite kernel based svmfor hierarchical multilabel gene function classificationrdquo in Pro-ceedings of the 26th International Florida Artificial IntelligenceResearch Society Conference pp 1380ndash1385 2012

[145] B Chen and J Hu ldquoHierarchical multi-label classification basedon over-sampling and hierarchy constraint for gene functionpredictionrdquo IEEJ Transactions on Electrical and Electronic Engi-neering vol 7 no 2 pp 183ndash189 2012

[146] J R Quinlan ldquoInduction of decision treesrdquo Machine Learningvol 1 no 1 pp 81ndash106 1986

[147] R D King A Karwath A Clare and L Dehaspe ldquoThe utilityof different representations of protein sequence for predictingfunctional classrdquoBioinformatics vol 17 no 5 pp 445ndash454 2001

[148] H Blockeel M Bruynooghe S Dzeroski J Ramon and JStruyf ldquoTop-down induction of clustering treesrdquo in Proceedingsof the 15th International Conference on Machine Learning pp55ndash63 1998

ISRN Bioinformatics 33

[149] H. Blockeel, L. Schietgat, S. Džeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Džeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Džeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics—From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jiménez-Ruiz, I. Sanz, et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.

[160] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray, et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems, Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7-9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry, et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao, et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. von Mering, R. Krause, B. Snel, et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Škunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.

[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernández, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Rätsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013—ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bahr, et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber, et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan, et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.

Page 31: ReviewArticle Hierarchical Ensemble Methods for Protein ... · ReviewArticle Hierarchical Ensemble Methods for Protein Function Prediction GiorgioValentini AnacletoLab-DipartimentodiInformatica,Universit

ISRN Bioinformatics 31

IEEEACM Transactions on Computational Biology and Bioin-formatics vol 10 no 6 pp 1359ndash1371 2014

[82] Y Bengio ODelalleau andN Le Roux ldquoLabel propagation andquadratic criterionrdquo in Semi-Supervised Learning O ChapelleB Scholkopf and A Zien Eds pp 193ndash216 MIT Press 2006

[83] Y Saad Iterative Methods For Sparse Linear Systems PWSPublishing Company Boston Mass USA 1996

[84] N Cesa-Bianchi C Gentile F Vitale and G Zappella ldquoRan-dom spanning trees and the prediction of weighted graphsrdquoin Proceedings of the 27th International Conference on MachineLearning (ICML rsquo10) pp 175ndash182 Haifa Israel June 2010

[85] I Tsochantaridis T Joachims THofmann andYAltun ldquoLargemargin methods for structured and interdependent outputvariablesrdquo Journal of Machine Learning Research vol 6 pp1453ndash1484 2005

[86] J Rousu C Saunders S Szedmak and J Shawe-Taylor ldquoKernel-based learning of hierarchical multilabel classification modelsrdquoJournal of Machine Learning Research vol 7 pp 1601ndash16262006

[87] C H Lampert and M B Blaschko ldquoStructured prediction byjoint kernel support estimationrdquoMachine Learning vol 77 no2-3 pp 249ndash269 2009

[88] G Bakir T Hoffman B Scholkopf B Smola A J Taskar andS V N Vishwanathan Predicting Structured Data MIT PressCambridge Mass USA 2007

[89] R Eisner B Poulin D Szafron P Lu and R Greiner ldquoImprov-ing protein function prediction using the hierarchical structureof the gene ontologyrdquo in Proceedings of the IEEE Symposium onComputational Intelligence in Bioinformatics andComputationalBiology (CIBCB rsquo05) November 2005

[90] H Blockeel L Schietgat and A Clare ldquoHierarchical multilabelclassification trees for gene function predictionrdquo in ProbabilisticModeling and Machine Learning in Structural and SystemsBiology J Rousu S Kaski and E Ukkonen Eds HelsinkiUniversity Printing House Tuusula Finland 2006

[91] O Dekel J Keshet and Y Singer ldquoLarge margin hierarchicalclassificationrdquo in Proceedings of the 21th International Confer-ence onMachine Learning (ICML rsquo04) pp 209ndash216 OmnipressJuly 2004

[92] N Cesa-Bianchi C Gentile and L Zaniboni ldquoHierarchicalclassifications combining bayes with SVMrdquo in Proceedings of the23rd International Conference onMachine Learning (ICML rsquo06)pp 177ndash184 ACM Press June 2006

[93] T G Dietterich ldquoEnsemble methods in machine learningrdquo inMultiple Classifier Systems First International Workshop MCS2000 Cagliari Italy J Kittler and F Roli Eds vol 1857 ofLecture Notes in Computer Science pp 1ndash15 Springer 2000

[94] O Okun G Valentini and M Re Ensembles in MachineLearning Applications vol 373 of Studies in ComputationalIntelligence Springer Berlin Germany 2011

[95] M Re and G Valentini ldquoEnsemble methods a reviewrdquo inAdvances in Machine Learning and Data Mining for AstronomyDataMining andKnowledgeDiscovery pp 563ndash594 Chapmanamp Hall 2012

[96] E Bauer and R Kohavi ldquoEmpirical comparison of voting clas-sification algorithms bagging boosting and variantsrdquoMachineLearning vol 36 no 1 pp 105ndash139 1999

[97] T G Dietterich ldquoExperimental comparison of three methodsfor constructing ensembles of decision trees bagging boostingand randomizationrdquo Machine Learning vol 40 no 2 pp 139ndash157 2000

[98] R E Banfield L O Hall O Lawrence KW Bowyer W KevinandW P Kegelmeyer ldquoA comparison of decision tree ensemblecreation techniquesrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 173ndash180 2007

[99] S Y Sohn andHW Shin ldquoExperimental study for the compari-son of classifier combinationmethodsrdquoPattern Recognition vol40 no 1 pp 33ndash40 2007

[100] M Dettling and P Buhlmann ldquoBoosting for tumor classifica-tion with gene expression datardquo Bioinformatics vol 19 no 9pp 1061ndash1069 2003

[101] V Robles P Larranaga J M Pena et al ldquoBayesian networkmulti-classifiers for protein secondary structure predictionrdquoArtificial Intelligence inMedicine vol 31 no 2 pp 117ndash136 2004

[102] G Valentini M Muselli and F Ruffino ldquoCancer recognitionwith bagged ensembles of support vectormachinesrdquoNeurocom-puting vol 56 no 1ndash4 pp 461ndash466 2004

[103] T Abeel T Helleputte Y Van de Peer P Dupont and YSaeys ldquoRobust biomarker identification for cancer diagnosiswith ensemble feature selection methodsrdquo Bioinformatics vol26 no 3 Article ID btp630 pp 392ndash398 2009

[104] A Rozza G Lombardi M Re E Casiraghi G Valentini andP Campadelli ldquoA novel ensemble technique for protein sub-cellular location predictionrdquo in Ensembles in Machine LearningApplications vol 373 of Studies in Computational Intelligencepp 151ndash177 Springer Berlin Germany 2011

[105] A Topchy A K Jain and W Punch ldquoClustering ensemblesmodels of consensus and weak partitionsrdquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 27 no 12 pp1866ndash1881 2005

[106] A Bertoni and G Valentini ldquoEnsembles based on randomprojections to improve the accuracy of clustering algorithmsrdquo inNeural Nets WIRN 2005 vol 3931 of Lecture Notes in ComputerScience pp 31ndash37 Springer 2006

[107] R E Schapire Y Freund P Bartlett and W S Lee ldquoBoostingthe margin a new explanation for the effectiveness of votingmethodsrdquoAnnals of Statistics vol 26 no 5 pp 1651ndash1686 1998

[108] E L Allwein R E Schapire andY Singer ldquoReducingmulticlassto binary a unifying approach for margin classifiersrdquo Journal ofMachine Learning Research vol 1 no 2 pp 113ndash141 2001

[109] E M Kleinberg ldquoOn the algorithmic implementation ofstochastic discriminationrdquo IEEE Transactions on Pattern Anal-ysis and Machine Intelligence vol 22 no 5 pp 473ndash490 2000

[110] L Breiman ldquoBias variance and arcing classifiersrdquo Tech Rep TR460 Statistics Department University of California BerkeleyCalif USA 1996

[111] J Friedman and P Hall ldquoOn bagging and nonlinear estimationrdquoTech Rep Statistics Department University of Stanford Stan-ford Calif USA 2000

[112] K DembczynskiWWaegemanW Cheng and E HullermeierldquoOn label dependence in multi-label classificationrdquo in Proceed-ings of the 2nd International Workshop on learning from Multi-Label Data (ICML-MLD rsquo10) pp 5ndash12 Haifa Israel 2010

[113] S Mostafavi and Q Morris ldquoUsing the Gene Ontology hierar-chy when predicting gene functionrdquo in Proceedings of the 25thConference on Uncertainty in Artificial Intelligence (UAI rsquo09) pp419ndash427 AUAI Press Corvallis Ore USA June 2009

[114] L J Jensen R Gupta H-H Staeligrfeldt and S Brunak ldquoPredic-tion of human protein function according to Gene Ontologycategoriesrdquo Bioinformatics vol 19 no 5 pp 635ndash642 2003

[115] A Laeliggreid T R Hvidsten HMidelfart J Komorowski and AK Sandvik ldquoPredicting gene ontology biological process from

32 ISRN Bioinformatics

temporal gene expression patternsrdquo Genome Research vol 13no 5 pp 965ndash979 2003

[116] R Bi Y Zhou F Lu and W Wang ldquoPredicting Gene Ontologyfunctions based on support vector machines and statisticalsignificance estimationrdquo Neurocomputing vol 70 no 4ndash6 pp718ndash725 2007

[117] P Bogdanov and A K Singh ldquoMolecular function predictionusing neighborhood featuresrdquo IEEEACMTransactions onCom-putational Biology and Bioinformatics vol 7 no 2 pp 208ndash2172010

[118] M Riley ldquoFunctions of the gene products of Escherichia colirdquoMicrobiological Reviews vol 57 no 4 pp 862ndash952 1993

[119] R. E. Schapire and Y. Singer, "BoosTexter: a boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.

[120] N. Alaydie, C. K. Reddy, and F. Fotouhi, "Hierarchical multi-label boosting for gene function prediction," in Proceedings of the International Conference on Computational Systems Bioinformatics (CSB '10), Stanford, Calif, USA, 2010.

[121] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1–3, pp. 1–6, 1998.

[122] H. Borges and J. Nievola, "Hierarchical classification using a competitive neural network," in Proceedings of the International Conference on Computing, Networking and Communications (ICNC '12), pp. 172–177, 2012.

[123] N. Cesa-Bianchi, M. Re, and G. Valentini, "Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference," Machine Learning, vol. 88, no. 1, pp. 209–241, 2012.

[124] R. Cerri and A. C. P. L. F. De Carvalho, "Hierarchical multilabel classification using Top-Down label combination and Artificial Neural Networks," in Proceedings of the 11th Brazilian Symposium on Neural Networks (SBRN '10), pp. 253–258, IEEE Computer Society, October 2010.

[125] R. Cerri and A. de Carvalho, "Hierarchical multilabel protein function prediction using local neural networks," in Advances in Bioinformatics and Computational Biology, vol. 6832 of Lecture Notes in Computer Science, pp. 10–17, Springer, 2011.

[126] D. E. Rumelhart, R. Durbin, and Y. Chauvin, "Backpropagation: the basic theory," in Backpropagation: Theory, Architectures and Applications, D. E. Rumelhart and Y. Chauvin, Eds., pp. 1–34, Lawrence Erlbaum, Hillsdale, NJ, USA, 1995.

[127] J. Hernandez, L. E. Sucar, and E. Morales, "A hybrid global-local approach for hierarchical classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 432–437, 2013.

[128] B. Paes, A. Plastino, and A. Freitas, "Improving local per level hierarchical classification," Journal of Information and Data Management, vol. 3, no. 3, pp. 394–409, 2012.

[129] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Random k-labelsets for multilabel classification," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1079–1089, 2011.

[130] H. Wang, X. Shen, and W. Pan, "Large margin hierarchical classification with mutually exclusive class membership," Journal of Machine Learning Research, vol. 12, pp. 2721–2748, 2011.

[131] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[132] M. des Jardins, P. Karp, M. Krummenacker, T. J. Lee, and C. A. Ouzounis, "Prediction of enzyme classification from protein sequence without the use of sequence similarity," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology (ISMB '97), pp. 92–99, AAAI Press, 1997.

[133] A. Sharkey, N. Sharkey, U. Gerecke, and G. Chandroth, "The test and select approach to ensemble combination," in Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, J. Kittler and F. Roli, Eds., vol. 1857 of Lecture Notes in Computer Science, pp. 30–44, Springer, 2000.

[134] Y. Kourmpetis, A. van Dijk, and C. ter Braak, "Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes," Algorithms for Molecular Biology, vol. 8, article 10, 2013.

[135] N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni, "Incremental algorithms for hierarchical classification," in Advances in Neural Information Processing Systems, vol. 17, pp. 233–240, MIT Press, 2005.

[136] M. Re and G. Valentini, "An experimental comparison of Hierarchical Bayes and True Path Rule ensembles for protein function prediction," in Multiple Classifier Systems: Ninth International Workshop, MCS 2010, Cairo, Egypt, N. El-Gayar, Ed., vol. 5997 of Lecture Notes in Computer Science, pp. 294–303, Springer, 2010.

[137] A. J. Smola and R. Kondor, "Kernels and regularization on graphs," in Proceedings of the 16th Annual Conference on Learning Theory, B. Scholkopf and M. K. Warmuth, Eds., Lecture Notes in Computer Science, pp. 144–158, Springer, August 2003.

[138] C. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for SVM protein classification," in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass, USA, 2003.

[139] M. J. Wainwright and M. I. Jordan, "Graphical models, exponential families, and variational inference," Foundations and Trends in Machine Learning, vol. 1, no. 1-2, pp. 1–305, 2008.

[140] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," Journal of the American Statistical Association, vol. 67, no. 337, pp. 140–147, 1972.

[141] O. Burdakov, O. Sysoev, A. Grimvall, and M. Hussian, "An O(n²) algorithm for isotonic regression," in Large-Scale Nonlinear Optimization, vol. 83 of Nonconvex Optimization and Its Applications, pp. 25–33, Springer, 2006.

[142] G. Valentini and M. Re, "Weighted True Path Rule: a multi-label hierarchical algorithm for gene function prediction," in Proceedings of the 1st International Workshop on Learning from Multi-Label Data (MLD-ECML '09), pp. 133–146, Bled, Slovenia, 2009.

[143] Gene Ontology Consortium, "True path rule," 2010, http://www.geneontology.org/GO.usage.shtml#truePathRule.

[144] B. Chen, L. Duan, and J. Hu, "Composite kernel based SVM for hierarchical multilabel gene function classification," in Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, pp. 1380–1385, 2012.

[145] B. Chen and J. Hu, "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction," IEEJ Transactions on Electrical and Electronic Engineering, vol. 7, no. 2, pp. 183–189, 2012.

[146] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[147] R. D. King, A. Karwath, A. Clare, and L. Dehaspe, "The utility of different representations of protein sequence for predicting functional class," Bioinformatics, vol. 17, no. 5, pp. 445–454, 2001.

[148] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon, and J. Struyf, "Top-down induction of clustering trees," in Proceedings of the 15th International Conference on Machine Learning, pp. 55–63, 1998.

[149] H. Blockeel, L. Schietgat, S. Dzeroski, and A. Clare, "Decision trees for hierarchical multilabel classification: a case study in functional genomics," in Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, vol. 4213 of Lecture Notes in Artificial Intelligence, pp. 18–29, Springer, 2006.

[150] D. Stojanova, M. Ceci, D. Malerba, and S. Dzeroski, "Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction," BMC Bioinformatics, vol. 14, article 285, 2013.

[151] F. Otero, A. Freitas, and C. Johnson, "A hierarchical classification ant colony algorithm for predicting gene ontology terms," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, vol. 5483 of Lecture Notes in Computer Science, pp. 68–79, Springer, 2009.

[152] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[153] D. Kocev, C. Vens, J. Struyf, and S. Dzeroski, "Tree ensembles for predicting structured outputs," Pattern Recognition, vol. 46, no. 3, pp. 817–833, 2013.

[154] W. S. Noble and A. Ben-Hur, "Integrating information for protein function prediction," in Bioinformatics—From Genomes to Therapies, T. Lengauer, Ed., vol. 3, pp. 1297–1314, Wiley-VCH, 2007.

[155] L. Lan, N. Djuric, Y. Guo, and S. Vucetic, "MS-kNN: protein function prediction by integrating multiple data sources," BMC Bioinformatics, vol. 14, supplement 3, article S8, 2013.

[156] A. Sokolov, C. Funk, K. Graim, K. Verspoor, and A. Ben-Hur, "Combining heterogeneous data sources for accurate functional annotation of proteins," BMC Bioinformatics, vol. 14, supplement 3, article S10, 2013.

[157] C. L. Myers and O. G. Troyanskaya, "Context-sensitive data integration and prediction of biological networks," Bioinformatics, vol. 23, no. 17, pp. 2322–2330, 2007.

[158] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, Article ID btq262, pp. 1759–1765, 2010.

[159] M. Mesiti, E. Jimenez-Ruiz, I. Sanz, et al., "XML-based approaches for the integration of heterogeneous bio-molecular data," BMC Bioinformatics, vol. 10, supplement 12, article S7, 2009.

[160] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.

[161] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "More efficiency in multiple kernel learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 775–782, ACM, New York, NY, USA, June 2007.

[162] G. R. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," in Proceedings of the Pacific Symposium on Biocomputing, pp. 300–311, 2004.

[163] D. P. Lewis, T. Jebara, and W. S. Noble, "Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure," Bioinformatics, vol. 22, no. 22, pp. 2753–2760, 2006.

[164] N. Cesa-Bianchi, M. Re, and G. Valentini, "Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods," in Proceedings of the 2nd International Workshop on Learning from Multi-Label Data (MLD '10), pp. 13–20, Haifa, Israel, 2010.

[165] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Zhang, "Protein function prediction by integrating multiple kernels," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), Beijing, China, 2013.

[166] D. M. Titterington, G. D. Murray, L. S. Murray, et al., "Comparison of discrimination techniques applied to a complex data set of head injured patients," Journal of the Royal Statistical Society A, vol. 144, no. 2, pp. 145–175, 1981.

[167] N. C. de Condorcet, Essai sur l'Application de l'Analyse à la Probabilité des Décisions Rendues à la Pluralité des Voix, Imprimerie Royale, Paris, France, 1785.

[168] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, 1998.

[169] L. I. Kuncheva, J. C. Bezdek, and R. P. W. Duin, "Decision templates for multiple classifier fusion: an experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299–314, 2001.

[170] M. Re and G. Valentini, "Noise tolerance of multiple classifier systems in data integration-based gene function prediction," Journal of Integrative Bioinformatics, vol. 7, no. 3, article 139, 2010.

[171] M. Re and G. Valentini, "Ensemble based data fusion for gene function prediction," in Multiple Classifier Systems: Eighth International Workshop, MCS 2009, Reykjavik, Iceland, J. Kittler, J. Benediktsson, and F. Roli, Eds., vol. 5519 of Lecture Notes in Computer Science, pp. 448–457, Springer, 2009.

[172] M. Re and G. Valentini, "Prediction of gene function using ensembles of SVMs and heterogeneous data sources," in Applications of Supervised and Unsupervised Ensemble Methods, vol. 245 of Computational Intelligence Series, pp. 79–91, Springer, 2009.

[173] M. Re and G. Valentini, "Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines," Neurocomputing, vol. 73, no. 7–9, pp. 1533–1537, 2010.

[174] R. D. Finn, J. Tate, J. Mistry, et al., "The Pfam protein families database," Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.

[175] A. P. Gasch, P. T. Spellman, C. M. Kao, et al., "Genomic expression programs in the response of yeast cells to environmental changes," Molecular Biology of the Cell, vol. 11, no. 12, pp. 4241–4257, 2000.

[176] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: a general repository for interaction datasets," Nucleic Acids Research, vol. 34, pp. D535–D539, 2006.

[177] C. von Mering, R. Krause, B. Snel, et al., "Comparative assessment of large-scale data sets of protein-protein interactions," Nature, vol. 417, no. 6887, pp. 399–403, 2002.

[178] G. Valentini and N. Cesa-Bianchi, "HCGene: a software tool to support the hierarchical classification of genes," Bioinformatics, vol. 24, no. 5, pp. 729–731, 2008.

[179] A. Ben-Hur and W. S. Noble, "Choosing negative examples for the prediction of protein-protein interactions," BMC Bioinformatics, vol. 7, supplement 1, article S2, 2006.

[180] N. Skunca, A. Altenhoff, and C. Dessimoz, "Quality of computationally inferred gene ontology annotations," PLoS Computational Biology, vol. 8, no. 5, Article ID e1002533, 2012.

[181] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.

[182] G. Batista, R. Prati, and M. Monard, "A study of the behavior of several methods for balancing machine learning training data," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.

[183] M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), pp. 179–187, Nashville, Tenn, USA, 1997.

[184] H. Guo and H. Viktor, "Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.

[185] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[186] Y. Zhang and D. Wang, "Cost-sensitive ensemble method for class-imbalanced datasets," Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.

[187] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.

[188] C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya, "The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction," Bioinformatics, vol. 25, no. 18, pp. 2404–2410, 2009.

[189] G. Yu, C. Domeniconi, H. Rangwala, G. Zhang, and Z. Yu, "Transductive multilabel ensemble classification for protein function prediction," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1077–1085, 2012.

[190] G. Yu, H. Rangwala, C. Domeniconi, G. Zhang, and Z. Yu, "Protein function prediction with incomplete annotations," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013.

[191] Gene Ontology Consortium, "Gene Ontology annotations and resources," Nucleic Acids Research, vol. 41, pp. D530–D535, 2013.

[192] A. A. Freitas, D. C. Wieser, and R. Apweiler, "On the importance of comprehensible classification models for protein function prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 172–182, 2010.

[193] R. Cerri, R. Barros, A. de Carvalho, and A. Freitas, "A grammatical evolution algorithm for generation of hierarchical multi-label classification rules," in Proceedings of the IEEE Congress on Evolutionary Computation, 2013.

[194] C. Widmer and G. Ratsch, "Multitask learning in computational biology," Journal of Machine Learning Research W&P, vol. 27, pp. 207–216, 2012.

[195] J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "PowerGraph: distributed graph-parallel computation on natural graphs," in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), pp. 17–30, Hollywood, Calif, USA, 2012.

[196] M. Mesiti, M. Re, and G. Valentini, "Scalable network-based learning methods for automated function prediction based on the neo4j graph-database," in Proceedings of the Automated Function Prediction SIG 2013—ISMB Special Interest Group Meeting, pp. 6–7, Berlin, Germany, 2013.

[197] H. W. Mewes, K. Albermann, M. Bahr, et al., "Overview of the yeast genome," Nature, vol. 387, no. 6632, pp. 7–8, 1997.

[198] H. W. Mewes, D. Frishman, C. Gruber, et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 28, no. 1, pp. 37–40, 2000.

[199] K. R. Christie, S. Weng, R. Balakrishnan, et al., "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms," Nucleic Acids Research, vol. 32, pp. D311–D314, 2004.

[200] K. Verspoor, D. Dvorkin, K. B. Cohen, and L. Hunter, "Ontology quality assurance through analysis of term transformations," Bioinformatics, vol. 25, no. 12, pp. i77–i84, 2009.

[201] K. Verspoor, J. Cohn, S. Mniszewski, and C. Joslyn, "A categorization approach to automated ontological function annotation," Protein Science, vol. 15, no. 6, pp. 1544–1549, 2006.

[202] S. Kiritchenko, S. Matwin, and A. Famili, "Hierarchical text categorization as a tool of associating genes with Gene Ontology codes," in Proceedings of the 2nd European Workshop on Data Mining and Text Mining for Bioinformatics, pp. 26–30, 2004.

