
Research Article

EBOC: Ensemble-Based Ordinal Classification in Transportation

Pelin Yıldırım,1 Ulaş K. Birant,2 and Derya Birant2

1Department of Software Engineering, Manisa Celal Bayar University, 45400 Manisa, Turkey
2Department of Computer Engineering, Dokuz Eylul University, 35390 Izmir, Turkey

Correspondence should be addressed to Derya Birant; derya@cs.deu.edu.tr

Received 3 October 2018; Revised 20 January 2019; Accepted 5 March 2019; Published 24 March 2019

Guest Editor: Mohammad H. Y. Moghaddam

Copyright © 2019 Pelin Yıldırım et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Learning the latent patterns of historical data in an efficient way to model the behaviour of a system is a major need for making right decisions. For this purpose, machine learning solutions have already made their promising mark in transportation, as well as in many other areas such as marketing, finance, education, and health. However, many classification algorithms in the literature assume that the target attribute values in the datasets are unordered, so they lose the inherent order between the class values. To overcome this problem, this study proposes a novel ensemble-based ordinal classification (EBOC) approach which suggests bagging and boosting (the AdaBoost algorithm) methods as a solution for the ordinal classification problem in the transportation sector. This article also compares the proposed EBOC approach with the ordinal class classifier and traditional tree-based classification algorithms (i.e., C4.5 decision tree, RandomTree, and REPTree) in terms of accuracy. The results indicate that the proposed EBOC approach achieves better classification performance than the conventional solutions.

1. Introduction

Machine learning plays an important role in many prediction problems by constructing a model from an explored dataset. The most common task in the learning process is classification. Classification is the process of assigning an input item in a collection to predefined classes by discovering relationships among the instances in the training set. Classification has a wide range of applications, such as document categorization, medical diagnosis, fraud detection, pattern recognition, sentiment analysis, risk assessment, and signal processing.

Transportation is a sector that focuses on the movement of people, animals, and goods from one location to another. Developments in the field of transportation reveal a need to discover associations and obtain the complex and nonlinear relations underlying a vast amount of transportation data. Because of this necessity, in recent years, machine learning techniques, especially classification algorithms, have begun to be used in the transportation sector as an interdisciplinary approach. The underlying goals of these solutions are to predict traffic flow [1], classify vehicle images [2], identify different transportation modes [3], analyse the severity of traffic incidents [4], mitigate unfavourable environmental impacts (i.e., to optimize energy usage [5]), develop autonomous driving systems [6], and improve the productivity and efficiency of transportation systems.

The relationship between the class labels in the dataset to which the machine learning algorithms are applied influences the classification performance. In the literature, for data whose class attribute values involve some order or a sort of ranking, the ordinal classification approach has been proposed. Ordinal classification predicts the label of a new ordinal sample by taking the ranking relation among the classes into consideration [7]. An example of an ordinal class attribute is one that has the values "large," "medium," and "small" for a size attribute. It is clear that there is an order among those values and that we can write large > medium > small. Although there are many classification studies [2-6] performed in the transportation sector, to the best of our knowledge, there has been no prior detailed investigation of ordinal classification in the transportation sector. Considering this drawback, the study presented in this article focuses on the application of ordinal classification algorithms on real-world transportation datasets. Some transportation studies contain an ordinal response variable; for example, the injury severity of traffic accidents can be categorized as fatal > serious > slight; similarly, traffic volume can be classified as high > medium > low.

Figure 1: The classification methods of machine learning. (The figure divides machine learning into supervised learning, covering classification and regression, and unsupervised learning; classification splits into nominal and ordinal classification, and ordinal classification into three approaches: binary classification, regression methods, and specific techniques.)

Meanwhile, ensemble learning has recently been preferred in machine learning for the classification task because of the high prediction ability it provides. Ensemble learning is a machine learning technique which combines a set of base learning models to get a single final prediction [8]. These learning models can be any classification algorithms, such as neural network (NN), Naive Bayes (NB), decision tree, support vector machine (SVM), regression, and k-nearest neighbour (KNN). Many studies in the literature have stated that ensemble learners improve the prediction performance of individual learning algorithms. There exist various ensemble-learning methods: bagging, boosting, stacking, and voting. In this study, bagging and boosting (the AdaBoost algorithm) methods are selected, due to their popularity, for the solution of the ordinal classification problem in the transportation sector.

The novelty and main contributions of this article are as follows: (i) it provides a brief survey of ordinal classification and ensemble learning, which has been shown to improve the prediction performance of traditional classification algorithms; (ii) it is the first study in which ordinal classification methods have been implemented in the transportation sector; (iii) it proposes a novel ensemble-based ordinal classification (EBOC) approach for transportation; and (iv) it presents experimental studies conducted on twelve different real-world transportation datasets to demonstrate that the proposed EBOC approach shows better classification results than both the ordinal class classifier and traditional tree-based classification algorithms in terms of accuracy.

The remainder of this article is structured as follows. In the following section, related literature and previous works on the subject are summarized. Section 3 gives background information about ordinal classification and ensemble learning; this section also explains the utilized tree-based algorithms, such as C4.5 decision tree, RandomTree, and REPTree, in detail. In Section 4, the proposed ensemble-based ordinal classification (EBOC) approach for the transportation sector is defined. Section 5 gives the description of the transportation datasets used in this study; the application of the traditional algorithms and the proposed method on the transportation datasets and their experimental results, with discussions, are also presented in this section. Moreover, all obtained results were validated by three statistical methods to ensure the significance of the differences among the classifiers on the datasets, including multiple comparisons (Friedman test and Quade test) and pairwise comparisons (Wilcoxon signed rank test). Finally, the last section gives some concluding remarks and future directions.

2. Related Work

In machine learning, there are two main types of tasks: supervised learning and unsupervised learning. The classification process, which is one of the supervised learning techniques, is divided into two categories: nominal classification (where no order is assumed between the classes) and ordinal classification (where the ordinal relationship between different class labels should be taken into account). The difference between ordinal and nominal classification is not remarkable in the case of binary classification, owing to the fact that there is always an implicit order in "positive class" and "negative class." In multiclass classification problems, standard classification algorithms for nominal classes can be applied to ordinal prediction problems by discarding the ordering information in the class attribute. However, this approach does not take advantage of the inner structure of the data, and some information that could potentially improve the predictive performance of a classifier is lost, since it ignores the existing natural order of the classes. The literature presents three different approaches for the ordinal classification problem: binary classification, regression methods, and specific techniques. The ordinal classification paradigm is summarized in Figure 1. In this article, a novel ordinal classification study was performed for the transportation sector.

The first approach for the ordinal classification process is to convert an ordinal classification problem into several binary classification problems. In this type of study [9, 10], two-class classification algorithms are applied on ordinal-valued datasets after transforming the k-class problem into a set of k-1 binary subproblems. For example, in the study presented in [10], the researchers proposed a novel approach which reduces the problem of classifying ordered classes to a standard two-class problem. They introduced a data replication method, which is then mapped into neural networks and support vector machines. In the experiments, they applied the proposed method on both artificial and real datasets for gene expression analysis. Li and Lin [11] developed a reduction framework for ordinal classification that consists of three steps: (i) extracting extended examples from training examples by using weights, (ii) training a classifier on the extended examples with any binary classification algorithm, and (iii) constructing a ranking rule from the binary classifier.

As a second approach, regression methods [12, 13] can be used to deal with the ordinal classification problem, since regression models were conceived for continuous data. In this method, categorical ordinal data is converted into a continuous data scale, and then regression algorithms are applied on this transformed data as a postprocessing step. However, a disadvantage of this approach is that the natural order of the class values is discarded and the inner structure of the samples is lost. In [12], the two most commonly used ordinal logistic models were applied on medical ordinal data: the proportional odds (PO) form of an ordinal logistic model and the forward continuation ratio (CR) ordinal logistic model. Rennie and Srebro [13] applied two general threshold-based constructions (the logistic and hinge loss) on the one million MovieLens dataset. The experimental results stated that their proposed approach shows more accurate results than traditional classification and regression models.

In the last approach, problem-specific techniques [14-18] were developed for ordinal data classification by modifying existing classification algorithms. The main advantage of this approach is that it retains the order among the class labels. However, some of these techniques present complexities in terms of implementation and training: they are complex and require nontrivial changes in the training methods, such as modification of the objective function or use of a threshold-based model. Keith and Meneses [14] proposed a novel technique called Barycentric Coordinates for Ordinal Classification (BCOC), which uses barycentric coordinates to represent ordinal classes geometrically. They applied their proposed method in the field of sentiment analysis and presented effective results for complex datasets. Researchers in another study [17] presented a novel heuristic rule learning approach with monotonicity constraints, including two novel justifiability measures, for ordinal classification. Experiments were performed to test the proposed approach, and the results indicated that the novel method showed high prediction performance by guaranteeing monotone classification with a low increase in rule set size.

Thanks to their high prediction performance, ensemble-learning techniques have begun to be preferred in ordinal classification [19-22], as well as in nominal classification problems. Hechenbichler and Schliep [20] proposed an extended weighted k-nearest neighbor (wkNN) method for the ordinal class structure; in their study, a weighted majority vote mechanism was used for the aggregation process. In another study [21], an enhanced ensemble of support vector machines was developed for ordinal regression; the proposed approach was implemented on benchmark synthetic datasets and was compared with a kernel-based ranking method in the experiments. Lin [23] introduced a novel threshold ensemble model and developed a reduction framework to reduce ordinal ranking to weighted binary classification by extending the SVM and AdaBoost algorithms. The results of all these studies show that the ensemble methods perform well on the datasets and provide better performance than the individual methods.

Differently from existing studies, our work proposes a novel ensemble-based ordinal classification approach including bagging and boosting (the AdaBoost algorithm) methods. In addition, the present study is the first in which an ordinal classification paradigm is implemented on real-world transportation datasets to model the behaviour of transportation systems.

3. Background Information

In this section, background information about ordinal classification, classification algorithms, and ensemble learning is presented to provide the context for this research.

3.1. Ordinal Classification. In most classification problems, the target attribute values to be predicted are usually assumed to have no ordering relation between them. However, the class labels of some datasets have an inherent order. For example, when predicting the price of an object, ordered class labels such as "expensive," "normal," and "cheap" can be used. The order among these class labels is clearly understood and denoted as "expensive" > "normal" > "cheap." For this reason, classification techniques differ according to nominal and ordinal data types.

Nominal Data. Data which have no quantitative value are named nominal data. Nominal scales are utilized to label variables, such as vehicle types (i.e., car, train, and bus) or document topics (i.e., science, business, and sports).

Ordinal Data. In ordinal data, the values have a natural order. In this data type, the order of the values is significant, such as data obtained from the use of a Likert scale.

Ordinal classification, which is proposed for the prediction of ordinal target values, is one of the most important classification problems in machine learning. This paradigm aims to predict the unknown values of a class attribute y that have a natural order. In the ordinal classification problem, the ordinal dataset D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)} has a set of m items with input feature space X, and the class attribute y ∈ {c_1, c_2, ..., c_k} has k class labels with an order c_k > c_{k-1} > ... > c_1, where > denotes the ordering relation. In other words, an example (x, y) is composed of an input vector x ∈ X and an ordinal label (i.e., rank) y ∈ Y = {1, 2, ..., K}. The problem is to predict, for an example x, its rank y ∈ {1, 2, ..., K}.

In decision tree-based ordinal classification, as a first step, the data is transformed from a k-class ordinal problem into k-1 binary-class problems that encode the ordering of the class labels. The ordinal dataset with classes {c_1, c_2, ..., c_k} is converted into binary datasets by discriminating {c_1, ..., c_i} against {c_{i+1}, ..., c_k}, which represents the test c_x > i. In other words, the upward unions of the classes are considered progressively in each stage of binary dataset construction. Then, a standard tree-based learning algorithm is employed on the derived binary datasets to construct k-1 models. As a result, each model predicts the cumulative probability of an instance belonging to a certain class. To predict the class label of an unseen instance, the probability of each ordinal class P(c_x) is estimated by using the k-1 models. The estimate of the probability of the first ordinal class label depends on a single classifier: 1 - P(target > c_1). The probability of the last ordinal class is given by P(target > c_{k-1}). In the middle of the range, the probability is calculated by a pair of classifiers: P(target > c_{i-1}) - P(target > c_i), where 1 < i < k. Finally, we choose the class label with the highest probability.
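This decomposition can be implemented on top of any classifier that outputs class probabilities. The following is a minimal sketch in Python (an illustration under stated assumptions, not the Weka-based implementation used later in this article); it assumes scikit-learn-style estimators, integer-coded ordinal labels, and that both binary outcomes occur in every derived dataset:

import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

class OrdinalClassifier:
    """Frank-and-Hall-style ordinal classification via k-1 binary models."""

    def __init__(self, base_estimator=None):
        self.base = base_estimator if base_estimator is not None else DecisionTreeClassifier()

    def fit(self, X, y):
        # Ordered class labels c_1 < c_2 < ... < c_k (assumed integer-coded).
        self.classes_ = np.sort(np.unique(y))
        # One binary model per threshold test "target > c_i".
        self.models_ = [clone(self.base).fit(X, (y > c).astype(int))
                        for c in self.classes_[:-1]]
        return self

    def predict(self, X):
        # gt[:, i] estimates P(target > c_i) for each of the k-1 thresholds.
        gt = np.column_stack([m.predict_proba(X)[:, 1] for m in self.models_])
        k = len(self.classes_)
        probs = np.empty((len(X), k))
        probs[:, 0] = 1.0 - gt[:, 0]                 # P(c_1) = 1 - P(target > c_1)
        probs[:, 1:k - 1] = gt[:, :-1] - gt[:, 1:]   # P(c_i) = P(> c_{i-1}) - P(> c_i)
        probs[:, k - 1] = gt[:, -1]                  # P(c_k) = P(target > c_{k-1})
        return self.classes_[np.argmax(probs, axis=1)]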

The core ideas behind using decision tree algorithms for ordinal classification problems can be described in three parts.

First, Frank and Hall [7] reported that the decision tree-based ordinal classification algorithm resulted in a significant improvement over the standard version on 29 UCI benchmark datasets. They also experimentally confirmed that the performance gap increases with the number of classes. Most importantly, their results showed that decision tree-based ordinal classification was generally able to produce a much simpler model with better ordinal prediction accuracy.

Second, to consider the inherent order of the class labels, other standard classification algorithms, such as KNN [20], SVM [21], and NN, require modification (nontrivial changes) of the training methods. Traditional regression techniques can also be applied on ordinal-valued data due to their ability to handle interval or ratio quantities; however, their application to truly ordinal problems is necessarily ad hoc [7]. In contrast, the key feature of using a decision tree for ordinal classification is that there is no need to make any modification to the underlying learning algorithm: the core idea is simply to transform the ordinal classification problem into a series of binary-class problems.

Third, owing to the advantages of fast speed, high precision, and ease of understanding, decision tree algorithms are widely used in classification as well as in ordinal classification. Ordinal classification is sensitive to noise in the data: even if only a few noisy samples exist in the ordinal dataset, they can change the classification results of the overall system. However, in the presence of noisy and missing data, a pruned decision tree can reflect the order structure information very well, with good generalization ability. The benefits of implementing decision tree-based ordinal classification are not limited to these. It builds white-box models, so the trees constructed by the algorithm can be visualized and the classification results can be easily explained by Boolean logic. In addition, the most important features, which are near the root node, are emphasized via the construction of the tree for the ordinal prediction task.

3.2. Tree-Based Classification Algorithms. In this work, three tree-based classification algorithms (C4.5 decision tree, RandomTree, and REPTree) are used as base learners to implement the ordinal classification process on transportation datasets that consist of ordered class values.

3.2.1. C4.5 Decision Tree. The decision tree is one of the most successful classification algorithms; it predicts unknown class attribute values using a tree structure grown with a depth-first strategy. The structure of a decision tree consists of nodes, branches, and leaves, which represent attributes, attribute values, and class labels, respectively. In the literature, there are several decision tree algorithms, such as C4.5, ID3 (iterative dichotomiser), CART (classification and regression trees), and CHAID (chi-squared automatic interaction detector). In this study, the C4.5 decision tree algorithm was used, due to its popularity, for the ordinal classification problem in the transportation sector.

The first step in the C4.5 decision tree algorithm is to specify the root of the tree. The attribute which gives the most determinant information for the prediction process is selected for the root node. To determine the order of the features in the decision tree, the information gain is evaluated for each attribute, as defined in

Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)    (1)

where S_v is the subset of the set S for which attribute A has value v. The entropy indicates the impurity of a particular attribute in the dataset, as defined in

Entropy(S) = Σ_{i=1}^{n} -p_i log2(p_i)    (2)

where i is a state, p_i is the probability of the outcome being in state i for the set S, and n is the number of possible outcomes. The attribute that has the maximum information gain value is selected as the root of the tree. Likewise, all attributes in the dataset are placed in the tree according to their information gain values.
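As an illustration, equations (1) and (2) translate directly into a few lines of code. The sketch below is an illustrative implementation, not code from the study; it assumes discrete attribute values and class labels held in NumPy arrays:

import numpy as np

def entropy(labels):
    # Entropy(S) = sum over states i of -p_i * log2(p_i), as in equation (2).
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(attribute_values, labels):
    # Gain(S, A) = Entropy(S) - sum over v of (|S_v| / |S|) * Entropy(S_v), as in (1).
    gain = entropy(labels)
    for v in np.unique(attribute_values):
        mask = attribute_values == v
        gain -= mask.mean() * entropy(labels[mask])
    return gain

A C4.5-style learner evaluates information_gain for every candidate attribute, places the maximizer at the root, and recurses on the resulting subsets.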

3.2.2. RandomTree. Assume a training set T with z attributes and m instances; RandomTree is an algorithm for constructing a tree that considers x randomly chosen features (x ≤ z) with the m instances at each node [24]. While in a standard decision tree each node is split using the best split among all features, in a random tree each node is split using the best among the subset of attributes randomly chosen at that node. The tree is built to its maximum depth; in other words, no pruning procedure is applied after the tree has been fully built. The algorithm can deal with both classification and regression problems. In the classification task, the predicted class of a sample is determined by traversing the tree from the root node to a leaf according to the question posed about an indicator value at each node. When a leaf node is reached, its label determines the classification decision. The RandomTree algorithm does not need any accuracy estimation metric, different from the other classification algorithms.

3.2.3. REPTree. The reduced error pruning tree (REPTree) algorithm builds a decision/regression tree using information gain/variance and prunes it using a simple and fast technique [25]. The pruning technique starts from the bottom level of the tree (the leaf nodes) and replaces each node with its most frequent class. This change is accepted only if the prediction accuracy remains good. In this way, it helps to minimize the size of decision trees by removing sections of the tree that give little capacity to classify instances. The values of numeric attributes are sorted once. Missing values are also dealt with by splitting the corresponding instances into parts.

3.3. Ensemble Learning. Ensemble learning is a machine learning technique that combines a set of individual learners and predicts a final output [26]. First, each learner in the ensemble structure is trained separately, and multiple classification models are constructed. Then, the outputs obtained from the models are combined by a voting mechanism. The commonly used voting method for categorical target values is majority class labelling, and the voting methods for numerical target values are average, weighted average, median, minimum, and maximum.

The ensemble-learning approach constructs a strong classifier from multiple individual learners and aims to improve classification performance by reducing the risk of an unfortunate selection of these learners. Many ensemble-based studies [27, 28] have shown that ensemble learners give more successful results than classical individual learners.

In the literature, the ensemble methods are generally categorized under four techniques: bagging, boosting, stacking, and voting. In one of our previous studies, we implemented the bagging approach by combining multiple neural networks with different parameter values for a prediction task in the textile sector [26]. In the current research, the bagging and boosting (AdaBoost algorithm) methods with the ordinal class classifier were applied on the transportation datasets. For each method (bagging and boosting), the tree-based classification algorithms (C4.5 decision tree, RandomTree, and REPTree) were used as base learners separately.

3.3.1. Bagging. Bagging (bootstrap aggregating) is a commonly applied ensemble technique which constitutes training subsets by selecting random instances from the original dataset using the bootstrap method. Each classifier in the ensemble structure is trained on a different training set, and so multiple classification models are produced. Then, a new sample is given to each model, and these models each predict an output. The obtained outputs are aggregated, and a single final output is produced.

General Process of Bagging. Let D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)} be a set of m items, and let y_i ∈ Y = {c_1, c_2, ..., c_k} be a set of k class labels; C denotes the classification algorithm, and n is the number of learners.

(1) Draw m items randomly with replacement from the dataset D to generate the bootstrap samples D_1, D_2, ..., D_n.
(2) Each dataset D_i is trained by a base learner, and multiple classification models are constructed: M_i = C(D_i).
(3) The consensus of the classification models is tested to calculate the out-of-bag error.
(4) A new sample x is given to the classifiers as input, and an output y_i is obtained from each model: y_i = M_i(x).
(5) The outputs of the models M_1, M_2, ..., M_n are combined as in

M*(x) = arg max_{y ∈ Y} Σ_{i: M_i(x)=y} 1    (3)
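Steps (1)-(5) can be sketched compactly; the code below is an illustration assuming integer-coded class labels and scikit-learn-style base learners, not the Weka implementation used in the experiments:

import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_learners=100, seed=0):
    # Steps (1)-(2): train one model M_i per bootstrap sample D_i.
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_learners):
        idx = rng.integers(0, len(y), size=len(y))   # draw m items with replacement
        models.append(clone(DecisionTreeClassifier()).fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # Steps (4)-(5): collect each model's output and take the majority vote of (3).
    votes = np.stack([m.predict(X) for m in models])   # shape (n_learners, n_samples)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)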

3.3.2. Boosting. In the boosting method, classifiers are trained consecutively to convert weak learners into strong ones. A weight value is assigned to each instance in the training set. Then, in each iteration, the weights of the misclassified samples are increased, while those of the correctly classified ones are decreased. In this way, the misclassified samples' chances of being in the next training set increase. In this study, the AdaBoost algorithm was utilized to implement the boosting method.

AdaBoost. AdaBoost, also known as adaptive boosting, is the most popular boosting algorithm; it trains learners by reweighting the instances in the training set iteratively. Then, the outputs produced by each learning model are aggregated using a weighted voting mechanism.
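The reweighting scheme can be sketched as follows, here in the multiclass SAMME variant with decision stumps as weak learners (an illustrative choice; the experiments in this article rely on Weka's AdaBoost implementation):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=100):
    # Multiclass AdaBoost (SAMME): iteratively reweight training instances.
    k = len(np.unique(y))
    w = np.full(len(y), 1.0 / len(y))                 # uniform initial weights
    models, alphas = [], []
    for _ in range(n_rounds):
        model = DecisionTreeClassifier(max_depth=1)   # a weak learner (stump)
        model.fit(X, y, sample_weight=w)
        miss = model.predict(X) != y
        err = w[miss].sum()                           # weighted training error
        if err >= 1.0 - 1.0 / k:                      # no better than chance: stop early
            break
        alpha = np.log((1.0 - err) / max(err, 1e-12)) + np.log(k - 1.0)
        w *= np.exp(alpha * miss)                     # boost the misclassified samples
        w /= w.sum()                                  # renormalize
        models.append(model)
        alphas.append(alpha)
    return models, alphas                             # combined later by weighted voting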

4. Proposed Method: Ensemble-Based Ordinal Classification (EBOC)

The proposed method, named ensemble-based ordinal classification (EBOC), combines the ensemble-learning paradigm, including its popular methods AdaBoost and bagging, with the traditional ordinal class classifier algorithm to improve prediction performance. The ordinal class classifier algorithm, which was proposed in [7], is used as the base learner in the ensemble structure of the proposed approach. In addition, tree-based algorithms such as C4.5 decision tree, RandomTree, and REPTree are used as the classifier inside the ordinal class classifier algorithm.

The first step of the ordinal class classifier algorithm is to reduce the multiclass ordinal classification problem to binary classification problems. To realize this approach, the ordinal classification problem with k different class values is converted into k-1 two-class classification problems. Assume that there is a dataset with four classes (k = 4); here, the task is to find two sides: (i) class C1 against classes C2, C3, and C4; (ii) classes C1 and C2 against classes C3 and C4; and finally (iii) classes C1, C2, and C3 against class C4.

For example, suppose we have the car evaluation dataset [29], which evaluates vehicles according to buying price, maintenance price, and technical characteristics such as comfort, number of doors, person capacity, luggage boot size, and safety of the car. This dataset has an ordinal class attribute with four values: unacc, acc, good, and vgood. First, the four different class values of the original dataset are converted to binary values according to these rules: Classes > "unacc", Classes > "acc", and Classes > "good". If we consider the rule Classes > "unacc", class values higher than "unacc" are labelled as 1 and the others are labelled as 0. In this way, three different transformed datasets that contain binary class values are obtained. In the next stage, a classification algorithm (i.e., C4.5, RandomTree, or REPTree) is applied on each obtained dataset separately. Figure 2 shows how the ordinal class classifier algorithm works on the "car evaluation" dataset.

Figure 2: Application of the ordinal class classifier algorithm on the "car evaluation" dataset. (The original dataset, with ordered class labels unacc < acc < good < vgood, is transformed into three binary datasets, Dataset 1, Dataset 2, and Dataset 3, encoding the tests class > "unacc", class > "acc", and class > "good", respectively; the models trained on them estimate P(class > "unacc" | sample), P(class > "acc" | sample), and P(class > "good" | sample).)

When predicting a new sample, probabilities are computed for each of the k class values using the k-1 binary classification models. For example, the probability of the "unacc" class value, P("unacc" | sample), in the sample car dataset is evaluated as 1 - P(class > "unacc" | sample). Similarly, the other three ordinal class value probabilities are computed. Finally, the class label which has the maximum probability value is assigned to the sample. In general, the probabilities of the class attribute values for the "car evaluation" dataset are calculated as follows:

P("unacc" | sample) = 1 - P(class > "unacc" | sample)
P("acc" | sample) = P(class > "unacc" | sample) - P(class > "acc" | sample)
P("good" | sample) = P(class > "acc" | sample) - P(class > "good" | sample)
P("vgood" | sample) = P(class > "good" | sample)    (4)

The proposed approach (EBOC) utilizes the ordinal classification algorithm as the base learner for the bagging and boosting ensemble methods. This novel ensemble-based approach aims to obtain successful classification results thanks to the high prediction ability of the ensemble-learning paradigm. The general structure of the proposed approach is presented in Figure 3. When bagging is used, random samples are selected from the original dataset to produce multiple training sets; when boosting is utilized, samples are selected with specified probabilities based on their weights. After that, EBOC derives new datasets from the original dataset, each one with a new binary class attribute. Each dataset is given to the ordinal class classifier algorithm as an input. Then, a classification algorithm (i.e., C4.5, RandomTree, or REPTree) is applied on the datasets, and multiple ensemble-based ordinal classification models are produced. Each classification model in this system gives an output label, and the majority vote of these outputs is selected as the final class value.

The pseudocode of the proposed approach (EBOC) is presented in Algorithm 1. First, multiple training sets are generated according to the preferred ensemble method (bagging or boosting). Second, the algorithm converts the ordinal data to binary data. After that, a tree-based classification algorithm (i.e., C4.5, RandomTree, or REPTree) is applied on the binary training sets, and in this way various classifiers are produced. After this training phase, if boosting is chosen as the ensemble method, the weight of each sample in the dataset is updated dynamically according to the performance (error rate) of the classifiers in the current iteration. To predict the class label of a new input sample, the probability of each ordinal class is estimated by using the constructed models; the algorithm chooses the class label with the highest probability. Lastly, the majority vote of the outputs obtained from each ordinal classification model is selected as the final output.

Figure 3: The general structure of the proposed approach (EBOC). (Datasets 1 to n are derived from the original dataset; each is decomposed into binary classifiers that form an ordinal classifier over the ordered labels (unacc, acc, good, vgood), and the n ordinal classifiers' outputs are combined by majority voting to give the predicted class label.)


5. Experimental Studies

In the experimental studies, the proposed approach (EBOC) was implemented on twelve different benchmark transportation datasets to demonstrate its advantages over the standard nominal classification methods. The application was developed by using the Weka open source data mining library [30]. The individual tree-based classification algorithms (C4.5, RandomTree, and REPTree), the ordinal classification algorithm [7], and the EBOC approach were applied on the transportation datasets separately, and they were compared in terms of accuracy, precision, recall, F-measure, and ROC area. In the EBOC approach, the C4.5, RandomTree, and REPTree algorithms were also used as base learners in the ordinal class classifiers separately. The experimental results obtained in this study are presented with the help of tables and graphs.

5.1. Dataset Description. In this study, 12 different transportation datasets, which are available in several repositories for public use, were selected to demonstrate the capabilities of the proposed EBOC method. The datasets were obtained from the following data archives: UCI Machine Learning Repository [31], Kaggle [32], Data Mill North [33], and NYC Open Data [34]. Detailed descriptions of the datasets (i.e., what they are and how to use them) are given as follows.

Auto MPG. The Auto MPG dataset, which was used in the American Statistical Association Exposition, is provided by the StatLib library maintained at Carnegie Mellon University. This dataset is utilized for the prediction of the city-cycle fuel consumption in miles per gallon of cars according to their characteristics, such as model year, number of cylinders, horsepower, engine size (displacement), weight, and acceleration.

Automobile. This dataset was obtained from the 1985 Ward's Automotive Yearbook. It consists of various automobile characteristics, such as fuel type, body style, number of doors, engine size, engine location, horsepower, length, width, height, price, and insurance score. These features are used for the classification of automobiles by predicting their risk factors: six different risk rankings ranging from risky (+3) to pretty safe (-3).

Bike Sharing. This dataset includes the hourly count of rental bikes between the years 2011 and 2012. It was collected by the Capital Bikeshare system in Washington, D.C., where membership, rental, and bike return are automated via a network of kiosk locations. The dataset is presented for the prediction of hourly bike rental counts based on environmental and seasonal settings, such as weather situation (i.e., clear, cloudy, rainy, and snowy), temperature, humidity, wind speed, month, season, and the day of the week and year.

Algorithm: EBOC (Ensemble-Based Ordinal Classification)

Inputs:
  D: the ordinal dataset D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}
  m: the number of instances in the dataset D
  X: input feature space; an input vector x ∈ X
  Y: class attribute; an ordinal class label y ∈ {c_1, c_2, ..., c_k} ⊆ Y with an order c_k > c_{k-1} > ... > c_1
  k: the number of class labels
  n: ensemble size

Outputs:
  M*: an ensemble classification model
  M*(x): class label of a new sample x

Begin
for i = 1 to n do
    if (bagging)
        D_i = bootstrap samples from D
    else if (boosting)
        D_i = samples from D according to their weights
    // Construction of the binary training sets D_ij
    for j = 1 to k-1 do
        for s = 1 to m do
            if (y_s <= c_j) D_ij.Add(x_s, 0)
            else D_ij.Add(x_s, 1)
        end for
    end for
    // Construction of the binary classifiers BC_ij
    for j = 1 to k-1 do
        BC_ij = ClassificationAlgorithm(D_ij)
    end for
    if (boosting)
        update weight values
end for
// Classification of a new sample x
for i = 1 to n do
    // Construction of the ordinal classification models M_i
    P(c_1) = 1 - P(y > c_1)
    for j = 2 to k-1 do
        P(c_j) = P(y > c_{j-1}) - P(y > c_j)
    end for
    P(c_k) = P(y > c_{k-1})
    M_i(x) = arg max P
end for
// Majority voting
if (bagging)
    M*(x) = arg max_{y ∈ Y} Σ_{i: M_i(x)=y} 1
else if (boosting)
    M*(x) = arg max_{y ∈ Y} Σ_{i: M_i(x)=y} weight_i
End

Algorithm 1: The pseudocode of the proposed EBOC approach.
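To make the pseudocode concrete, the bagging branch of EBOC can be sketched by composing the OrdinalClassifier sketch from Section 3.1 with a bootstrap ensemble and the majority vote of equation (3). This is an illustrative reconstruction assuming integer-coded ordinal labels, not the authors' Weka code:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

class EBOCBagging:
    # EBOC with bagging: ordinal classifiers trained on bootstrap samples,
    # combined by majority vote.

    def __init__(self, make_member, n_estimators=100, seed=0):
        self.make_member = make_member    # factory returning a fresh ordinal classifier
        self.n_estimators = n_estimators
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        m = len(y)
        self.members_ = []
        for _ in range(self.n_estimators):
            idx = self.rng.integers(0, m, size=m)   # bootstrap sample D_i
            self.members_.append(self.make_member().fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        votes = np.stack([mem.predict(X) for mem in self.members_])
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

For example, EBOCBagging(lambda: OrdinalClassifier(DecisionTreeClassifier()), n_estimators=100) mirrors a BagOrdC4.5-style configuration, with scikit-learn's CART tree standing in for C4.5, which has no direct scikit-learn equivalent.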


Car Evaluation. This dataset is useful for evaluating the quality of cars according to various characteristics, such as buying price, maintenance price, number of doors, person capacity, size of luggage boot, and estimated safety of the car. The target attribute in the dataset indicates the overall scores of the cars as unacceptable, acceptable, good, and very good.


Car Sale Advertisements. This data was collected from private car sale advertisements in Ukraine in 2016. It is proposed for the prediction of the seller's price (denominated in USD) in the advertisement according to well-known car features, such as model, body type, mileage, engine volume, engine type, and drive type.

NYS Air Passenger Traffic. This data was collected monthly by the Port Authority of New York State between the years 1977 and 2015. The aim is to predict the total number of domestic and international passengers for five airports (ACY, EWR, JFK, LGA, and SWF).

Road Traffic Accidents (2017). This dataset contains the records of traffic accidents across the City of Leeds, UK, reported in 2017, with information on the location, the number of people and vehicles involved, the type of vehicle, road surface, weather conditions, lighting conditions, and the age and sex of the casualty. The aim is to predict the severity of the casualty as slight, serious, or fatal.

SF Air Traffic Landings Statistics and SF Air Traffic Passenger Statistics. These two separate datasets include the aircraft landing and passenger statistics of San Francisco International Airport recorded during the period from July 2005 to March 2018. These datasets are used to predict monthly landing and passenger counts, respectively.

Smart City Traffic Patterns. This dataset includes the traffic patterns at four junctions of a city, collected between 2015 and 2017. The aim is to manage the traffic of the city better and to provide input for infrastructure planning, in order to improve the efficiency of services for the citizens. To serve this purpose, the number of vehicles in traffic is predicted, so that a robust traffic system can be implemented for the city by being prepared for traffic peaks.

Statlog (Vehicle Silhouettes). The Statlog dataset contains features of vehicle silhouettes extracted by the Hierarchical Image Processing System (HIPS). A vehicle may be viewed from one of many different angles. The aim is to classify a given silhouette using a set of features extracted from the image.

Traffic Volume Counts (2012-2013). This dataset includes the hourly traffic volume counts collected by the New York Metropolitan Transportation Council between the years 2012 and 2013. The traffic counts were measured on various roads from one intersection to another with a specified direction (NB, SB, WB, and EB). The attribute to be predicted in this dataset is the traffic volume at 11:00-12:00 AM.

Before the application of the nominal and ordinal classification algorithms, the datasets generally passed through data preprocessing steps. In this study, the ID attributes in the transportation datasets were eliminated in the data reduction step. The date attributes were split into three parts (day, month, and year) because they provide more useful information for the prediction task. In addition, continuous class values were discretized into 3 bins using the equal-frequency technique, since ordinal classification algorithms require categorical ordinal data.
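For instance, the equal-frequency discretization step can be reproduced with pandas; the column name cnt (the Bike Sharing target) is used for illustration, with made-up values:

import pandas as pd

# Hypothetical continuous target column, e.g., hourly rental counts ("cnt").
df = pd.DataFrame({"cnt": [12, 45, 230, 310, 98, 77, 510, 260, 33]})

# Equal-frequency discretization into 3 ordinal bins: low < medium < high.
df["cnt_class"] = pd.qcut(df["cnt"], q=3, labels=["low", "medium", "high"])
print(df["cnt_class"].value_counts())   # three bins of (near-)equal size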

Figure 4: The 10-fold cross-validation process. (The entire dataset is split into ten parts; in each of the ten steps, nine parts form the train set and the remaining part forms the test set.)


The basic characteristics of the investigated transportation datasets are given in Table 1: the number of instances, attributes, and classes, the target attribute, the data preprocessing operations, and the class distribution in each bin.

5.2. Experimental Work. In each experiment, four methods were compared on the transportation datasets: (i) individual tree-based classification algorithms, (ii) the ordinal class classifier, (iii) boosting-based ordinal classification, and (iv) bagging-based ordinal classification. They were compared by using the n-fold cross-validation technique, selecting n as 10. In this validation technique, the entire dataset is divided into n equal-sized parts; n-1 of them are chosen for the training phase of the classification model, and the last part is used in the testing phase. This process is repeated n times, changing the parts used for the training and testing phases. When the cross-validation process terminates, the accuracy values obtained in each step are averaged, and a single accuracy value is produced as the success measure of the algorithm. Figure 4 represents the 10-fold cross-validation process.
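A minimal sketch of this evaluation loop is given below; the stratified split is an assumption of the sketch, since the article does not state whether stratification was used:

import numpy as np
from sklearn.model_selection import StratifiedKFold

def cv_accuracy(make_model, X, y, folds=10, seed=0):
    # Average accuracy over n-fold cross-validation (n = 10 here).
    skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    accuracies = []
    for train_idx, test_idx in skf.split(X, y):
        model = make_model().fit(X[train_idx], y[train_idx])
        accuracies.append(np.mean(model.predict(X[test_idx]) == y[test_idx]))
    return float(np.mean(accuracies))   # a single accuracy value per algorithm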

In this study, the alternative tree-based ordinal classification algorithms were compared according to the accuracy, precision, recall, F-measure, and ROC area measures. These metrics are explained with their abbreviations, formulas, and definitions in Table 2.

5.3. Experimental Results. In this study, three different experiments with three different tree-based classification algorithms (C4.5, RandomTree, and REPTree) were performed to compare the classification success of the proposed EBOC approach with the existing individual algorithms and the ordinal class classifier algorithm. To evaluate the classification performance of the algorithms on a specific transportation dataset, accuracy rate values were calculated using the n-fold cross-validation technique, selecting n as 10.

In each experiment, one of the tree-based classification algorithms was used as the base learner. For example, in the first experiment, four methods were applied and compared on the 12 different transportation datasets: (i) C4.5, the individual decision tree algorithm; (ii) OrdC4.5, the ordinal class classifier with C4.5; (iii) AdaOrdC4.5, AdaBoost-based ordinal classification with C4.5 as the base learner; and finally (iv) BagOrdC4.5, bagging-based ordinal classification with C4.5 as the base learner. In the second and third experiments, the RandomTree and REPTree algorithms were utilized as base learners, respectively. The default parameters of the algorithms in Weka were used in all experiments, except the ensemble size (number of members) parameter, which was set to 100. As a result of the experiments, the accuracy rates of each algorithm were evaluated. Tables 3, 4, and 5 show the comparative results of the implemented techniques on the twelve benchmark datasets in terms of accuracy rates. The accuracy rates which are higher than the individual classification algorithm's accuracy rate in that row are marked with *. In addition, the accuracy rates which have the maximum value in each row are shown in bold.

Table 1: The basic characteristics of the transportation datasets (I: number of instances, A: number of attributes, C: number of classes).

Dataset | I | A | C | Class Distribution | Target Attribute | Data Preprocessing
Auto MPG | 398 | 8 | 3 | 131-134-133 | mpg | Removing the columns with unique values (i.e., car name)
Automobile | 205 | 26 | 7 | 0-3-22-67-54-32-27 | symboling | -
Bike Sharing | 17379 | 13 | 3 | 5797-5783-5799 | cnt | Removing the columns "instant", "dteday", "casual", and "registered"
Car Evaluation | 1728 | 7 | 4 | 1210-384-69-65 | class | -
Car Sale Advertisements | 9309 | 10 | 3 | 3112-3080-3117 | price | Removing rows with zero price value
NYS Air Passenger Traffic | 1584 | 4 | 3 | 528-528-528 | total passengers | Removing the columns "Domestic" and "International Passengers"
Road Traffic Accidents (2017) | 2203 | 13 | 3 | 1879-309-15 | casualty severity | (i) Removing columns that hold reference numbers; (ii) splitting date column into day and month
SF Air Traffic Landings Statistics | 21105 | 14 | 3 | 6941-7103-7061 | landing count | -
SF Air Traffic Passenger Statistics | 18398 | 12 | 3 | 6132-6133-6133 | passenger count | -
Smart City Traffic Patterns | 48120 | 6 | 3 | 15687-16436-15997 | vehicles | (i) Removing ID column; (ii) splitting date column into day, month, and year
Statlog (Vehicle Silhouettes) | 846 | 19 | 4 | 212-217-218-199 | class | -
Traffic Volume Counts (2012-2013) | 5945 | 31 | 3 | 1983-1989-1973 | traffic volume at 11:00-12:00 AM | (i) Removing the columns with unique values (i.e., ID, Segment ID); (ii) splitting date column into day, month, and year

Table 2: Detailed information about the classifier performance measures (TP: true positives, FP: false positives, TN: true negatives, FN: false negatives).

Abbreviation | Measure | Formula | Description
Acc | Accuracy | Acc = (TP + TN) / (TP + TN + FP + FN) | The ratio of the number of correctly classified instances to the total number of records
Prec | Precision | Prec = TP / (TP + FP) | The ratio of correctly classified positive instances against all positive results
Rec | Recall | Rec = TP / (TP + FN) | The ratio of correct answers for a class over all the answers correctly belonging to this class
F | F-measure | F = 2 · (precision · recall) / (precision + recall) | The harmonic mean of precision and recall
ROC Area | Receiver Operating Characteristic Area | - | The area under the curve which is generated to compare correctly and incorrectly classified instances



Table 3: The comparison of C4.5-based ordinal and ensemble learning methods in terms of classification accuracy.

Dataset | C4.5 (%) | OrdC4.5 (%) | AdaOrdC4.5 (%) | BagOrdC4.5 (%)
Auto MPG | 80.40 | 80.90* | 83.67* | 81.91*
Automobile | 81.95 | 66.34 | 84.39* | 73.66
Bike Sharing | 87.75 | 87.72 | 89.29* | 89.79*
Car Evaluation | 92.36 | 92.19 | 98.73* | 94.27*
Car Sale Advertisements | 83.65 | 81.09 | 83.82* | 85.00*
NYS Air Passenger Traffic | 84.91 | 85.42* | 86.05* | 86.81*
Road Traffic Accidents (2017) | 85.29 | 85.29* | 81.03 | 84.38
SF Air Traffic Landings Statistics | 98.74 | 98.31 | 99.37* | 98.69
SF Air Traffic Passenger Statistics | 90.68 | 90.82* | 90.41 | 91.43*
Smart City Traffic Patterns | 85.66 | 85.98* | 85.07 | 86.94*
Statlog (Vehicle Silhouettes) | 72.46 | 68.91 | 78.13* | 74.82*
Traffic Volume Counts (2012-2013) | 88.36 | 88.12 | 89.77* | 89.10*
Average | 86.02 | 84.26 | 87.48 | 86.40

Table 4: The comparison of RandomTree-based ordinal and ensemble learning methods in terms of classification accuracy.

Dataset | RandomTree (%) | OrdRandomTree (%) | AdaOrdRandomTree (%) | BagOrdRandomTree (%)
Auto MPG | 77.64 | 81.66* | 82.66* | 81.41*
Automobile | 76.59 | 74.63 | 83.41* | 84.88*
Bike Sharing | 80.07 | 80.56* | 86.98* | 87.73*
Car Evaluation | 83.16 | 85.94* | 97.22* | 95.83*
Car Sale Advertisements | 82.38 | 82.12 | 83.90* | 86.34*
NYS Air Passenger Traffic | 86.11 | 86.17* | 85.61 | 86.49*
Road Traffic Accidents (2017) | 78.76 | 78.44 | 82.21* | 84.48*
SF Air Traffic Landings Statistics | 97.38 | 97.43* | 98.93* | 98.83*
SF Air Traffic Passenger Statistics | 89.22 | 89.27* | 89.05 | 89.17
Smart City Traffic Patterns | 82.26 | 82.79* | 84.72* | 85.91*
Statlog (Vehicle Silhouettes) | 70.92 | 65.60 | 76.71* | 76.00*
Traffic Volume Counts (2012-2013) | 82.96 | 83.45* | 89.77* | 89.69*
Average | 82.29 | 82.34 | 86.76 | 87.23

The experimental results show that the EBOC approaches (AdaOrdC4.5, BagOrdC4.5, AdaOrdRandomTree, BagOrdRandomTree, AdaOrdREPTree, and BagOrdREPTree) generally provide higher accuracy values than the individual tree-based classification algorithms. For example, over the 12 datasets, BagOrdRandomTree (87.23%) is significantly more accurate than RandomTree (82.29%). Similarly, both AdaOrdREPTree and BagOrdREPTree win against plain REPTree on 11 datasets and lose on only one. The results also show that the performance gap generally increases with the number of classes. When the average accuracy values are considered, the proposed EBOC approach has the best accuracy score in all three experiments: AdaOrdC4.5 is 87.48%, BagOrdRandomTree is 87.23%, and BagOrdREPTree is 85.67%.

The matrices presented in Tables 6, 7, and 8 give all pairwise combinations of the tree-based algorithms with their ordinal and ensemble-learning versions. Each cell in a matrix represents the number of wins, losses, and ties between the approach in that row and the approach in that column. For example, in the pairwise comparison of the C4.5 and BagOrdC4.5 algorithms, 3-9-0 indicates that the standard C4.5 algorithm is better than BagOrdC4.5 on only 3 datasets, while BagOrdC4.5 is better on the other 9 datasets; their accuracies are not equal on any dataset. When all matrices are examined, it is clearly seen that our proposed approach, EBOC, outperforms all the other methods.

The graphs given in Figures 5, 6, and 7 show the average ranks of the tree-based algorithms (C4.5, RandomTree, and REPTree) with their ordinal and ensemble-learning (AdaBoost and bagging) versions, respectively. In the ranking method, each approach used in this study is rated according to its accuracy score on a dataset. This process starts by giving rank 1 to the classifier with the highest classification accuracy and continues to increase the rank value until rank m is assigned to the worst of the m classifiers. In the case of a tie, the average of the tied classifiers' rankings is assigned to each of them. Then, the mean values of the ranks per classifier over all datasets are computed as the average ranks. According to the comparative results, the EBOC approach with the bagging method has the best performance among the others in Figures 5 and 6, because it gives the lowest rank value.

Table 5: The comparison of REPTree-based ordinal and ensemble learning methods in terms of classification accuracy.

Dataset | REPTree (%) | OrdREPTree (%) | AdaOrdREPTree (%) | BagOrdREPTree (%)
Auto MPG | 75.38 | 76.63* | 81.91* | 81.16*
Automobile | 63.41 | 66.83* | 83.90* | 72.20*
Bike Sharing | 84.96 | 85.27* | 88.75* | 88.12*
Car Evaluation | 87.67 | 90.91* | 98.67* | 93.63*
Car Sale Advertisements | 80.70 | 80.42 | 82.52* | 82.30*
NYS Air Passenger Traffic | 84.34 | 84.91* | 86.74* | 87.31*
Road Traffic Accidents (2017) | 84.66 | 85.02* | 80.07 | 84.57
SF Air Traffic Landings Statistics | 98.04 | 98.10* | 99.12* | 98.57*
SF Air Traffic Passenger Statistics | 90.39 | 90.49* | 90.97* | 91.25*
Smart City Traffic Patterns | 84.66 | 85.06* | 85.33* | 86.38*
Statlog (Vehicle Silhouettes) | 72.34 | 69.62 | 77.54* | 73.17*
Traffic Volume Counts (2012-2013) | 80.57 | 88.75* | 86.39* | 89.37*
Average | 82.26 | 83.50 | 86.83 | 85.67

[Bar chart omitted: average ranks C4.5 = 2.88, OrdC4.5 = 3.29, AdaOrdC4.5 = 2.00, BagOrdC4.5 = 1.83; y-axis scale 0-4.]

Figure 5: The average ranks of the C4.5 algorithm with ordinal and ensemble-learning versions.

In the ranking method, each approach used in this study is rated according to its accuracy score on each dataset. The process starts by giving rank 1 to the classifier with the highest classification accuracy and continues to increase the rank value until rank m is assigned to the worst of the m classifiers. In the case of a tie, the average of the tied classifiers' ranks is assigned to each of them. Then, the mean of the ranks of each classifier over all datasets is computed as its average rank. According to the comparative results, the EBOC approach with the bagging method has the best performance among the others in Figures 5 and 6, because it gives the lowest rank value.
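The average-rank computation can be reproduced with a short Python sketch; this is our own illustration (it assumes NumPy and SciPy are available), where the 12x4 matrix holds the Table 4 accuracies and SciPy's rankdata assigns average ranks to ties, exactly as described above:

import numpy as np
from scipy.stats import rankdata

# Table 4 accuracies; one row per dataset, one column per classifier
# (RandomTree, OrdRandomTree, AdaOrdRandomTree, BagOrdRandomTree)
acc = np.array([
    [77.64, 81.66, 82.66, 81.41], [76.59, 74.63, 83.41, 84.88],
    [80.07, 80.56, 86.98, 87.73], [83.16, 85.94, 97.22, 95.83],
    [82.38, 82.12, 83.90, 86.34], [86.11, 86.17, 85.61, 86.49],
    [78.76, 78.44, 82.21, 84.48], [97.38, 97.43, 98.93, 98.83],
    [89.22, 89.27, 89.05, 89.17], [82.26, 82.79, 84.72, 85.91],
    [70.92, 65.60, 76.71, 76.00], [82.96, 83.45, 89.77, 89.69],
])
# Rank within each dataset: rank 1 = highest accuracy; ties share the average rank
ranks = np.apply_along_axis(lambda row: rankdata(-row), 1, acc)
print(ranks.mean(axis=0))  # -> approx. [3.42, 3.00, 1.92, 1.67], the bars of Figure 6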

[Bar chart omitted: average ranks RandomTree = 3.42, OrdRandomTree = 3.00, AdaOrdRandomTree = 1.92, BagOrdRandomTree = 1.67; y-axis scale 0-4.]

Figure 6: The average ranks of the RandomTree algorithm with ordinal and ensemble-learning versions.

The proposed algorithms (AdaOrd∗ and BagOrd∗) were applied on the transportation datasets and were compared with the ordinal and nominal (standard) classification algorithms in terms of the accuracy, precision, recall, F-measure, and ROC area metrics. Additional measures besides accuracy are required to evaluate all aspects of the classifiers, and they are useful for providing additional insight into their performance. Table 9 shows the average results obtained from the 12 transportation datasets in each experiment. Except for accuracy, the metrics listed in Table 9 give a degree ranging from 0 to 1; an algorithm with a higher rate is more successful than the others. Among the C4.5 decision


Table 6: The pairwise combinations of the C4.5 algorithm with ordinal and ensemble-learning versions (wins-losses-ties).

              BagOrdC4.5  AdaOrdC4.5  OrdC4.5
C4.5          3-9-0       3-9-0       7-4-1
OrdC4.5       1-11-0      3-9-0
AdaOrdC4.5    6-6-0

Table 7: The pairwise combinations of the RandomTree algorithm with ordinal and ensemble-learning versions (wins-losses-ties).

                  BagOrdRandomTree  AdaOrdRandomTree  OrdRandomTree
RandomTree        1-11-0            2-10-0            4-8-0
OrdRandomTree     2-10-0            2-10-0
AdaOrdRandomTree  5-7-0

[Bar chart omitted: average ranks REPTree = 3.67, OrdREPTree = 2.92, AdaOrdREPTree = 1.67, BagOrdREPTree = 1.75; y-axis scale 0-4.]

Figure 7: The average ranks of the REPTree algorithm with ordinal and ensemble-learning versions.

tree-based algorithms, it is clearly seen that the AdaOrdC4.5 algorithm has the best accuracy (87.467%), precision (0.874), recall (0.875), and F-measure (0.874) results. While bagging has the highest values for the RandomTree algorithm, boosting is the best among the REPTree-based algorithms. When the ROC area is considered, the BagOrd∗ algorithms have the highest scores according to the experimental results. As a result, it is possible to say that the ensemble-based ordinal classification (EBOC) approach provides more accurate and robust classification results than the traditional methods.
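For readers who wish to compute such weighted-average metrics outside Weka, the following sketch shows one way to do it; scikit-learn and the toy label vectors are our own assumptions for illustration, not artifacts of the study:

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative true and predicted ordinal labels (following the car evaluation example)
y_true = ["unacc", "acc", "good", "vgood", "acc", "unacc", "good", "acc"]
y_pred = ["unacc", "acc", "acc",  "vgood", "acc", "unacc", "good", "unacc"]

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} F-measure={f1:.3f}")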

5.4. Statistical Significance Tests. The field of inferential statistics offers special procedures for testing the significance of differences between multiple classifiers. Although the ensemble-based ordinal classification (EBOC) algorithms (AdaOrd∗ and BagOrd∗) have lower ranking values, we should statistically verify that these algorithms are significantly different from the others. For verification, we used three well-known statistical methods [35]: multiple comparisons (the Friedman test and the Quade test) and pairwise comparison (the Wilcoxon signed rank test).

The null hypothesis (H0) for a statistical test is the one under which the compared classification models are equivalent; the alternative hypothesis (H1) holds when not all classifiers are equivalent:

H0: There are no performance differences among the classifiers on the datasets.

H1: There are performance differences among the classifiers on the datasets.

The p-value is defined as a probability; it is a decimal number between 0 and 1, and it can also be expressed as a percentage (i.e., 0.1 = 10%). With a small p-value, we reject the null hypothesis (H0), concluding that the difference between the classification results is significant.

5.4.1. Statistical Tests for Comparing Multiple Classifiers. Multiple comparison tests (MCT) are used to establish a statistical comparison of the results reported among various classification algorithms. We performed two well-known MCTs to determine whether the classifiers have significant differences or not: the Friedman test and the Quade test.

Friedman Test. The Friedman test is a nonparametric statistical test that aims to detect significant differences between the behaviours of two or more algorithms. Initially, it ranks the algorithms for each dataset separately, between 1 (smallest) and 4 (largest) in our setting; in case of ties, the average rank is assigned. After that, the Friedman test statistic (χ²r) is calculated according to the following:

$$\chi_r^2 = \frac{12}{d \, t \, (t+1)} \sum_{i=1}^{t} R_i^2 - 3 \, d \, (t+1) \qquad (5)$$


Table 8: The pairwise combinations of the REPTree algorithm with ordinal and ensemble-learning versions (wins-losses-ties).

               BagOrdREPTree  AdaOrdREPTree  OrdREPTree
REPTree        1-11-0         1-11-0         2-10-0
OrdREPTree     1-11-0         2-10-0
AdaOrdREPTree  7-5-0

Table 9: The comparison of algorithms in terms of accuracy, precision, recall, F-measure, and ROC area.

Compared Algorithms  Accuracy (%)  Precision  Recall  F-measure  ROC Area
C4.5                 86.02         0.861      0.860   0.866      0.897
OrdC4.5              84.26         0.843      0.842   0.848      0.884
AdaOrdC4.5           87.48         0.874      0.875   0.874      0.933
BagOrdC4.5           86.40         0.859      0.864   0.860      0.947
RandomTree           82.29         0.823      0.823   0.823      0.860
OrdRandomTree        82.34         0.825      0.823   0.824      0.861
AdaOrdRandomTree     86.76         0.866      0.868   0.866      0.928
BagOrdRandomTree     87.23         0.868      0.872   0.869      0.947
REPTree              82.26         0.817      0.823   0.817      0.903
OrdREPTree           83.50         0.831      0.835   0.829      0.908
AdaOrdREPTree        86.83         0.868      0.868   0.867      0.925
BagOrdREPTree        85.67         0.852      0.857   0.852      0.939

where d is the number of datasets, t is the number of classifiers, and R_i is the total of the ranks of the ith classifier over all datasets. The Friedman statistic is approximately chi-square (χ²) distributed with t − 1 degrees of freedom (df), and the null hypothesis is rejected if χ²r > χ²(t−1, α) at the corresponding significance level α.

The obtained Friedman test value for the four C4.5-based classifiers is 10.525. Since the obtained value is greater than the critical value (χ²(df = 3, α = 0.05) = 7.81), the null hypothesis (H0) is rejected, so it has been concluded that the four classifiers are significantly different. The same situation is also valid for the RandomTree-based and REPTree-based algorithms, which have test values of 15.3 and 20.1, respectively. The p-values obtained for the C4.5-, RandomTree-, and REPTree-related EBOC results (Tables 3, 4, and 5) are 0.01459, 0.00158, and 0.00016, which are far below the 0.05 level of significance. Thus, it is possible to say that the differences between their performances are unlikely to have occurred by chance.
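Equation (5) is easy to verify numerically. The helper below is our own sketch, not code from the study; applied to the Table 4 matrix acc defined in the average-rank example above, it reproduces the RandomTree values reported here (15.3 and p ≈ 0.00158), as does SciPy's built-in friedmanchisquare:

import numpy as np
from scipy.stats import rankdata, chi2, friedmanchisquare

def friedman_statistic(acc):
    """Friedman test statistic of Eq. (5) for a (d datasets x t classifiers) matrix."""
    d, t = acc.shape
    R = np.apply_along_axis(rankdata, 1, acc).sum(axis=0)  # rank totals R_i per classifier
    return 12.0 / (d * t * (t + 1)) * (R ** 2).sum() - 3.0 * d * (t + 1)

# With the Table 4 matrix `acc` from the average-rank sketch above:
# stat = friedman_statistic(acc)        -> 15.3
# chi2.sf(stat, acc.shape[1] - 1)       -> approx. 0.00158, the reported p-value
# friedmanchisquare(*acc.T)             -> the same statistic and p-value via SciPy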

Quade Test. Like the Friedman test, the Quade test is a nonparametric test which is used to prove that the differences among classifiers are significant. First, the performance results of each classifier are ranked within each dataset, in the same way as the Friedman test does, to yield R_ij, where d is the number of datasets and t is the number of classifiers, for i = 1, 2, ..., d and j = 1, 2, ..., t. Then, the range of each dataset is calculated by finding the difference between the largest and the smallest observations within that dataset; ranking these ranges across datasets gives the rank Q_i of dataset i. The weighted average adjusted rank for dataset i with classifier j is then computed as S_ij = Q_i · (R_ij − (t + 1)/2). The Quade test statistic is then given by

$$F = \frac{(d-1) \, \frac{1}{d} \sum_{j=1}^{t} S_j^2}{\sum_{i=1}^{d} \sum_{j=1}^{t} S_{ij}^2 - \frac{1}{d} \sum_{j=1}^{t} S_j^2} \qquad (6)$$

where S_j = Σᵢ S_ij is the sum of the weighted ranks of each classifier. The F value is tested against the F-distribution for a given α with df1 = t − 1 and df2 = (d − 1)(t − 1) degrees of freedom. If F > F(t−1),(d−1)(t−1),α, then the null hypothesis is rejected. Moreover, the p-value can be computed through normal approximations.
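SciPy does not ship a Quade test, so a direct implementation of (6) is useful for checking the reported values. The helper below is our own sketch built strictly from the definitions above; applied to the Table 4 matrix acc from the average-rank sketch, it yields F ≈ 11.98 and p ≈ 0.000018, in line with the RandomTree result reported below:

import numpy as np
from scipy.stats import rankdata, f as f_dist

def quade_test(acc):
    """Quade test of Eq. (6) for a (d datasets x t classifiers) matrix."""
    d, t = acc.shape
    R = np.apply_along_axis(rankdata, 1, acc)           # within-dataset ranks R_ij
    Q = rankdata(acc.max(axis=1) - acc.min(axis=1))     # ranks Q_i of the dataset ranges
    S = Q[:, None] * (R - (t + 1) / 2.0)                # weighted adjusted ranks S_ij
    B = (S.sum(axis=0) ** 2).sum() / d                  # (1/d) * sum over j of S_j^2
    A = (S ** 2).sum()                                  # sum over i, j of S_ij^2
    F = (d - 1) * B / (A - B)
    return F, f_dist.sf(F, t - 1, (d - 1) * (t - 1))    # statistic and p-value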

The p-values computed through the Quade statistical tests for the C4.5-, RandomTree-, and REPTree-related EBOC algorithms are 0.013398, 0.000018, and 0.000522, respectively. The test results strongly suggest the existence of significant differences among the considered algorithms at the significance level α = 0.05. Hence, we reject the null hypothesis, which states that all classification algorithms have the same performance; thus, it is possible to say that at least one of them behaves differently.

5.4.2. Statistical Tests for Comparing Paired Classifiers. In addition to the multiple comparison tests, we also conducted pairwise comparison tests to demonstrate that the proposed algorithms (AdaOrd∗ and BagOrd∗) have significant differences from the others when compared individually. The Wilcoxon signed rank test was applied to examine the pairwise performance differences.


Table 10: Wilcoxon signed ranks test results.

Compared Algorithms                    W+     W−     W      z-score    p-value   Significance Level
AdaOrdC4.5 vs. C4.5                    63     15     15     -1.882715  0.059739  moderate
AdaOrdC4.5 vs. OrdC4.5                 65     13     13     -2.039608  0.041389  strong
BagOrdC4.5 vs. C4.5                    61     17     17     -1.725822  0.084379  moderate
BagOrdC4.5 vs. OrdC4.5                 75     3      3      -2.824072  0.004742  very strong
AdaOrdRandomTree vs. RandomTree        75     3      3      -2.824072  0.004742  very strong
AdaOrdRandomTree vs. OrdRandomTree     75     3      3      -2.824072  0.004742  very strong
BagOrdRandomTree vs. RandomTree        77     1      1      -2.980965  0.002873  very strong
BagOrdRandomTree vs. OrdRandomTree     75     3      3      -2.824072  0.004742  very strong
AdaOrdREPTree vs. REPTree              71     7      7      -2.510287  0.012063  strong
AdaOrdREPTree vs. OrdREPTree           64     14     14     -1.961161  0.049860  strong
BagOrdREPTree vs. REPTree              77     1      1      -2.980965  0.002873  very strong
BagOrdREPTree vs. OrdREPTree           77     1      1      -2.980965  0.002873  very strong
Average                                71.25  6.75   6.75   -2.602996  0.022918

Wilcoxon Signed Rank Test. This statistical test consists of the following steps to check the validity of the null hypothesis: (i) calculate the performance differences between the two algorithms for each dataset; (ii) order the differences according to their absolute values, from smallest to largest; (iii) rank the absolute values of the differences, starting with the smallest as 1 and assigning average ranks in case of ties; (iv) calculate W+ and W− by summing the ranks of the positive and the negative differences separately; (v) find W as the smaller of the two sums, W = min(W+, W−); (vi) calculate the z-score as z = (W − μW)/σW, where μW = n(n + 1)/4 and σW = sqrt(n(n + 1)(2n + 1)/24) for n nonzero differences, and find the p-value from the normal distribution table; and (vii) determine the significance level according to the computed p-value [36], rejecting the null hypothesis if the p-value is less than a certain risk threshold (e.g., 0.1 = 10%):

$$\text{Significance level} = \begin{cases} \text{weak or none}, & p > 0.1 \\ \text{moderate}, & 0.05 < p \le 0.1 \\ \text{strong}, & 0.01 < p \le 0.05 \\ \text{very strong}, & p \le 0.01 \end{cases} \qquad (7)$$
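For completeness, steps (i)-(vii) translate directly into a short Python helper. This is our own illustrative sketch (using the normal approximation stated above); the commented example reuses the rt and bag arrays from the earlier win-loss-tie sketch and reproduces the corresponding row of Table 10:

import numpy as np
from scipy.stats import rankdata, norm

def wilcoxon_signed_rank(a, b):
    """Wilcoxon signed rank test following steps (i)-(vii); two-sided p-value."""
    d = np.asarray(a, float) - np.asarray(b, float)      # (i) paired differences
    d = d[d != 0]                                        # drop zero differences
    n = len(d)
    r = rankdata(np.abs(d))                              # (ii)-(iii) rank |differences|
    w_plus, w_minus = r[d > 0].sum(), r[d < 0].sum()     # (iv) positive/negative rank sums
    w = min(w_plus, w_minus)                             # (v) W = min(W+, W-)
    z = (w - n * (n + 1) / 4) / np.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # (vi) z-score
    return w_plus, w_minus, w, z, 2 * norm.sf(abs(z))    # (vii) compare p with threshold

# Example: BagOrdRandomTree vs. RandomTree accuracies from Table 4
# wilcoxon_signed_rank(bag, rt) -> (77.0, 1.0, 1.0, -2.980965, 0.002873), as in Table 10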

Table 10 demonstrates the W, z-score, and p-values computed through the Wilcoxon signed rank tests. Based on the reported p-values, most of the tests strongly or very strongly prove the existence of significant differences among the considered algorithms. As the table states, for each test, the ensemble-based ordinal classification algorithms (BagOrd∗ and AdaOrd∗) always show a significant improvement over the traditional versions at a significance level of α = 0.1. In general, it can be observed that the average p-value of the proposed BagOrd∗ variant algorithms (0.0171) is slightly lower than that of the AdaOrd∗ variants (0.0288).

Based on all the statistical tests (the Friedman, Quade, and Wilcoxon signed rank tests), we can safely reject the null hypothesis (i.e., that there are no performance differences among the classifiers on the datasets), since the p-values computed through the statistics are less than the critical value (0.05). Thus, the proposed approach (EBOC) has a statistically significant effect on the response at the 95.0% confidence level.

6. Conclusions and Future Works

As a result of developments in the field of transportation, enormous amounts of raw data are generated every day. This situation creates the potential to discover knowledge patterns or rules from these data and to model the behaviour of a transportation system using machine learning techniques. To serve this purpose, this study focuses on the application of ordinal classification algorithms to real-world transportation datasets. It proposes a novel ensemble-based ordinal classification (EBOC) approach. This approach converts the original ordinal class problem into a series of binary class problems and uses the ensemble-learning paradigm (boosting and bagging) at the same time. To the best of our knowledge, this is the first study in which ordinal classification methods and our proposed approach have been applied to the transportation sector. In the experimental studies, the proposed model with the tree-based learners (C4.5, RandomTree, and REPTree) was implemented on twelve benchmark transportation datasets that are available for public use. The proposed EBOC method was compared with the ordinal class classifier and traditional tree-based classification algorithms in terms of accuracy, precision, recall, F-measure, and ROC area. The results indicate that the EBOC approach provides more accurate classification results than these baselines. Moreover, statistical test methods (the Friedman, Quade, and Wilcoxon signed rank tests) were used to prove that the classification accuracies obtained from the proposed algorithms (AdaOrd∗ and BagOrd∗) are significantly different from those of the traditional methods. Therefore, the proposed EBOC method can assist in making the right decisions in transportation. Our findings demonstrate that the improvement in performance is a result of exploiting the ordering information and applying an ensemble strategy at the same time.

As future work, different types of ensemble-based ordinal classifiers can be developed using different ensemble methods, such as stacking and voting, and using different classification algorithms, such as Naive Bayes, k-nearest neighbour, neural networks, and support vector machines. In addition, ensemble clustering models can be developed to cluster transportation data. Furthermore, a comparative study which implements the ensemble-learning and deep learning paradigms can be performed in transportation fields.

Data Availability

The "Auto MPG," "Automobile," "Bike Sharing," "Car Evaluation," and "Statlog (Vehicle Silhouettes)" datasets used to support the findings of this study are freely available at https://archive.ics.uci.edu/ml/datasets. The "Car Sale Advertisements," "NYS Air Passenger Traffic," "Smart City Traffic Patterns," "SF Air Traffic Landings Statistics," and "SF Air Traffic Passenger Statistics" datasets used to support the findings of this study are freely available at https://www.kaggle.com/datasets. The "Road Traffic Accidents (2017)" dataset used to support the findings of this study is freely available at https://datamillnorth.org/dataset. The "Traffic Volume Counts (2012-2013)" dataset used to support the findings of this study is freely available at https://opendata.cityofnewyork.us/data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

[1] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, "Traffic flow prediction with big data: a deep learning approach," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865–873, 2015.
[2] X. Wen, L. Shao, Y. Xue, and W. Fang, "A rapid learning algorithm for vehicle classification," Information Sciences, vol. 295, pp. 395–406, 2015.
[3] S. Ballı and E. A. Sagbas, "Diagnosis of transportation modes on mobile phone using logistic regression classification," IET Software, vol. 12, no. 2, pp. 142–151, 2018.
[4] H. Nguyen, C. Cai, and F. Chen, "Automatic classification of traffic incident's severity using machine learning approaches," IET Intelligent Transport Systems, vol. 11, no. 10, pp. 615–623, 2017.
[5] G. Kedar-Dongarkar and M. Das, "Driver classification for optimization of energy usage in a vehicle," Procedia Computer Science, vol. 8, pp. 388–393, 2012.
[6] N. Deepika and V. V. Sajith Variyar, "Obstacle classification and detection for vision based navigation for autonomous driving," in Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI 2017), pp. 2092–2097, Udupi, India, September 2017.
[7] E. Frank and M. Hall, "A simple approach to ordinal classification," in Proceedings of the European Conference on Machine Learning (ECML), vol. 2167 of Lecture Notes in Computer Science, pp. 145–156, Springer, 2001.
[8] E. Alpaydin, Introduction to Machine Learning, The MIT Press, 3rd edition, 2014.
[9] S. Destercke and G. Yang, "Cautious ordinal classification by binary decomposition," in Proceedings of Machine Learning and Knowledge Discovery in Databases – European Conference, ECML/PKDD, pp. 323–337, Nancy, France, 2014.
[10] J. S. Cardoso and J. F. Pinto da Costa, "Learning to classify ordinal data: the data replication method," Journal of Machine Learning Research (JMLR), vol. 8, pp. 1393–1429, 2007.
[11] L. Li and H. T. Lin, "Ordinal regression by extended binary classification," in Proceedings of the 19th International Conference on Neural Information Processing Systems, pp. 865–872, Canada, 2006.
[12] F. E. Harrell, "Ordinal logistic regression," in Regression Modeling Strategies, Springer Series in Statistics, pp. 311–325, Springer International Publishing, Cham, Switzerland, 2015.
[13] J. D. Rennie and N. Srebro, "Loss functions for preference levels: regression with discrete ordered labels," in Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference Handling, pp. 180–186, 2005.
[14] B. Keith, "Barycentric coordinates for ordinal sentiment classification," in Proceedings of the 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Halifax, Canada, 2017.
[15] E. Fernandez, J. R. Figueira, J. Navarro, and B. Roy, "ELECTRE TRI-nB: a new multiple criteria ordinal classification method," European Journal of Operational Research, vol. 263, no. 1, pp. 214–224, 2017.
[16] M. M. Stenina, M. P. Kuznetsov, and V. V. Strijov, "Ordinal classification using Pareto fronts," Expert Systems with Applications, vol. 42, no. 14, pp. 5947–5953, 2015.
[17] W. Verbeke, D. Martens, and B. Baesens, "RULEM: a novel heuristic rule learning approach for ordinal classification with monotonicity constraints," Applied Soft Computing, vol. 60, pp. 858–873, 2017.
[18] W. Kotlowski and R. Slowinski, "On nonparametric ordinal classification with monotonicity constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 11, pp. 2576–2589, 2013.
[19] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ordinal classification with decision rules," in Proceedings of the Third International Conference on Mining Complex Data (MCD'07), pp. 169–181, Warsaw, Poland, 2007.
[20] K. Hechenbichler and K. Schliep, "Weighted k-nearest-neighbor techniques and ordinal classification," Collaborative Research Center, vol. 399, pp. 1–16, 2004.
[21] W. Waegeman and L. Boullart, "An ensemble of weighted support vector machines for ordinal regression," International Journal of Computer and Information Engineering, vol. 1, no. 12, pp. 599–603, 2007.
[22] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ensemble of decision rules for ordinal classification with monotonicity constraints," in Proceedings of the International Conference on Rough Sets and Knowledge Technology, vol. 5009 of Lecture Notes in Computer Science, pp. 260–267, 2008.
[23] H. T. Lin, From Ordinal Ranking to Binary Classification [Ph.D. thesis], California Institute of Technology, 2008.
[24] C. K. A. Cuartas, A. J. P. Anzola, and B. G. M. Tarazona, "Classification methodology of research topics based in decision trees: J48 and RandomTree," International Journal of Applied Engineering Research, vol. 10, no. 8, pp. 19413–19424, 2015.
[25] C. Parimala and R. Porkodi, "Classification algorithms in data mining: a survey," International Journal of Scientific Research in Computer Science, vol. 3, pp. 349–355, 2018.


[26] P. Yildirim, D. Birant, and T. Alpyildiz, "Improving prediction performance using ensemble neural networks in textile sector," in Proceedings of the 2017 International Conference on Computer Science and Engineering (UBMK), pp. 639–644, Antalya, Turkey, October 2017.
[27] H. Yu and J. Ni, "An improved ensemble learning method for classifying high-dimensional and imbalanced biomedicine data," IEEE Transactions on Computational Biology and Bioinformatics, vol. 11, no. 4, pp. 657–666, 2014.
[28] D. Che, Q. Liu, K. Rasheed, and X. Tao, "Decision tree and ensemble learning algorithms with their applications in bioinformatics," in Software Tools and Algorithms for Biological Systems, H. R. Arabnia and Q.-N. Tran, Eds., vol. 696, pp. 191–199, Springer, 2011.
[29] "UCI Machine Learning Repository: Car Evaluation Data Set," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/datasets/car+evaluation.
[30] "Weka 3 – Data Mining with Open Source Machine Learning Software in Java," cs.waikato.ac.nz, 2018, https://www.cs.waikato.ac.nz/ml/weka/.
[31] "UCI Machine Learning Repository," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/index.php.
[32] "Datasets | Kaggle," kaggle.com, 2018, https://www.kaggle.com/datasets.
[33] "Data Mill North," datamillnorth.org, 2018, https://datamillnorth.org/dataset.
[34] City of New York, "NYC Open Data," opendata.cityofnewyork.us, 2018, https://opendata.cityofnewyork.us.
[35] J. Derrac, S. García, D. Molina, and F. Herrera, "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms," Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 3–18, 2011.
[36] N. A. Weiss, Introductory Statistics, Addison-Wesley, 9th edition, 2012.


Page 2: EBOC: Ensemble-Based Ordinal Classification in …downloads.hindawi.com/journals/jat/2019/7482138.pdfJournalofAdvancedTransportation the size of decision trees byremoving sections

2 Journal of Advanced Transportation

Machine Learning

Supervised Learning

Classification

Nominal Classification

Ordinal Classification

Approach 1Binary

ClassificationApproach 2Regression

MethodsApproach 3

Specific Techniques

RegressionUnsupervised

Learning

Figure 1 The classification methods of machine learning

serious gt slight similarly traffic volume can be classified ashigh gtmedium gt low

Meanwhile ensemble learning has been recently pre-ferred in machine learning for the classification task becauseof the high prediction ability it provides Ensemble learning isa machine learning technique which combines a set of baselearning models to get a single final prediction [8] Theselearning models can be any classification algorithms suchas neural network (NN) Naive Bayes (NB) decision treesupport vector machine (SVM) regression and k-nearestneighbour (KNN) Many studies in the literature have statedthat ensemble learners improve prediction performance ofindividual learning algorithms There exist various ensemble-learning methods bagging boosting stacking and votingIn this study bagging and boosting (AdaBoost algorithm)methods are selected due to their popularity for the solutionof ordinal classification problem in transportation sector

The novelty and main contributions of this article areas follows (i) it provides a brief survey of ordinal classi-fication and ensemble learning which has been revealedto improve prediction performance of the traditional clas-sification algorithms (ii) it is the first study in which theordinal classification methods have been implemented intransportation sector (iii) it proposes a novel ensemble-basedordinal classification (EBOC) approach for transportationand (iv) it presents experimental studies conducted on twelvedifferent real-world transportation datasets to demonstratethat the proposed EBOC approach shows better classificationresults than both ordinal class classifier and traditional tree-based classification algorithms in terms of accuracy

The remainder of this article is structured as follows inthe following section related literature and previous workson the subject are summarized Section 3 gives backgroundinformation about ordinal classification and ensemble learn-ing This section also explains utilized tree-based algorithmssuch as C45 decision tree RandomTree and REPTree indetail In Section 4 the proposed ensemble-based ordinalclassification (EBOC) approach for transportation sector isdefined Section 5 gives the description of transportationdatasets used in this study The application of traditionalalgorithms and proposed method on the transportationdatasets and the experimental results of them with discus-sions are also presented in this sectionMoreover all obtainedresults were validated by three statistical methods to ensurethe significance of differences among the classifiers on thedatasets including multiple comparisons (Friedman test and

Quade test) and pairwise comparisons (Wilcoxon signedrank test) Finally the last section gives some concludingremarks and future directions

2 Related Work

In machine learning there are two main types of taskssupervised learning and unsupervised learning The clas-sification process which is one of the supervised learningtechniques is divided into two categories nominal classifi-cation (where no order is assumed between the classes) andordinal classification (where the ordinal relationship betweendifferent class labels should be taken into account) Thedifference between ordinal and nominal classification is notremarkable in the case of binary classification owing to thefact that there is always an implicit order in ldquopositive classrdquoand ldquonegative classrdquo In multiclass classification problemsstandard classification algorithms for nominal classes canbe applied to ordinal prediction problems by discarding theordering information in the class attribute However thisapproach does not take advantage of the inner structure ofthe data and some information that can potentially improvethe predictive performance of a classifier is lost since itignores the existing natural order of the classesThe literaturepresents three different approaches for the ordinal classi-fication problem binary classification regression methodsand specific techniques The ordinal classification paradigmis summarized in the graph in Figure 1 In this articlea novel ordinal classification study was performed for thetransportation sector

The first approach for the ordinal classification processis to convert an ordinal classification problem into severalbinary classification problems In this type of studies [9 10]two-class classification algorithms are applied on ordinalvalued datasets after transforming 119896 class problem into a set ofk-1 binary subproblems For example in the study presentedin [10] the researchers proposed a novel approach whichreduces the problem of classifying ordered classes to stan-dard two-class problem They introduced a data replicationmethod which is then mapped into neural networks andsupport vector machines In the experiments they appliedthe proposed method on both artificial and real datasetsfor gene expression analysis Li and Lin [11] developed areduction framework for ordinal classification that consistsof three steps (i) extracting extended examples from trainingexamples by using weights (ii) training a classifier on the

Journal of Advanced Transportation 3

extended examples with any binary classification algorithmand (iii) constructing a ranking rule from the binary classifier

As a second approach regression methods [12 13] canbe used to deal with ordinal classification problem sinceregression models have been thought for continuous dataIn this method categorical ordinal data is converted intocontinuous data scale and then regression algorithms areapplied on this transformed data as a postprocessing stepHowever a disadvantage of this approach is that the naturalorder of the class values is discarded and the inner structureof the samples is lost In [12] two most commonly usedordinal logistic models were applied onmedical ordinal dataproportional odds (PO) form of an ordinal logistic modeland the forward continuation ratio (CR) ordinal logisticmodel Rennie and Srebro [13] applied two general threshold-based constructions (the logistic and hinge loss) on the onemillion MovieLens dataset The experimental results statedthat their proposed approach shows more accurate resultsthan traditional classification and regression models

In the last approach problem-specific techniques [14ndash18]were developed for ordinal data classification by modifyingpresent classification algorithms The main advantage ofthis approach is to retain the order among the class labelsHowever someof thempresent some complexities in terms ofimplementation and training These are complex and requirenontrivial changes in the training methods such as modifi-cation of the objective function or using a threshold-basedmodel Keith and Meneses [14] proposed a novel techniquecalled Barycentric Coordinates for Ordinal Classification(BCOC) which uses barycentric coordinates to representordinal classes geometrically They applied their proposedmethod on the field of sentiment analysis and presentedeffective results for complex datasets Researchers in anotherstudy [17] presented a novel heuristic rule learning approachwith monotonicity constraints including two novel justifi-ability measures for ordinal classification The experimentswere performed to test the proposed approach and the resultsindicated that the novel method showed high predictionperformance by guaranteeing monotone classification withlow rule set increase

Under favour of its high prediction performanceensemble-learning techniques commenced to be preferred inordinal classification [19ndash22] as well as nominal classificationproblems Hechenbichler and Schliep [20] proposed anextended weighted k-nearest neighbor (wkNN) methodfor the ordinal class structure In their study weightedmajority vote mechanism was used for the aggregationprocess In the other study [21] an enhanced ensembleof support vector machines method was developed forordinal regression The proposed approach was implementedon the benchmark synthetic datasets and was comparedwith a kernel based ranking method in the experimentsLin [23] introduced a novel threshold ensemble modeland developed a reduction framework to reduce ordinalranking to weighted binary classification by extendingSVM and AdaBoost algorithms The results of all thesestudies show that the ensemble methods perform well on thedatasets and provide better performance than the individualmethods

Differently from existing studies our work proposes anovel ensemble-based ordinal classification approach includ-ing bagging and boosting (AdaBoost algorithm) methodsAlso the present study is the first study in which an ordinalclassification paradigm is implemented on real-world trans-portation datasets to model the behaviour of transportationsystems

3 Background Information

In this section background information about ordinal clas-sification classification algorithms and ensemble learning ispresented to provide the context for this research

31 Ordinal Classification In most of classification prob-lems target attribute values which will be predicted usuallyassumed that they have no ordering relation between themHowever the class labels of some datasets have inherentorder For example when predicting the price of an objectordered class labels such as ldquoexpensiverdquo ldquonormalrdquo andldquocheaprdquo can be used The order among these class labels isclearly understood and denoted as ldquoexpensiverdquo gt ldquonormalrdquogt ldquocheaprdquo Because of this reason classification techniquesdiffer according to nominal and ordinal data types

Nominal Data The data which have no quantitative valueis named nominal data To label variables nominal scalesare utilized such as vehicle types (ie car train and bus) ordocument topics (ie science business and sports)

Ordinal Data In ordinal data the values have a natural orderIn this data type the order of values is significant such as dataobtained from the use of a Likert scale

Ordinal classificationwhich is proposed for the predictionof ordinal target values is one of the most important classi-fication problems in machine learning This paradigm aimsto predict the unknown class values of an attribute 119910 thathave a natural order In the ordinal classification problemthe ordinal dataset 119863 = (1199091 1199101) (1199092 1199102) (119909119898 119910119898) hasa set of119898 items with input feature space119883 and class attribute119910 = 1198881 1198882 119888119896 has 119896 class labels with an order 119888119896 gt 119888119896minus1 gt gt 1198881 where gt denotes the relation of ordering In otherwords an example (x y) is composed of an input vector x 120598X and an ordinal label (ie rank) 119910 120598119884 = 1 2 119870 Theproblem is to predict the example (x y) as rank k where k =12 K

In a decision tree-based ordinal classification data istransformed from a k-class ordinal problem to kminus1 binaryclass problems that encode the ordering of the class labelsas the first step The ordinal dataset with classes 1198881 1198882 119888119896 is converted into binary datasets by discriminating 1198881 119888119894 against 119888119894+1 119888119896 that represents the test 119888119909 gt 119894 Inother words the upward unions of classes are consideredprogressively in each stage of binary datasets constructionThen a standard tree-based learning algorithm is employedon the derived binary datasets to construct k-1 models As aresult each model predicts the cumulative probability of aninstance of belonging to a certain class To predict the class

4 Journal of Advanced Transportation

label of an unseen instance the probability for each ordinalclass P(119888119909) is estimated by using k-1 models The estimationof the probability for the first ordinal class label depends ona single classifier 1-P(target gt 1198881) The probability of the lastordinal class is given by P(target gt 119888119896minus1) In the middle ofthe range the probability is calculated by a pair of classifiersP(target gt 119888119894minus1) - P(target gt 119888119894) where 1 lt i lt k Finally wechoose the class label with the highest probability

The core ideas of using decision tree algorithms forordinal classification problems can be described in threefolds

First Frank and Hall [7] reported that decision tree-based ordinal classification algorithm resulted in a significantimprovement over the standard version on 29 UCI bench-mark datasets They also experimentally confirmed that theperformance gap increases with the number of classes Mostimportantly their results showed that decision tree-basedordinal classification was able to generally produce a muchsimpler model with better ordinal prediction accuracy

Second to consider the inherent order of class labelsother standard classification algorithms such as KNN [20]SVM [21] and NN require modification (nontrivial changes)in the training methods Traditional regression techniquescan also be applied on ordinal valued data due to their abilityto classify interval or ratio quantity values However theirapplication to truly ordinal problems is necessarily ad hoc [7]In contrast the key feature of using decision tree for ordinalclassification is that there is noneed tomake anymodificationon the underlying learning algorithm The core idea is simplyto transform the ordinal classification problem into a seriesof binary-class problems

Third owing to the advantages of fast speed high preci-sion and ease of understanding decision tree algorithms arewidely used in classification as well as ordinal classificationOrdinal classification is sensitive to noise in data Even a fewnoisy samples exist in the ordinal dataset they can changethe classification results of the overall system However in thepresence of noisy andmissing data a pruned decision tree canreflect the order structure information very well with goodgeneralization ability The benefits of implementing decisiontree-based ordinal classification are not limited to these Itbuilds white box models so the trees constructed from thealgorithm can be visualized and the classification results canbe easily explained by Boolean logic In addition the mostimportant features which are near to the root node areemphasized via the construction of the tree for the ordinalprediction task

32 Tree-Based Classification Algorithms In this work threetree-based classification algorithms (C45 decision tree Ran-domTree and REPTree) are used as base learners to imple-ment ordinal classification process on transportation datasetsthat consists of ordered class values

321 C45 Decision Tree Decision tree is one of the mostsuccessful classification algorithms which predicts unknownclass attributes using a tree structure grown with depth-first strategy The structure of decision tree consists of

nodes branches and leaves that represent attributes attributevalues and class labels respectively In the literature there areseveral decision tree algorithms such as C45 ID3 (iterativedichotomiser) CART (classification and regression trees)and CHAID (chi-squared automatic interaction detector)In this study C45 decision tree algorithm was used dueto its popularity for the ordinal classification problem intransportation sector

The first step in C45 decision tree algorithm is to specifyroot of the tree The attribute which gives the most determi-nant information for the prediction process is selected for theroot node To determine the order of features in the decisiontree information gain formula is evaluated for each attributeas defined in

119866119886119894119899 (119878 119860) = 119864119899119905119903119900119901119910 (119878)

minus sumVisinV119886119897119906119890119904(119860)

10038161003816100381610038161198781198811003816100381610038161003816|119878| 119864119899119905119903119900119901119910 (119878119881)

(1)

where 119878V is subset of states 119878 for attribute 119860 with value VTheentropy indicates the impurity of a particular attribute in thedataset as defined in

119864119899119905119903119900119901119910 (119878) =119899

sum119894=1

minus 119901119894 log2 119901119894 (2)

where 119894 is a state 119901119894 is the possibility of outcome being instate 119894 for the set 119878 and 119899 is the number of possible outcomesThe attribute that has the maximum information gain value isselected for the root of the tree Likewise all attributes in thedataset are placed to the tree according to their informationgain values

322 RandomTree Assume a training set 119879 with 119911 attributesand m instances and RandomTree is an algorithm forconstructing a tree that considers 119909 randomly chosen features(xlez) with119898 instances at each node [24]While in a standarddecision tree each node is split using the best split amongall features in a random tree each node is split using thebest among the subset of attributes randomly chosen at thatnodeThe tree is built to its maximum depth in other wordsno pruning procedure is applied after the tree has been fullybuilt The algorithm can deal with both classification andregression problems In classification task the predicted classfor a sample is determined by traversing the tree from theroot node to a leaf according to the question posed about anindicator value at that node When a leaf node is reached itslabel determines the classification decision The RandomTreealgorithm does not need any accuracy estimation metricdifferent from the other classification algorithms

323 REPTree The reduced error pruning tree (REPTree)algorithm builds a decision regression tree using informa-tion gain variance and prunes it using a simple and fasttechnique [25]Thepruning technique starts from the bottomlevel of the tree (from leaf nodes) and replaces the nodewith most famous class This change is accepted only if theprediction accuracy is good By this way it helps to minimize

Journal of Advanced Transportation 5

the size of decision trees by removing sections of the treethat gives little capacity to classify instances The values ofnumeric attributes are sorted once Missing values are alsodealt with by splitting the corresponding instances into parts

33 Ensemble Learning Ensemble learning is a machinelearning technique that combines a set of individual learnersand predicts a final output [26] First each learner in theensemble structure is trained separately and multiple classi-fication models are constructed Then the obtained outputsfrom each model are compounded by a voting mechanismThe commonly used voting method available for categoricaltarget values is major class labelling and the voting methodsfor numerical target values are average weighted averagemedian minimum and maximum

Ensemble-learning approach constructs a strong classifierfrommultiple individual learners and aims to improve classi-fication performance by reducing the risk of an unfortunateselection of these learners Many ensemble-based studies [2728] proved that ensemble learners givemore successful resultsthan classical individual learners

In the literature the ensemble methods are generallycategorized under four techniques bagging boosting stack-ing and voting In one of our studies we implementedbagging approach by combining multiple neural networkswhich includes different parameter values for the predictiontask in textile sector [26] In the current research bagging andboosting (AdaBoost algorithm) methods with ordinal classclassifier were applied on transportation datasets For eachmethod (bagging and boosting) tree-based classificationalgorithms (C45 decision tree RandomTree and REPTreealgorithms) were used as base learners separately

331 Bagging Bagging (bootstrap aggregating) is a com-monly applied ensemble techniquewhich constitutes trainingsubsets by selecting random instances from original datasetusing bootstrap method Each classifier in the ensemblestructure is trained by different training sets and so multipleclassification models are produced Then a new sample isgiven to each model and these models predict an output Theobtained outputs are aggregated and a single final output isachieved

General Process of Bagging Let 119863 = (1199091 1199101) (1199092 1199102) (119909119898 119910119898) be a set of 119898 items and let 119910119894 120598 119884 = 11988811198882 119888119896 be a set of 119896 class labels C denotes classificationalgorithm and 119899 is number of learners

(1) Draw 119898 items randomly with replacement from thedataset D so generate bootstrap samples 1198631 1198632 119863119899

(2) Each dataset119863119894 is trained by a base learner andmulti-ple classification models are constructed119872119894=C(119863119894)

(3) Consensus of classification models is tested to calcu-late out-of-bag error

(4) New sample 119909 is given to classifiers as input and theoutputs 119910119894 are obtained from each model 119910119894 = 119872119894(119909)

(5) The outputs of models 11987211198722 119872t are com-bined as in

119872lowast (119909) = arg max119910isin119884

119899

sum119894119872119894(119909)=119910

1 (3)

332 Boosting In boosting method classifiers are trainedconsecutively to convert weak learners to strong ones Aweight value is assigned to each instance in the training setThen in each iteration while the weights of misclassifiedsamples are increased correctly classified ones are decreasedIn this way the misclassified samplesrsquo chances of being in thetraining set increase In this study AdaBoost algorithm wasutilized to implement boosting method

AdaBoost AdaBoost also known as adaptive boosting is themost popular boosting algorithm which trains learners byreweighting instances in the training set iteratively Thenthe outputs produced by each learning model are aggregatedusing a weighted voting mechanism

4 Proposed Method Ensemble-Based OrdinalClassification (EBOC)

The proposed method named ensemble-based ordinal clas-sification (EBOC) combines ensemble-learning paradigmincluding its popular methods such as AdaBoost and baggingwith the traditional ordinal class classifier algorithm toimprove prediction performance The ordinal class classifieralgorithm which was proposed in [7] is used as a base learnerfor the ensemble structure of the proposed approach Inaddition tree-based algorithms such as C45 decision treeRandomTree and REPTree are also used as a classifier inordinal class classifier algorithm

The first step of the ordinal class classifier algorithmis to reduce multiclass ordinal classification problem to abinary classification problem To realize this approach theordinal classification problem with 119896 different class values isconverted to k-1 two-class classification problems Assumethat there is a dataset with four classes (k=4) here the taskis to find two sides (i) class C1 against classes C2 C3 and C4(ii) classes C1 and C2 against classes C3 and C4 and finally(iii) classes C1 C2 and C3 against class C4

For example suppose we have car evaluation dataset[29] which evaluates vehicles according to buying pricemaintenance price and technical characteristics such ascomfort number of doors person capacity luggage boot sizeand safety of the carThis dataset has an ordinal class attributewith four values unacc acc good and vgood First fourdifferent class values of the original dataset are converted tobinary values according to these rules Classes gt ldquounaccrdquoClasses gt ldquoaccrdquo and Classes gt ldquogoodrdquo class valued attribute Ifwe consider ldquoClasses gt lsquounaccrsquordquo rule class values higher thanldquounaccrdquo are labelled as 1 and the others are labelled as 0 In thisway three different transformed datasets that contain binaryclass values are obtained In the next stage a classificationalgorithm (ie C45 RandomTree or REPTree) is appliedon each obtained dataset separately Figure 2 shows the

6 Journal of Advanced Transportation

Classes gt lsquounaccrsquo Classes gt lsquoaccrsquo Classes gt lsquogoodrsquo

Orig

inal

Dat

aset

Dat

aset

1

P(classgtlsquounaccrsquo |sample) P(classgtlsquoaccrsquo |sample) P(classgtlsquogoodrsquo|sample)

lsquounaccrsquo lsquoaccrsquo lsquogoodrsquo lsquovgoodrsquolsquounaccrsquo lsquoaccrsquo lsquogoodrsquo lsquovgoodrsquo

Dat

aset

2

Dat

aset

3

lsquounaccrsquo lsquoaccrsquo lsquogoodrsquo lsquovgoodrsquo

FeaturesClassLabels

vhigh vhigh 2 2 hellip unacclow vhigh 2 4 hellip acclow vhigh 4 4 hellip accmed low 2 4 hellip goodlow med 2 4 hellip goodlow high 3 4 hellip vgood

FeaturesClassLabels

vhigh vhigh 2 2 hellip 0low vhigh 2 4 hellip 0low vhigh 4 4 hellip 0med low 2 4 hellip 1low med 2 4 hellip 1low high 3 4 hellip 1

FeaturesClassLabels

vhigh vhigh 2 2 hellip 0low vhigh 2 4 hellip 1low vhigh 4 4 hellip 1med low 2 4 hellip 1low med 2 4 hellip 1low high 3 4 hellip 1

FeaturesClassLabels

vhigh vhigh 2 2 hellip 0low vhigh 2 4 hellip 0low vhigh 4 4 hellip 0med low 2 4 hellip 0low med 2 4 hellip 0low high 3 4 hellip 1

Figure 2 Application of ordinal class classifier algorithm on ldquocar evaluationrdquo dataset

demonstration of how the ordinal class classifier algorithmworks on the ldquocar evaluationrdquo dataset

When predicting a new sample the probabilities arecomputed for each 119896 class value using k-1 binary classificationmodels For example the probability of the ldquounaccrdquo classvalue 119875(10158401199061198991198861198881198881015840 | 119904119886119898119901119897119890) in the sample car dataset is evalu-ated by 1 minus 119875(119888119897119886119904119904 gt 10158401199061198991198861198881198881015840 | 119904119886119898119901119897119890) Similarly the otherthree ordinal class value probabilities are computed Andfinally the class label which has the maximum probabilityvalue is assigned to the sample In general the probabilities ofthe class attribute values depend on ldquocar evaluationrdquo datasetwhich are calculated as follows

119875 (10158401199061198991198861198881198881015840 | 119904119886119898119901119897119890)

= 1 minus 119875 (119888119897119886119904119904 gt 10158401199061198991198861198881198881015840 | 119904119886119898119901119897119890)

119875 (10158401198861198881198881015840 | 119904119886119898119901119897119890)

= 119875 (119888119897119886119904119904 gt 10158401199061198991198861198881198881015840 | 119904119886119898119901119897119890)

minus 119875 (119888119897119886119904119904 gt 10158401198861198881198881015840 | 119904119886119898119901119897119890)

119875 (10158401198921199001199001198891015840 | 119904119886119898119901119897119890)

= 119875 (119888119897119886119904119904 gt 10158401198861198881198881015840 | 119904119886119898119901119897119890)

minus 119875 (119888119897119886119904119904 gt 10158401198921199001199001198891015840 | 119904119886119898119901119897119890)

119875 (1015840V1198921199001199001198891015840 | 119904119886119898119901119897119890) = 119875 (119888119897119886119904119904 gt 10158401198921199001199001198891015840 | 119904119886119898119901119897119890)

(4)

The proposed approach (EBOC) utilizes the ordinal classi-fication algorithm as base learner for bagging and boostingensemble methods This novel ensemble-based approachaims to obtain successful classification results under favourof high prediction ability of ensemble-learning paradigmThe general structure of the proposed approach is presentedin Figure 3 When bagging is considered random samplesare selected from the original dataset to produce multipletraining sets or when boosting is utilized samples areselected with specified probabilities based on their weightsAfter that the EBOC derives new datasets from the originaldataset each onewith new binary class attribute Each datasetis given to ordinal class classifier algorithm as an inputThen a classification algorithm (ie C45 RandomTree orREPTree) is applied on the datasets and multiple ensemble-based ordinal classification models are produced Each clas-sification model in this system gives an output label and themajority vote of these outputs is selected as a final class value

The pseudocode of the proposed approach (EBOC) ispresented in Algorithm 1 First multiple training sets aregenerated according to the preferred ensemble method (bag-ging or boosting) Second the algorithm converts ordinaldata to binary data After that a tree-based classificationalgorithm (ie C45 RandomTree or REPTree) is applied onthe binary training sets and by this way various classifiers areproduced After this training phase if boosting is chosen asensemble method the weight of each sample in the datasetis updated dynamically according to the performance (errorrate) of the classifiers in the current iteration To predictthe class label of a new input sample the probability foreach ordinal class is estimated by using the constructedmodelsThe algorithmchooses the class label with the highest

Journal of Advanced Transportation 7

Predicted Class Label

Dataset

hellip

C

Dataset 1

Dataset 2

Dataset n

Binary Classifiers 1

Binary Classifiers 2

Binary Classifiers n

OrdinalClassifier 1

Ordinal Classifier 2

Ordinal Classifier n

Majority VotingOrdinal

Classifier 2

Ordinal Classifier n

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

C

CL1

CL2

CL3

C

CL1

CL2

CL3

C

CL1

CL2

CL3

Mlowast (x) = LAGRyisinY

n

sumiM(x)=y

1

Figure 3 The general structure of the proposed approach (EBOC)

probability Lastly the majority vote of the outputs obtainedfrom each ordinal classification model is selected as finaloutput

5 Experimental Studies

In the experimental studies the proposed approach (EBOC)was implemented on twelve different benchmark trans-portation datasets to demonstrate its advantages over thestandard nominal classification methods The applicationwas developed by using Weka open source data mininglibrary [30] Individual tree-based classification algorithms(C45 RandomTree and REPTree) ordinal classificationalgorithm [7] and the EBOC approach were applied on thetransportation datasets separately and they were comparedin terms of accuracy precision recall F-measure and ROCarea In the EBOC approach C45 RandomTree and REPTreealgorithms were also used as a base learner in ordinal classclassifiers separately The obtained experimental results inthis study are presented with the help of tables and graphs

51 Dataset Description In this study 12 different transporta-tion datasets which are available in several repositories forpublic use were selected to demonstrate the capabilities ofthe proposed EBOC method The datasets were obtainedfrom the following data archives UCI Machine LearningRepository [31] Kaggle [32] data mill north [33] and NYC

open data [34] The detailed descriptions about the datasets(ie what are they and how to use them) are given as follows

Auto MPG Auto MPG dataset which was used in theAmerican Statistical Association Exposition is presented bythe StatLib library which is maintained at Carnegie MellonUniversity This dataset is utilized for the prediction of thecity-cycle fuel consumption in miles per gallon of the carsaccording to their characteristics such as model year thenumber of cylinders horsepower engine size (displacement)weight and acceleration

Automobile The dataset was obtained from 1985 WardrsquosAutomotive Yearbook It consists of various automobilecharacteristics such as fuel type body style number of doorsengine size engine location horsepower length widthheight price and insurance score These features are usedfor the classification of automobiles by predicting their riskfactors six different risk ranking ranging from risky (+3) topretty safe (-3)

Bike Sharing The data includes the hourly count of rentalbikes between years 2011 and 2012 It was collected by Capitalbike share system fromWashingtonDC wheremembershiprental and bike return are automated via a network of kiosklocationsThe dataset is presented for the prediction of hourlybike rental counts based on the environmental and seasonalsettings such asweather situation (ie clear cloudy rainy and

8 Journal of Advanced Transportation

Algorithm EBOC: Ensemble-Based Ordinal Classification

Inputs:
  D: the ordinal dataset, D = {(x1, y1), (x2, y2), ..., (xm, ym)}
  m: the number of instances in the dataset D
  X: input feature space, an input vector x ∈ X
  Y: class attribute, an ordinal class label y ∈ {c1, c2, ..., ck} ∈ Y with an order ck > ck-1 > ... > c1
  k: the number of class labels
  n: ensemble size

Outputs:
  M*: an ensemble classification model
  M*(x): class label of a new sample x

Begin
  for i = 1 to n do
    if (bagging)
      Di = bootstrap samples from D
    else if (boosting)
      Di = samples from D according to their weights
    // Construction of binary training sets Dij
    for j = 1 to k-1 do
      for s = 1 to m do
        if (ys <= cj)
          Dij.Add(xs, 0)
        else
          Dij.Add(xs, 1)
      end for
    end for
    // Construction of binary classifiers BCij
    for j = 1 to k-1 do
      BCij = ClassificationAlgorithm(Dij)
    end for
    if (boosting)
      update weight values
  end for
  // Classification of a new sample x
  for i = 1 to n do
    // Construction of ordinal classification models Mi
    P(c1) = 1 - P(y > c1)
    for j = 2 to k-1 do
      P(cj) = P(y > cj-1) - P(y > cj)
    end for
    P(ck) = P(y > ck-1)
    Mi = the class with the maximum value in P
  end for
  // Majority voting
  if (bagging)
    M*(x) = arg max_{y ∈ Y} Σ_{i: Mi(x) = y} 1
  else if (boosting)
    M*(x) = arg max_{y ∈ Y} Σ_{i: Mi(x) = y} weight_i
End Algorithm

Algorithm 1: The pseudocode of the proposed EBOC approach.
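To make the steps of Algorithm 1 concrete, the following minimal Python sketch reproduces the bagging variant of EBOC, with scikit-learn decision trees standing in for the Weka base learners used in the paper. The names OrdinalClassifier and eboc_bagging_predict, the encoding of labels as integers 0..k-1, and the choice of DecisionTreeClassifier are illustrative assumptions, not the authors' implementation.

import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

class OrdinalClassifier:
    """Frank-and-Hall-style ordinal classifier built from k-1 binary models.

    Assumes y holds ordered integer labels 0..k-1 and that every binary
    subproblem sees both classes (a production version would guard this).
    """
    def __init__(self, base=None):
        self.base = base if base is not None else DecisionTreeClassifier()

    def fit(self, X, y):
        self.classes_ = np.sort(np.unique(y))
        # one binary model per threshold, each estimating P(y > c_j)
        self.models_ = [clone(self.base).fit(X, (y > c).astype(int))
                        for c in self.classes_[:-1]]
        return self

    def predict(self, X):
        gt = np.column_stack([m.predict_proba(X)[:, 1] for m in self.models_])
        k = len(self.classes_)
        proba = np.empty((X.shape[0], k))
        proba[:, 0] = 1.0 - gt[:, 0]                # P(c_1) = 1 - P(y > c_1)
        for j in range(1, k - 1):
            proba[:, j] = gt[:, j - 1] - gt[:, j]   # P(c_j) = P(y > c_{j-1}) - P(y > c_j)
        proba[:, -1] = gt[:, -1]                    # P(c_k) = P(y > c_{k-1})
        return self.classes_[np.argmax(proba, axis=1)]

def eboc_bagging_predict(X_train, y_train, X_test, n=100, seed=0):
    """Bagging variant of EBOC: majority vote of n ordinal classifiers
    trained on bootstrap samples, as in the final step of Algorithm 1."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n):
        idx = rng.integers(0, len(y_train), size=len(y_train))  # bootstrap sample
        votes.append(OrdinalClassifier().fit(X_train[idx], y_train[idx]).predict(X_test))
    votes = np.array(votes)                                     # shape (n, n_test)
    return np.array([np.bincount(col).argmax() for col in votes.T])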


Car Evaluation. This dataset is useful to evaluate the quality of cars according to various characteristics, such as buying price, maintenance price, number of doors, person capacity, size of luggage boot, and estimated safety of the car. The target attribute in the dataset indicates the overall scores of the cars as unacceptable, acceptable, good, and very good.


Car Sale Advertisements. The data was collected from private car sale advertisements in Ukraine in 2016. It was proposed for the prediction of the seller's price (denominated in USD) in the advertisement according to well-known car features such as model, body type, mileage, engine volume, engine type, and drive type.

NYS Air Passenger Traffic. The data was collected monthly by the Port Authority of New York State between the years 1977 and 2015. The aim is to predict the total number of domestic and international passengers for five airports (ACY, EWR, JFK, LGA, and SWF).

Road Traffic Accidents (2017). The dataset contains the records of traffic accidents across the City of Leeds, UK, reported in 2017, with information on location, number of people and vehicles involved, type of vehicle, road surface, weather situations, lighting conditions, and the age and sex of the casualty. The aim is to predict the severity of casualty as slight, serious, or fatal.

SF Air Traffic Landings Statistics and SF Air Traffic Passenger Statistics. These two separate datasets include aircraft landings and passenger statistics of San Francisco International Airport recorded during the period from July 2005 to March 2018. These datasets are used to predict monthly landing and passenger counts, respectively.

Smart City Traffic Patterns. This dataset includes the traffic patterns at four junctions of a city collected between 2015 and 2017. The aim is to manage the traffic of the city better and to provide input on infrastructure planning for the future to improve the efficiency of services for the citizens. To serve this purpose, the number of vehicles in the traffic is predicted to implement a robust traffic system for the city by being prepared for traffic peaks.

Statlog (Vehicle Silhouettes). The Statlog dataset contains the features of vehicle silhouettes extracted by the Hierarchical Image Processing System (HIPS). The vehicle may be viewed from one of many different angles. The aim is to classify a given silhouette using a set of features extracted from the image.

Traffic Volume Counts (2012-2013). The dataset includes the hourly traffic volume counts collected by the New York Metropolitan Transportation Council between the years 2012 and 2013. The traffic counts were measured on various roads from one intersection to another with a specified direction (NB, SB, WB, and EB). The attribute to be predicted in this dataset is the traffic volume at 11:00-12:00 AM.

Before the application of the nominal and ordinal classification algorithms, the datasets generally passed through data preprocessing steps. In this study, the ID attributes in the transportation datasets were eliminated in the data reduction step. The date attributes were split into three parts (day, month, and year) because they provide more useful information for the prediction task. In addition, continuous class values were discretized into 3 bins using the equal-frequency technique, since ordinal classification algorithms require categorical ordinal data.

Figure 4: The 10-fold cross-validation process.


Basic characteristics of the investigated transportation datasets are given in Table 1: the number of instances, attributes, and classes, the target attribute, the data preprocessing operations, and the class distributions in each bin.

5.2. Experimental Work. In each experiment, four methods were compared on the transportation datasets: (i) individual tree-based classification algorithms, (ii) ordinal class classifier, (iii) boosting-based ordinal classification, and (iv) bagging-based ordinal classification. They were compared by using the n-fold cross-validation technique, selecting n as 10. In this validation technique, the entire dataset is divided into n equal-size parts; n-1 of them are chosen for the training phase of the classification model, and the last part is used in the testing phase. This process is repeated n times with changing parts for the training and testing phases. When the cross-validation process terminates, the accuracy values obtained in each step are averaged, and a single accuracy value is produced as a success indicator of the algorithm. Figure 4 represents the 10-fold cross-validation process.
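For illustration only, this protocol can be sketched in a few lines of Python; scikit-learn's DecisionTreeClassifier and the iris data are stand-ins chosen for the example, not the Weka setup used in this study.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)   # placeholder dataset for the example
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10, scoring="accuracy")
print(scores.mean())                # the averaged accuracy reported as the algorithm's score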

In this study, alternative tree-based ordinal classification algorithms were compared according to accuracy, precision, recall, F-measure, and ROC area measures. The metrics are explained with their abbreviations, formulas, and definitions in Table 2.
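The formulas in Table 2 reduce to simple arithmetic over the confusion-matrix counts; a short Python sketch with hypothetical counts:

tp, fp, tn, fn = 50, 10, 35, 5   # hypothetical confusion-matrix counts

accuracy  = (tp + tn) / (tp + tn + fp + fn)                 # 0.85
precision = tp / (tp + fp)                                  # ~0.833
recall    = tp / (tp + fn)                                  # ~0.909
f_measure = 2 * precision * recall / (precision + recall)   # ~0.870
print(accuracy, precision, recall, f_measure)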

5.3. Experimental Results. In this study, three different experiments with three different tree-based classification algorithms (C4.5, RandomTree, and REPTree) were performed to compare the classification success of the proposed EBOC approach with the existing individual algorithms and the ordinal class classifier algorithm. To evaluate the classification performances of the algorithms on a specific transportation dataset, accuracy rate values were calculated using the n-fold cross-validation technique, selecting n as 10.

In each experiment, one of the tree-based classification algorithms was used as the base learner. For example, in the first experiment, four methods were applied and compared on 12 different transportation datasets: (i) C4.5, the individual decision tree algorithm; (ii) OrdC4.5, the ordinal class classifier with C4.5; (iii) AdaOrdC4.5, AdaBoost-based ordinal classification with C4.5 as base learner; and finally (iv) BagOrdC4.5, bagging-based ordinal classification with C4.5 as base learner.


Table 1: The basic characteristics of the transportation datasets (I: number of instances, A: number of attributes, C: number of classes).

Dataset | I | A | C | Class Distribution | Target Attribute | Data Preprocessing
Auto MPG | 398 | 8 | 3 | 131-134-133 | mpg | Removing the columns with unique values (i.e., car name)
Automobile | 205 | 26 | 7 | 0-3-22-67-54-32-27 | symboling | -
Bike Sharing | 17379 | 13 | 3 | 5797-5783-5799 | cnt | Removing the columns "instant", "dteday", "casual", and "registered"
Car Evaluation | 1728 | 7 | 4 | 1210-384-69-65 | class | -
Car Sale Advertisements | 9309 | 10 | 3 | 3112-3080-3117 | price | Removing rows with zero price value
NYS Air Passenger Traffic | 1584 | 4 | 3 | 528-528-528 | total passengers | Removing the columns "Domestic" and "International Passengers"
Road Traffic Accidents (2017) | 2203 | 13 | 3 | 1879-309-15 | casualty severity | (i) Removing columns that hold reference numbers; (ii) splitting the date column into day and month
SF Air Traffic Landings Statistics | 21105 | 14 | 3 | 6941-7103-7061 | landing count | -
SF Air Traffic Passenger Statistics | 18398 | 12 | 3 | 6132-6133-6133 | passenger count | -
Smart City Traffic Patterns | 48120 | 6 | 3 | 15687-16436-15997 | vehicles | (i) Removing the ID column; (ii) splitting the date column into day, month, and year
Statlog (Vehicle Silhouettes) | 846 | 19 | 4 | 212-217-218-199 | class | -
Traffic Volume Counts (2012-2013) | 5945 | 31 | 3 | 1983-1989-1973 | traffic volume at 11:00-12:00 AM | (i) Removing the columns with unique values (i.e., ID, Segment ID); (ii) splitting the date column into day, month, and year

Table 2: The detailed information about classifier performance measures (TP: true positives, FP: false positives, TN: true negatives, FN: false negatives).

Abbreviation | Measure | Formula | Description
Acc | Accuracy | Acc = (TP + TN) / (TP + TN + FP + FN) | The ratio of the number of correctly classified instances to the total number of records
Prec | Precision | Prec = TP / (TP + FP) | The ratio of correctly classified positive instances against all positive results
Rec | Recall | Rec = TP / (TP + FN) | The ratio of correct answers for a class over all the answers correctly belonging to this class
F | F-measure | F = (2 × precision × recall) / (precision + recall) | The harmonic mean of precision and recall
ROC Area | Receiver Operating Characteristic Area | - | The area under the curve which is generated to compare correctly and incorrectly classified instances

In the second and third experiments, the RandomTree and REPTree algorithms were utilized as base learners, respectively. The default parameters of the algorithms in Weka were used in all experiments, except the ensemble size (the number of members) parameter, which was set to 100. As a result of the experiments, the accuracy rates of each algorithm were evaluated. Tables 3, 4, and 5 show the comparative results of the implemented techniques on twelve benchmark datasets in terms of accuracy rates.


Table 3: The comparison of C4.5-based ordinal and ensemble learning methods in terms of classification accuracy.

Dataset | C4.5 (%) | OrdC4.5 (%) | AdaOrdC4.5 (%) | BagOrdC4.5 (%)
Auto MPG | 80.40 | 80.90* | 83.67* | 81.91*
Automobile | 81.95 | 66.34 | 84.39* | 73.66
Bike Sharing | 87.75 | 87.72 | 89.29* | 89.79*
Car Evaluation | 92.36 | 92.19 | 98.73* | 94.27*
Car Sale Advertisements | 83.65 | 81.09 | 83.82* | 85.00*
NYS Air Passenger Traffic | 84.91 | 85.42* | 86.05* | 86.81*
Road Traffic Accidents (2017) | 85.29 | 85.29* | 81.03 | 84.38
SF Air Traffic Landings Statistics | 98.74 | 98.31 | 99.37* | 98.69
SF Air Traffic Passenger Statistics | 90.68 | 90.82* | 90.41 | 91.43*
Smart City Traffic Patterns | 85.66 | 85.98* | 85.07 | 86.94*
Statlog (Vehicle Silhouettes) | 72.46 | 68.91 | 78.13* | 74.82*
Traffic Volume Counts (2012-2013) | 88.36 | 88.12 | 89.77* | 89.10*
Average | 86.02 | 84.26 | 87.48 | 86.40

Table 4: The comparison of RandomTree-based ordinal and ensemble learning methods in terms of classification accuracy.

Dataset | RandomTree (%) | OrdRandomTree (%) | AdaOrdRandomTree (%) | BagOrdRandomTree (%)
Auto MPG | 77.64 | 81.66* | 82.66* | 81.41*
Automobile | 76.59 | 74.63 | 83.41* | 84.88*
Bike Sharing | 80.07 | 80.56* | 86.98* | 87.73*
Car Evaluation | 83.16 | 85.94* | 97.22* | 95.83*
Car Sale Advertisements | 82.38 | 82.12 | 83.90* | 86.34*
NYS Air Passenger Traffic | 86.11 | 86.17* | 85.61 | 86.49*
Road Traffic Accidents (2017) | 78.76 | 78.44 | 82.21* | 84.48*
SF Air Traffic Landings Statistics | 97.38 | 97.43* | 98.93* | 98.83*
SF Air Traffic Passenger Statistics | 89.22 | 89.27* | 89.05 | 89.17
Smart City Traffic Patterns | 82.26 | 82.79* | 84.72* | 85.91*
Statlog (Vehicle Silhouettes) | 70.92 | 65.60 | 76.71* | 76.00*
Traffic Volume Counts (2012-2013) | 82.96 | 83.45* | 89.77* | 89.69*
Average | 82.29 | 82.34 | 86.76 | 87.23

The accuracy rates which are higher than the individual classification algorithm's accuracy rate in that row are marked with *. In addition, the accuracy rates which have the maximum value in each row are made bold. The experimental results show that the EBOC approaches (AdaOrdC4.5, BagOrdC4.5, AdaOrdRandomTree, BagOrdRandomTree, AdaOrdREPTree, and BagOrdREPTree) generally provide higher accuracy values than the individual tree-based classification algorithms. For example, over the 12 datasets, BagOrdRandomTree (87.23%) is significantly more accurate than RandomTree (82.29%). Similarly, both AdaOrdREPTree and BagOrdREPTree win against plain REPTree on 11 datasets and lose on only one. The results also show that the performance gap generally increases with the number of classes. When the average accuracy values are considered, in general, it is possible to say that the proposed EBOC approach has the best accuracy score in all three experiments: AdaOrdC4.5 is 87.48%, BagOrdRandomTree is 87.23%, and BagOrdREPTree is 85.67%.

The matrices presented in Tables 6, 7, and 8 give all pairwise combinations of the tree-based algorithms with their ordinal and ensemble-learning versions. Each cell in the matrix represents the number of wins, losses, and ties between the approach in that row and the approach in that column. For example, for the pair of C4.5 and BagOrdC4.5 algorithms, 2-10-0 indicates that the standard C4.5 algorithm is better than BagOrdC4.5 on only 2 datasets, while BagOrdC4.5 is better on 10 datasets. Their accuracies are not equal on any dataset. When all matrices are examined, it is clearly seen that our proposed approach, EBOC, outperforms all other methods.

The graphs given in Figures 5, 6, and 7 show the average ranks of the tree-based algorithms (C4.5, RandomTree, and REPTree) with their ordinal and ensemble-learning (AdaBoost and bagging) versions, respectively.


Table 5: The comparison of REPTree-based ordinal and ensemble learning methods in terms of classification accuracy.

Dataset | REPTree (%) | OrdREPTree (%) | AdaOrdREPTree (%) | BagOrdREPTree (%)
Auto MPG | 75.38 | 76.63* | 81.91* | 81.16*
Automobile | 63.41 | 66.83* | 83.90* | 72.20*
Bike Sharing | 84.96 | 85.27* | 88.75* | 88.12*
Car Evaluation | 87.67 | 90.91* | 98.67* | 93.63*
Car Sale Advertisements | 80.70 | 80.42 | 82.52* | 82.30*
NYS Air Passenger Traffic | 84.34 | 84.91* | 86.74* | 87.31*
Road Traffic Accidents (2017) | 84.66 | 85.02* | 80.07 | 84.57
SF Air Traffic Landings Statistics | 98.04 | 98.10* | 99.12* | 98.57*
SF Air Traffic Passenger Statistics | 90.39 | 90.49* | 90.97* | 91.25*
Smart City Traffic Patterns | 84.66 | 85.06* | 85.33* | 86.38*
Statlog (Vehicle Silhouettes) | 72.34 | 69.62 | 77.54* | 73.17*
Traffic Volume Counts (2012-2013) | 80.57 | 88.75* | 86.39* | 89.37*
Average | 82.26 | 83.50 | 86.83 | 85.67

Figure 5: The average ranks of the C4.5 algorithm with ordinal and ensemble-learning versions.

In the ranking method, each approach used in this study is rated according to its accuracy score on each dataset. This process starts by giving rank 1 to the classifier with the highest classification accuracy and continues to increase the rank value of the classifiers until rank m is assigned to the worst of the m classifiers. In the case of a tie, the average of the classifiers' rankings is assigned to each classifier. Then, the mean values of the ranks per classifier over all datasets are computed as the average ranks. According to the comparative results, the EBOC approach with the bagging method has the best performance among the others in Figures 5 and 6, because it gives the lowest rank value.
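A brief sketch of this average-rank computation (the accuracy matrix below is hypothetical; scipy's rankdata assigns average ranks to ties, matching the procedure described above):

import numpy as np
from scipy.stats import rankdata

acc = np.array([[80.4, 80.9, 83.7, 81.9],   # rows: datasets, columns: classifiers
                [88.8, 87.7, 89.3, 89.8],
                [92.4, 92.2, 98.7, 94.3]])
ranks = np.vstack([rankdata(-row) for row in acc])  # negate so the highest accuracy gets rank 1
print(ranks.mean(axis=0))                           # average rank per classifier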

Figure 6: The average ranks of the RandomTree algorithm with ordinal and ensemble-learning versions.

The proposed algorithms (AdaOrd* and BagOrd*) were applied on the transportation datasets and were compared with the ordinal and nominal (standard) classification algorithms in terms of accuracy, precision, recall, F-measure, and ROC area metrics. Additional measures besides accuracy are required to evaluate all aspects of the classifiers, and they are useful for providing additional insight into their performance evaluations. Table 9 shows the average results obtained from the 12 transportation datasets in each experiment. Except for accuracy, the metrics listed in Table 9 give a degree ranging from 0 to 1. An algorithm with a higher rate is more successful than the others.


Table 6: The pairwise combinations of the C4.5 algorithm with ordinal and ensemble learning versions.

 | BagOrdC4.5 | AdaOrdC4.5 | OrdC4.5
C4.5 | 3-9-0 | 3-9-0 | 7-4-1
OrdC4.5 | 1-11-0 | 3-9-0 |
AdaOrdC4.5 | 6-6-0 | |

Table 7: The pairwise combinations of the RandomTree algorithm with ordinal and ensemble learning versions.

 | BagOrdRandomTree | AdaOrdRandomTree | OrdRandomTree
RandomTree | 1-11-0 | 2-10-0 | 4-8-0
OrdRandomTree | 2-10-0 | 2-10-0 |
AdaOrdRandomTree | 5-7-0 | |

Figure 7: The average ranks of the REPTree algorithm with ordinal and ensemble-learning versions.

Among the C4.5 decision tree-based algorithms, it is clearly seen that the AdaOrdC4.5 algorithm has the best accuracy (87.467%), precision (0.874), recall (0.875), and F-measure (0.874) results. While bagging has the highest values for the RandomTree algorithm, boosting is the best among the REPTree-based algorithms. When the ROC area is considered, the BagOrd* algorithms have the highest scores according to the experimental results. As a result, it is possible to say that the ensemble-based ordinal classification (EBOC) approach provides more accurate and robust classification results than the traditional methods.

5.4. Statistical Significance Tests. The field of inferential statistics offers special procedures for testing the significance of differences between multiple classifiers.

Although the ensemble-based ordinal classification (EBOC) algorithms (AdaOrd* and BagOrd*) have lower ranking values, we should statistically verify that these algorithms are significantly different from the others. For verification, we used three well-known statistical methods [35]: multiple comparisons (Friedman test and Quade test) and pairwise comparisons (Wilcoxon signed rank test).

The null hypothesis (H0) for a statistical test is one in which the provided classification models are equivalent; the alternative hypothesis (H1) holds when not all classifiers are equivalent.

H0: There are no performance differences among the classifiers on the datasets.

H1: There are performance differences among the classifiers on the datasets.

The p-value is defined as a probability, a decimal number between 0 and 1, and it can be expressed as a percentage (i.e., 0.1 = 10%). With a small p-value, we reject the null hypothesis (H0), so the relationship between the classification results is significantly different.

5.4.1. Statistical Tests for Comparing Multiple Classifiers. Multiple comparison tests (MCT) are used to establish a statistical comparison of the results reported among various classification algorithms. We performed two well-known MCTs to determine whether the classifiers have significant differences or not: the Friedman test and the Quade test.

Friedman Test. The Friedman test is a nonparametric statistical test that aims to detect significant differences between the behaviors of two or more algorithms. Initially, it ranks the algorithms for each dataset separately, between 1 (smallest) and 4 (largest). In case of ties, the average rank is assigned. After that, the Friedman test statistic (χ²r) is calculated according to the following:

\chi_r^2 = \frac{12}{d \cdot t \cdot (t+1)} \sum_{i=1}^{t} R_i^2 - 3 \cdot d \cdot (t+1)   (5)

14 Journal of Advanced Transportation

Table 8 The pairwise combinations of the REPTree algorithm with ordinal and ensemble learning versions

BagOrdREPTree AdaOrdREPTree OrdREPTreeREPTree 1-11-0 1-11-0 2-10-0OrdREPTree 1-11-0 2-10-0AdaOrdREPTree 7-5-0BagOrdREPTree

Table 9 The comparison of algorithms in terms of accuracy precision recall F-measure and ROC area

Compared Algorithms Accuracy () Precision Recall F-measure ROC AreaC45 8602 0861 0860 0866 0897OrdC45 8426 0843 0842 0848 0884AdaOrdC45 8748 0874 0875 0874 0933BagOrdC45 8640 0859 0864 0860 0947RandomTree 8229 0823 0823 0823 0860OrdRandomTree 8234 0825 0823 0824 0861AdaOrdRandomTree 8676 0866 0868 0866 0928BagOrdRandomTree 8723 0868 0872 0869 0947REPTree 8226 0817 0823 0817 0903OrdREPTree 8350 0831 0835 0829 0908AdaOrdREPTree 8683 0868 0868 0867 0925BagOrdREPTree 8567 0852 0857 0852 0939

where d is the number of datasets, t is the number of classifiers, and R_i is the total of the ranks for the i-th classifier over all datasets. The Friedman test statistic is approximately chi-square (χ²) distributed with t-1 degrees of freedom (df), and the null hypothesis is rejected if χ²r > χ²(t-1, α) at the corresponding significance level α.

The obtained Friedman test value for the four C4.5-based classifiers is 10.525. Since the obtained value is greater than the critical value (χ²(df=3, α=0.05) = 7.81), the null hypothesis (H0) is rejected, so it has been concluded that the four classifiers are significantly different. The same situation is also valid for the RandomTree-based and REPTree-based algorithms, which have test values of 15.3 and 20.1, respectively. The obtained p-values for the C4.5-, RandomTree-, and REPTree-related EBOC results (Tables 3, 4, and 5) are 0.01459, 0.00158, and 0.00016, which are far below the 0.05 level of significance. Thus, it is possible to say that the differences between their performances are unlikely to occur by chance.
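This computation can be checked with scipy; the accuracy matrix below is a random stand-in for the 12 x 4 values of Table 3, and scipy additionally applies a tie correction, so its statistic can differ slightly from equation (5).

import numpy as np
from scipy.stats import chi2, friedmanchisquare

acc = np.random.default_rng(0).uniform(80, 99, size=(12, 4))  # stand-in for Table 3
stat, p = friedmanchisquare(*acc.T)   # one accuracy vector per classifier
critical = chi2.ppf(0.95, df=3)       # ~7.81 for t - 1 = 3 degrees of freedom
print(stat, p, stat > critical)       # H0 is rejected when the statistic exceeds 7.81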

Quade Test. Like the Friedman test, the Quade test is a nonparametric test which is used to prove that the differences among classifiers are significant. First, the performance results of each classifier are ranked within each dataset to yield R_ij, in the same way as the Friedman test does, where d is the number of datasets and t is the number of classifiers, for i = 1, 2, ..., d and j = 1, 2, ..., t. Then, the range for each dataset is calculated by finding the difference between the largest and the smallest observations within that dataset. The rank obtained for dataset i is denoted by Q_i. The weighted average adjusted rank for dataset i with classifier j is then computed as S_ij = Q_i · (R_ij - (t+1)/2). The Quade test statistic is then given by

F = \frac{(d-1) \cdot (1/d) \sum_{j=1}^{t} S_j^2}{\sum_{i=1}^{d} \sum_{j=1}^{t} S_{ij}^2 - (1/d) \sum_{j=1}^{t} S_j^2}   (6)

where S_j = Σ_{i=1}^{d} S_ij is the sum of the weighted ranks for each classifier. The F value is tested against the F-distribution for a given α with df1 = (t - 1) and df2 = (d - 1)(t - 1) degrees of freedom. If F > F((t-1), (d-1)(t-1), α), then the null hypothesis is rejected. Moreover, the p-value can be computed through normal approximations.
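Equation (6) can be implemented directly; the sketch below follows the definitions above on a hypothetical d x t accuracy matrix (a sanity-check implementation under those assumptions, not a validated statistical routine).

import numpy as np
from scipy.stats import f, rankdata

acc = np.random.default_rng(1).uniform(80, 99, size=(12, 4))  # d datasets x t classifiers
d, t = acc.shape
R = np.vstack([rankdata(row) for row in acc])       # R_ij: ranks within each dataset
Q = rankdata(acc.max(axis=1) - acc.min(axis=1))     # Q_i: ranks of the dataset ranges
S = Q[:, None] * (R - (t + 1) / 2)                  # S_ij: weighted adjusted ranks
B = (S.sum(axis=0) ** 2).sum() / d                  # (1/d) * sum_j S_j^2
F_stat = (d - 1) * B / (np.square(S).sum() - B)     # equation (6)
print(F_stat, f.sf(F_stat, t - 1, (d - 1) * (t - 1)))  # statistic and p-value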

The p-values computed through the Quade statistical tests for the C4.5-, RandomTree-, and REPTree-related EBOC algorithms are 0.013398, 0.000018, and 0.000522, respectively. The test results strongly suggest the existence of significant differences among the considered algorithms at the significance level α = 0.05. Hence, we reject the null hypothesis, which states that all classification algorithms have the same performance. Thus, it is possible to say that at least one of them behaves differently.

5.4.2. Statistical Tests for Comparing Paired Classifiers. In addition to the multiple comparison tests, we also conducted pairwise comparison tests to demonstrate that the proposed algorithms (AdaOrd* and BagOrd*) have significant differences from the others when individually compared. The Wilcoxon signed rank test was applied to see the pairwise performance differences.


Table 10: Wilcoxon signed ranks test results.

Compared Algorithms | W+ | W- | W | z-score | p-value | Significance Level
AdaOrdC4.5 vs. C4.5 | 63 | 15 | 15 | -1.882715 | 0.059739 | moderate
AdaOrdC4.5 vs. OrdC4.5 | 65 | 13 | 13 | -2.039608 | 0.041389 | strong
BagOrdC4.5 vs. C4.5 | 61 | 17 | 17 | -1.725822 | 0.084379 | moderate
BagOrdC4.5 vs. OrdC4.5 | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong
AdaOrdRandomTree vs. RandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong
AdaOrdRandomTree vs. OrdRandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong
BagOrdRandomTree vs. RandomTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong
BagOrdRandomTree vs. OrdRandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong
AdaOrdREPTree vs. REPTree | 71 | 7 | 7 | -2.510287 | 0.012063 | strong
AdaOrdREPTree vs. OrdREPTree | 64 | 14 | 14 | -1.961161 | 0.049860 | strong
BagOrdREPTree vs. REPTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong
BagOrdREPTree vs. OrdREPTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong
Average | 71.25 | 6.75 | 6.75 | -2.602996 | 0.022918 |

Wilcoxon Signed Rank Test. This statistical test consists of the following steps to check the validity of the null hypothesis: (i) calculate the performance differences between two algorithms for each dataset; (ii) order the differences according to their absolute values from smallest to largest; (iii) rank the absolute values of the differences, starting with the smallest as 1 and assigning average ranks in case of ties; (iv) calculate W+ and W- by summing the ranks of the positive and negative differences separately; (v) find W as the smaller of the sums, W = min(W+, W-); (vi) calculate the z-score as z = (W - μ_W)/σ_W, where μ_W and σ_W are the mean and standard deviation of W under the null hypothesis, and find the p-value from the distribution table; and (vii) determine the significance level according to the computed p-value [36] and reject the null hypothesis if the p-value is less than a certain risk threshold (i.e., 0.1 = 10%).

Significance\ level = \begin{cases} \text{weak or none}, & p > 0.1 \\ \text{moderate}, & 0.05 < p \le 0.1 \\ \text{strong}, & 0.01 < p \le 0.05 \\ \text{very strong}, & p \le 0.01 \end{cases}   (7)
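The same test is available in scipy; the two accuracy vectors below are hypothetical stand-ins for one algorithm pair over the 12 datasets.

import numpy as np
from scipy.stats import wilcoxon

a = np.array([80.4, 88.8, 92.4, 83.7, 84.9, 85.3, 98.7, 90.7, 85.7, 72.5, 88.4, 81.9])
b = np.array([83.7, 89.3, 98.7, 84.4, 86.1, 81.0, 99.4, 90.4, 85.1, 78.1, 89.8, 84.4])
stat, p = wilcoxon(a, b)   # the statistic is W = min(W+, W-) in the default mode
print(stat, p)             # H0 rejected at the alpha = 0.1 level when p < 0.1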

Table 10 presents the W, z-score, and p-values computed through the Wilcoxon signed rank tests. Based on the reported p-values, most of the tests strongly or very strongly prove the existence of significant differences among the considered algorithms. As the table states, for each test, the ensemble-based ordinal classification algorithms (BagOrd* and AdaOrd*) always show a significant improvement over the traditional versions at a significance level of α = 0.1. In general, it can be observed that the average p-value of the proposed BagOrd* variant algorithms (0.0171) is a little lower than that of the AdaOrd* variants (0.0288).

Based on all statistical tests (Friedman, Quade, and Wilcoxon signed rank tests), we can safely reject the null hypothesis (i.e., there are no performance differences among the classifiers on the datasets), since the p-values computed through the statistics are less than the critical value (0.05). Thus, the proposed approach (EBOC) has a statistically significant effect on the response at the 95.0% confidence level.

6. Conclusions and Future Works

As a result of developments in the field of transportation, enormous amounts of raw data are generated every day. This situation creates the potential to discover knowledge patterns or rules from it and to model the behaviour of a transportation system using machine learning techniques. To serve this purpose, this study focuses on the application of ordinal classification algorithms on real-world transportation datasets. It proposes a novel ensemble-based ordinal classification (EBOC) approach. This approach converts the original ordinal class problem into a series of binary class problems and uses the ensemble-learning paradigm (boosting and bagging) at the same time. To the best of our knowledge, this is the first study in which ordinal classification methods and our proposed approach were applied to the transportation sector. In the experimental studies, the proposed model with the tree-based learners (C4.5, RandomTree, and REPTree) was implemented on twelve benchmark transportation datasets that are available for public use. The proposed EBOC method was compared with the ordinal class classifier and traditional tree-based classification algorithms in terms of accuracy, precision, recall, F-measure, and ROC area. The results indicate that the EBOC approach provides more accurate classification results than these baselines. Moreover, statistical test methods (Friedman, Quade, and Wilcoxon signed rank tests) were used to prove that the classification accuracies obtained from the proposed algorithms (AdaOrd* and BagOrd*) are significantly different from those of the traditional methods. Therefore, the proposed EBOC method can assist in making right decisions in transportation. Our findings demonstrate that the improvement in performance is a result of exploiting ordering information and applying an ensemble strategy at the same time.

As future work, different types of ensemble-based ordinal classifiers can be developed using different ensemble methods, such as stacking and voting, and using different classification algorithms, such as Naive Bayes, k-nearest neighbour, neural networks, and support vector machines. In addition, ensemble clustering models can be developed to cluster transportation data. Furthermore, a comparative study which implements ensemble-learning and deep learning paradigms can be performed in transportation fields.



Data Availability

The "Auto MPG," "Automobile," "Bike Sharing," "Car Evaluation," and "Statlog (Vehicle Silhouettes)" datasets used to support the findings of this study are freely available at https://archive.ics.uci.edu/ml/datasets. The "Car Sale Advertisements," "NYS Air Passenger Traffic," "Smart City Traffic Patterns," "SF Air Traffic Landings Statistics," and "SF Air Traffic Passenger Statistics" datasets used to support the findings of this study are freely available at https://www.kaggle.com/datasets. The "Road Traffic Accidents (2017)" dataset used to support the findings of this study is freely available at https://datamillnorth.org/dataset. The "Traffic Volume Counts (2012-2013)" dataset used to support the findings of this study is freely available at https://opendata.cityofnewyork.us/data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

[1] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, "Traffic flow prediction with big data: a deep learning approach," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865-873, 2015.
[2] X. Wen, L. Shao, Y. Xue, and W. Fang, "A rapid learning algorithm for vehicle classification," Information Sciences, vol. 295, pp. 395-406, 2015.
[3] S. Ballı and E. A. Sagbas, "Diagnosis of transportation modes on mobile phone using logistic regression classification," IET Software, vol. 12, no. 2, pp. 142-151, 2018.
[4] H. Nguyen, C. Cai, and F. Chen, "Automatic classification of traffic incident's severity using machine learning approaches," IET Intelligent Transport Systems, vol. 11, no. 10, pp. 615-623, 2017.
[5] G. Kedar-Dongarkar and M. Das, "Driver classification for optimization of energy usage in a vehicle," Procedia Computer Science, vol. 8, pp. 388-393, 2012.
[6] N. Deepika and V. V. Sajith Variyar, "Obstacle classification and detection for vision based navigation for autonomous driving," in Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI 2017), pp. 2092-2097, Udupi, India, September 2017.
[7] E. Frank and M. Hall, "A simple approach to ordinal classification," in Proceedings of the European Conference on Machine Learning (ECML), vol. 2167 of Lecture Notes in Computer Science, pp. 145-156, Springer, 2001.
[8] E. Alpaydin, Introduction to Machine Learning, The MIT Press, 3rd edition, 2014.
[9] S. Destercke and G. Yang, "Cautious ordinal classification by binary decomposition," in Proceedings of Machine Learning and Knowledge Discovery in Databases - European Conference (ECML/PKDD), pp. 323-337, Nancy, France, 2014.
[10] J. S. Cardoso and J. F. Pinto da Costa, "Learning to classify ordinal data: the data replication method," Journal of Machine Learning Research (JMLR), vol. 8, pp. 1393-1429, 2007.
[11] L. Li and H. T. Lin, "Ordinal regression by extended binary classification," in Proceedings of the 19th International Conference on Neural Information Processing Systems, pp. 865-872, Canada, 2006.
[12] F. E. Harrell, "Ordinal logistic regression," in Regression Modeling Strategies, Springer Series in Statistics, pp. 311-325, Springer International Publishing, Cham, Switzerland, 2015.
[13] J. D. Rennie and N. Srebro, "Loss functions for preference levels: regression with discrete ordered labels," in Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference Handling, pp. 180-186, 2005.
[14] B. Keith, "Barycentric coordinates for ordinal sentiment classification," in Proceedings of the 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Halifax, Canada, 2017.
[15] E. Fernandez, J. R. Figueira, J. Navarro, and B. Roy, "ELECTRE TRI-nB: a new multiple criteria ordinal classification method," European Journal of Operational Research, vol. 263, no. 1, pp. 214-224, 2017.
[16] M. M. Stenina, M. P. Kuznetsov, and V. V. Strijov, "Ordinal classification using Pareto fronts," Expert Systems with Applications, vol. 42, no. 14, pp. 5947-5953, 2015.
[17] W. Verbeke, D. Martens, and B. Baesens, "RULEM: a novel heuristic rule learning approach for ordinal classification with monotonicity constraints," Applied Soft Computing, vol. 60, pp. 858-873, 2017.
[18] W. Kotlowski and R. Slowinski, "On nonparametric ordinal classification with monotonicity constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 11, pp. 2576-2589, 2013.
[19] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ordinal classification with decision rules," in Proceedings of the Third International Conference on Mining Complex Data (MCD'07), pp. 169-181, Warsaw, Poland, 2007.
[20] K. Hechenbichler and K. Schliep, "Weighted k-nearest-neighbor techniques and ordinal classification," Collaborative Research Center, vol. 399, pp. 1-16, 2004.
[21] W. Waegeman and L. Boullart, "An ensemble of weighted support vector machines for ordinal regression," International Journal of Computer and Information Engineering, vol. 1, no. 12, pp. 599-603, 2007.
[22] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ensemble of decision rules for ordinal classification with monotonicity constraints," in Proceedings of the International Conference on Rough Sets and Knowledge Technology, vol. 5009 of Lecture Notes in Computer Science, pp. 260-267, 2008.
[23] H. T. Lin, From Ordinal Ranking to Binary Classification [Ph.D. thesis], California Institute of Technology, 2008.
[24] C. K. A. Cuartas, A. J. P. Anzola, and B. G. M. Tarazona, "Classification methodology of research topics based in decision trees: J48 and RandomTree," International Journal of Applied Engineering Research, vol. 10, no. 8, pp. 19413-19424, 2015.
[25] C. Parimala and R. Porkodi, "Classification algorithms in data mining: a survey," International Journal of Scientific Research in Computer Science, vol. 3, pp. 349-355, 2018.
[26] P. Yildirim, D. Birant, and T. Alpyildiz, "Improving prediction performance using ensemble neural networks in textile sector," in Proceedings of the 2017 International Conference on Computer Science and Engineering (UBMK), pp. 639-644, Antalya, Turkey, October 2017.
[27] H. Yu and J. Ni, "An improved ensemble learning method for classifying high-dimensional and imbalanced biomedicine data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 11, no. 4, pp. 657-666, 2014.
[28] D. Che, Q. Liu, K. Rasheed, and X. Tao, "Decision tree and ensemble learning algorithms with their applications in bioinformatics," in Software Tools and Algorithms for Biological Systems, H. R. Arabnia and Q.-N. Tran, Eds., vol. 696, pp. 191-199, Springer, 2011.
[29] "UCI Machine Learning Repository: Car Evaluation Data Set," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/datasets/car+evaluation.
[30] "Weka 3 - Data Mining with Open Source Machine Learning Software in Java," cs.waikato.ac.nz, 2018, https://www.cs.waikato.ac.nz/ml/weka.
[31] "UCI Machine Learning Repository," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/index.php.
[32] "Datasets - Kaggle," kaggle.com, 2018, https://www.kaggle.com/datasets.
[33] "Data Mill North," datamillnorth.org, 2018, https://datamillnorth.org/dataset.
[34] City of New York, "NYC Open Data," opendata.cityofnewyork.us, 2018, https://opendata.cityofnewyork.us.
[35] J. Derrac, S. García, D. Molina, and F. Herrera, "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms," Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 3-18, 2011.
[36] N. A. Weiss, Introductory Statistics, Addison-Wesley, 9th edition, 2012.


extended examples with any binary classification algorithm, and (iii) constructing a ranking rule from the binary classifier.

As a second approach, regression methods [12, 13] can be used to deal with the ordinal classification problem, since regression models have been designed for continuous data. In this method, categorical ordinal data is converted into a continuous data scale, and then regression algorithms are applied on this transformed data, followed by a postprocessing step. However, a disadvantage of this approach is that the natural order of the class values is discarded and the inner structure of the samples is lost. In [12], the two most commonly used ordinal logistic models were applied on medical ordinal data: the proportional odds (PO) form of an ordinal logistic model and the forward continuation ratio (CR) ordinal logistic model. Rennie and Srebro [13] applied two general threshold-based constructions (the logistic and hinge loss) on the one million MovieLens dataset. The experimental results stated that their proposed approach shows more accurate results than traditional classification and regression models.

In the last approach, problem-specific techniques [14-18] were developed for ordinal data classification by modifying existing classification algorithms. The main advantage of this approach is that it retains the order among the class labels. However, some of these techniques present complexities in terms of implementation and training: they are complex and require nontrivial changes in the training methods, such as modification of the objective function or the use of a threshold-based model. Keith and Meneses [14] proposed a novel technique called Barycentric Coordinates for Ordinal Classification (BCOC), which uses barycentric coordinates to represent ordinal classes geometrically. They applied their proposed method to the field of sentiment analysis and presented effective results for complex datasets. Researchers in another study [17] presented a novel heuristic rule learning approach with monotonicity constraints, including two novel justifiability measures, for ordinal classification. Experiments were performed to test the proposed approach, and the results indicated that the novel method showed high prediction performance by guaranteeing monotone classification with a low increase in rule set size.

Owing to their high prediction performance, ensemble-learning techniques have begun to be preferred for ordinal classification [19-22] as well as for nominal classification problems. Hechenbichler and Schliep [20] proposed an extended weighted k-nearest neighbor (wkNN) method for the ordinal class structure. In their study, a weighted majority vote mechanism was used for the aggregation process. In another study [21], an enhanced ensemble of support vector machines method was developed for ordinal regression. The proposed approach was implemented on benchmark synthetic datasets and was compared with a kernel-based ranking method in the experiments. Lin [23] introduced a novel threshold ensemble model and developed a reduction framework to reduce ordinal ranking to weighted binary classification by extending the SVM and AdaBoost algorithms. The results of all these studies show that the ensemble methods perform well on the datasets and provide better performance than the individual methods.

Differently from the existing studies, our work proposes a novel ensemble-based ordinal classification approach including bagging and boosting (AdaBoost algorithm) methods. Also, the present study is the first study in which an ordinal classification paradigm is implemented on real-world transportation datasets to model the behaviour of transportation systems.

3. Background Information

In this section, background information about ordinal classification, classification algorithms, and ensemble learning is presented to provide the context for this research.

3.1. Ordinal Classification. In most classification problems, the target attribute values to be predicted are usually assumed to have no ordering relation between them. However, the class labels of some datasets have an inherent order. For example, when predicting the price of an object, ordered class labels such as "expensive," "normal," and "cheap" can be used. The order among these class labels is clearly understood and denoted as "expensive" > "normal" > "cheap." Because of this, classification techniques differ according to nominal and ordinal data types.

Nominal Data. Data which have no quantitative value are called nominal data. Nominal scales are utilized to label variables, such as vehicle types (i.e., car, train, and bus) or document topics (i.e., science, business, and sports).

Ordinal Data. In ordinal data, the values have a natural order. In this data type, the order of values is significant, such as data obtained from the use of a Likert scale.

Ordinal classification, which is proposed for the prediction of ordinal target values, is one of the most important classification problems in machine learning. This paradigm aims to predict the unknown class values of an attribute y that have a natural order. In the ordinal classification problem, the ordinal dataset D = {(x1, y1), (x2, y2), ..., (xm, ym)} has a set of m items with input feature space X, and the class attribute y ∈ {c1, c2, ..., ck} has k class labels with an order ck > ck-1 > ... > c1, where > denotes the ordering relation. In other words, an example (x, y) is composed of an input vector x ∈ X and an ordinal label (i.e., rank) y ∈ Y = {1, 2, ..., K}. The problem is to predict the example (x, y) as rank k, where k = 1, 2, ..., K.

In decision tree-based ordinal classification, as the first step, the data is transformed from a k-class ordinal problem to k-1 binary class problems that encode the ordering of the class labels. The ordinal dataset with classes {c1, c2, ..., ck} is converted into binary datasets by discriminating {c1, ..., ci} against {ci+1, ..., ck}, which represents the test cx > i. In other words, the upward unions of classes are considered progressively in each stage of binary dataset construction. Then, a standard tree-based learning algorithm is employed on the derived binary datasets to construct k-1 models. As a result, each model predicts the cumulative probability of an instance belonging to a certain class.


To predict the class label of an unseen instance, the probability for each ordinal class P(cx) is estimated by using the k-1 models. The estimation of the probability for the first ordinal class label depends on a single classifier: 1 - P(target > c1). The probability of the last ordinal class is given by P(target > ck-1). In the middle of the range, the probability is calculated by a pair of classifiers: P(target > ci-1) - P(target > ci), where 1 < i < k. Finally, we choose the class label with the highest probability.

The core ideas of using decision tree algorithms for ordinal classification problems can be described in three parts.

First, Frank and Hall [7] reported that the decision tree-based ordinal classification algorithm resulted in a significant improvement over the standard version on 29 UCI benchmark datasets. They also experimentally confirmed that the performance gap increases with the number of classes. Most importantly, their results showed that decision tree-based ordinal classification was able to generally produce a much simpler model with better ordinal prediction accuracy.

Second, to consider the inherent order of class labels, other standard classification algorithms, such as KNN [20], SVM [21], and NN, require modification (nontrivial changes) in the training methods. Traditional regression techniques can also be applied on ordinal valued data due to their ability to classify interval or ratio quantity values; however, their application to truly ordinal problems is necessarily ad hoc [7]. In contrast, the key feature of using a decision tree for ordinal classification is that there is no need to make any modification to the underlying learning algorithm. The core idea is simply to transform the ordinal classification problem into a series of binary-class problems.

Third, owing to the advantages of fast speed, high precision, and ease of understanding, decision tree algorithms are widely used in classification as well as in ordinal classification. Ordinal classification is sensitive to noise in data: even if only a few noisy samples exist in the ordinal dataset, they can change the classification results of the overall system. However, in the presence of noisy and missing data, a pruned decision tree can reflect the order structure information very well with good generalization ability. The benefits of implementing decision tree-based ordinal classification are not limited to these. It builds white box models, so the trees constructed by the algorithm can be visualized, and the classification results can be easily explained by Boolean logic. In addition, the most important features, which are near the root node, are emphasized via the construction of the tree for the ordinal prediction task.

3.2. Tree-Based Classification Algorithms. In this work, three tree-based classification algorithms (C4.5 decision tree, RandomTree, and REPTree) are used as base learners to implement the ordinal classification process on transportation datasets that consist of ordered class values.

3.2.1. C4.5 Decision Tree. The decision tree is one of the most successful classification algorithms; it predicts unknown class attributes using a tree structure grown with a depth-first strategy. The structure of a decision tree consists of nodes, branches, and leaves that represent attributes, attribute values, and class labels, respectively. In the literature, there are several decision tree algorithms, such as C4.5, ID3 (iterative dichotomiser), CART (classification and regression trees), and CHAID (chi-squared automatic interaction detector). In this study, the C4.5 decision tree algorithm was used for the ordinal classification problem in the transportation sector due to its popularity.

The first step in the C4.5 decision tree algorithm is to specify the root of the tree. The attribute which gives the most determinant information for the prediction process is selected for the root node. To determine the order of features in the decision tree, the information gain formula is evaluated for each attribute, as defined in

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)   (1)

where S_v is the subset of states S for attribute A with value v. The entropy indicates the impurity of a particular attribute in the dataset, as defined in

Entropy(S) = \sum_{i=1}^{n} -p_i \log_2 p_i   (2)

where i is a state, p_i is the probability of the outcome being in state i for the set S, and n is the number of possible outcomes. The attribute that has the maximum information gain value is selected as the root of the tree. Likewise, all attributes in the dataset are placed in the tree according to their information gain values.
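A small sketch of equations (1) and (2); the 9/5 class counts and the two-way split are a made-up example, not data from this study.

import math

def entropy(counts):
    # Entropy(S) = sum_i -p_i * log2(p_i) over the class counts of S
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, subsets):
    # Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)
    total = sum(parent)
    return entropy(parent) - sum((sum(s) / total) * entropy(s) for s in subsets)

print(entropy([9, 5]))                             # ~0.940 bits
print(information_gain([9, 5], [[6, 2], [3, 3]]))  # ~0.048 bits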

3.2.2. RandomTree. Assume a training set T with z attributes and m instances; RandomTree is an algorithm for constructing a tree that considers x randomly chosen features (x ≤ z) with m instances at each node [24]. While in a standard decision tree each node is split using the best split among all features, in a random tree each node is split using the best among the subset of attributes randomly chosen at that node. The tree is built to its maximum depth; in other words, no pruning procedure is applied after the tree has been fully built. The algorithm can deal with both classification and regression problems. In the classification task, the predicted class for a sample is determined by traversing the tree from the root node to a leaf according to the question posed about an indicator value at that node. When a leaf node is reached, its label determines the classification decision. The RandomTree algorithm does not need any accuracy estimation metric, unlike the other classification algorithms.

3.2.3. REPTree. The reduced error pruning tree (REPTree) algorithm builds a decision/regression tree using information gain/variance and prunes it using a simple and fast technique [25]. The pruning technique starts from the bottom level of the tree (the leaf nodes) and replaces each node with the most popular class. This change is accepted only if the prediction accuracy remains good. In this way, it helps to minimize the size of decision trees by removing sections of the tree that give little capacity to classify instances. The values of numeric attributes are sorted once. Missing values are also dealt with by splitting the corresponding instances into parts.



3.3. Ensemble Learning. Ensemble learning is a machine learning technique that combines a set of individual learners and predicts a final output [26]. First, each learner in the ensemble structure is trained separately, and multiple classification models are constructed. Then, the outputs obtained from each model are combined by a voting mechanism. The commonly used voting method for categorical target values is majority class labelling, and the voting methods for numerical target values are average, weighted average, median, minimum, and maximum.

The ensemble-learning approach constructs a strong classifier from multiple individual learners and aims to improve classification performance by reducing the risk of an unfortunate selection of these learners. Many ensemble-based studies [27, 28] proved that ensemble learners give more successful results than classical individual learners.

In the literature, ensemble methods are generally categorized under four techniques: bagging, boosting, stacking, and voting. In one of our previous studies, we implemented the bagging approach by combining multiple neural networks with different parameter values for a prediction task in the textile sector [26]. In the current research, bagging and boosting (AdaBoost algorithm) methods with the ordinal class classifier were applied on the transportation datasets. For each method (bagging and boosting), the tree-based classification algorithms (C4.5 decision tree, RandomTree, and REPTree) were used as base learners separately.

3.3.1. Bagging. Bagging (bootstrap aggregating) is a commonly applied ensemble technique which constitutes training subsets by selecting random instances from the original dataset using the bootstrap method. Each classifier in the ensemble structure is trained on a different training set, so multiple classification models are produced. Then, a new sample is given to each model, and these models predict an output. The obtained outputs are aggregated, and a single final output is produced.

General Process of Bagging. Let D = {(x1, y1), (x2, y2), ..., (xm, ym)} be a set of m items, and let y_i ∈ Y = {c1, c2, ..., ck} be a set of k class labels. C denotes the classification algorithm, and n is the number of learners.

(1) Draw m items randomly with replacement from the dataset D to generate bootstrap samples D1, D2, ..., Dn.

(2) Each dataset D_i is trained by a base learner, and multiple classification models are constructed: M_i = C(D_i).

(3) The consensus of the classification models is tested to calculate the out-of-bag error.

(4) A new sample x is given to the classifiers as input, and the outputs y_i are obtained from each model: y_i = M_i(x).

(5) The outputs of the models M1, M2, ..., Mn are combined as in

M^*(x) = \arg\max_{y \in Y} \sum_{i: M_i(x) = y} 1   (3)
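Step (1) in code form: a toy dataset of m = 10 item indices, where repeated and missing indices illustrate the bootstrap sample and the out-of-bag items used in step (3).

import numpy as np

rng = np.random.default_rng(42)
D = np.arange(10)                              # stand-in dataset of m = 10 items
D1 = rng.choice(D, size=len(D), replace=True)  # one bootstrap sample
print(D1)   # some items repeat; the omitted ones form the out-of-bag set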

3.3.2. Boosting. In the boosting method, classifiers are trained consecutively to convert weak learners into strong ones. A weight value is assigned to each instance in the training set. Then, in each iteration, the weights of misclassified samples are increased, while those of correctly classified ones are decreased. In this way, the misclassified samples' chances of being included in the training set increase. In this study, the AdaBoost algorithm was utilized to implement the boosting method.

AdaBoost. AdaBoost, also known as adaptive boosting, is the most popular boosting algorithm; it trains learners by reweighting the instances in the training set iteratively. Then, the outputs produced by each learning model are aggregated using a weighted voting mechanism.

4. Proposed Method: Ensemble-Based Ordinal Classification (EBOC)

The proposed method, named ensemble-based ordinal classification (EBOC), combines the ensemble-learning paradigm, including its popular methods AdaBoost and bagging, with the traditional ordinal class classifier algorithm to improve prediction performance. The ordinal class classifier algorithm, which was proposed in [7], is used as a base learner for the ensemble structure of the proposed approach. In addition, tree-based algorithms such as the C4.5 decision tree, RandomTree, and REPTree are used as the classifier within the ordinal class classifier algorithm.

The first step of the ordinal class classifier algorithm is to reduce the multiclass ordinal classification problem to binary classification problems. To realize this approach, the ordinal classification problem with k different class values is converted into k-1 two-class classification problems. Assume that there is a dataset with four classes (k=4); the task is then to find the two-sided splits: (i) class C1 against classes C2, C3, and C4; (ii) classes C1 and C2 against classes C3 and C4; and finally (iii) classes C1, C2, and C3 against class C4.

For example, suppose we have the car evaluation dataset [29], which evaluates vehicles according to buying price, maintenance price, and technical characteristics such as comfort, number of doors, person capacity, luggage boot size, and safety of the car. This dataset has an ordinal class attribute with four values: unacc, acc, good, and vgood. First, the four different class values of the original dataset are converted to binary values according to these rules: Classes > "unacc", Classes > "acc", and Classes > "good". If we consider the "Classes > 'unacc'" rule, class values higher than "unacc" are labelled as 1, and the others are labelled as 0. In this way, three different transformed datasets that contain binary class values are obtained. In the next stage, a classification algorithm (i.e., C4.5, RandomTree, or REPTree) is applied on each obtained dataset separately.

Figure 2: Application of the ordinal class classifier algorithm on the "car evaluation" dataset.
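The label transformation shown in Figure 2 reduces to thresholding the ordinal ranks. Below is a minimal sketch, assuming the class values are already encoded as ranks 0 < 1 < 2 < 3 for unacc < acc < good < vgood (an illustrative encoding, not part of the original dataset):

```python
import numpy as np

ranks = np.array([0, 1, 1, 2, 2, 3])   # unacc=0, acc=1, good=2, vgood=3
k = 4
# One binary target per threshold: "class > unacc", "class > acc", "class > good"
binary_targets = {f"class>{j}": (ranks > j).astype(int) for j in range(k - 1)}
# e.g. binary_targets["class>0"] -> [0, 1, 1, 1, 1, 1]
```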

When predicting a new sample, the probabilities are computed for each of the $k$ class values using the $k-1$ binary classification models. For example, the probability of the "unacc" class value, $P(\text{'unacc'} \mid \text{sample})$, in the sample car dataset is evaluated as $1 - P(\text{class} > \text{'unacc'} \mid \text{sample})$. Similarly, the other three ordinal class value probabilities are computed. Finally, the class label with the maximum probability is assigned to the sample. For the "car evaluation" dataset, the probabilities of the class attribute values are calculated as follows:

$$
\begin{aligned}
P(\text{'unacc'} \mid \text{sample}) &= 1 - P(\text{class} > \text{'unacc'} \mid \text{sample}) \\
P(\text{'acc'} \mid \text{sample}) &= P(\text{class} > \text{'unacc'} \mid \text{sample}) - P(\text{class} > \text{'acc'} \mid \text{sample}) \\
P(\text{'good'} \mid \text{sample}) &= P(\text{class} > \text{'acc'} \mid \text{sample}) - P(\text{class} > \text{'good'} \mid \text{sample}) \\
P(\text{'vgood'} \mid \text{sample}) &= P(\text{class} > \text{'good'} \mid \text{sample})
\end{aligned}
\tag{4}
$$
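Equation (4) translates directly into code. The sketch below assumes the $k-1$ probabilities $P(\text{class} > c_j \mid \text{sample})$ are already available from the binary models:

```python
import numpy as np

def ordinal_probabilities(p_greater):
    """p_greater[j] = P(class > c_j | sample) for j = 0..k-2; returns P(c_1)..P(c_k)."""
    k = len(p_greater) + 1
    probs = np.empty(k)
    probs[0] = 1.0 - p_greater[0]                    # first class
    for j in range(1, k - 1):
        probs[j] = p_greater[j - 1] - p_greater[j]   # middle classes
    probs[-1] = p_greater[-1]                        # last class
    return probs

# e.g. for the car dataset: p_greater = [P(>unacc), P(>acc), P(>good)]
# predicted label index = ordinal_probabilities(p_greater).argmax()
```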

The proposed approach (EBOC) utilizes the ordinal classification algorithm as the base learner for the bagging and boosting ensemble methods. This novel ensemble-based approach aims to obtain successful classification results by virtue of the high prediction ability of the ensemble-learning paradigm. The general structure of the proposed approach is presented in Figure 3. When bagging is considered, random samples are selected from the original dataset to produce multiple training sets; when boosting is utilized, samples are selected with specified probabilities based on their weights. After that, EBOC derives new datasets from the original dataset, each one with a new binary class attribute. Each dataset is given to the ordinal class classifier algorithm as an input. Then a classification algorithm (i.e., C4.5, RandomTree, or REPTree) is applied on the datasets, and multiple ensemble-based ordinal classification models are produced. Each classification model in this system gives an output label, and the majority vote of these outputs is selected as the final class value.

The pseudocode of the proposed approach (EBOC) is presented in Algorithm 1. First, multiple training sets are generated according to the preferred ensemble method (bagging or boosting). Second, the algorithm converts the ordinal data to binary data. After that, a tree-based classification algorithm (i.e., C4.5, RandomTree, or REPTree) is applied on the binary training sets, and in this way various classifiers are produced. After this training phase, if boosting is chosen as the ensemble method, the weight of each sample in the dataset is updated dynamically according to the performance (error rate) of the classifiers in the current iteration. To predict the class label of a new input sample, the probability for each ordinal class is estimated by using the constructed models. The algorithm chooses the class label with the highest probability. Lastly, the majority vote of the outputs obtained from each ordinal classification model is selected as the final output.

Figure 3: The general structure of the proposed approach (EBOC).

5. Experimental Studies

In the experimental studies, the proposed approach (EBOC) was implemented on twelve different benchmark transportation datasets to demonstrate its advantages over the standard nominal classification methods. The application was developed by using the Weka open source data mining library [30]. The individual tree-based classification algorithms (C4.5, RandomTree, and REPTree), the ordinal classification algorithm [7], and the EBOC approach were applied on the transportation datasets separately, and they were compared in terms of accuracy, precision, recall, F-measure, and ROC area. In the EBOC approach, the C4.5, RandomTree, and REPTree algorithms were also used as base learners in the ordinal class classifiers separately. The experimental results obtained in this study are presented with the help of tables and graphs.

5.1. Dataset Description. In this study, 12 different transportation datasets, which are available in several repositories for public use, were selected to demonstrate the capabilities of the proposed EBOC method. The datasets were obtained from the following data archives: UCI Machine Learning Repository [31], Kaggle [32], Data Mill North [33], and NYC Open Data [34]. Detailed descriptions of the datasets (i.e., what they are and how to use them) are given as follows.

Auto MPG. The Auto MPG dataset, which was used in the American Statistical Association Exposition, is provided by the StatLib library maintained at Carnegie Mellon University. This dataset is utilized for the prediction of the city-cycle fuel consumption (in miles per gallon) of cars according to their characteristics, such as model year, number of cylinders, horsepower, engine size (displacement), weight, and acceleration.

Automobile. The dataset was obtained from the 1985 Ward's Automotive Yearbook. It consists of various automobile characteristics such as fuel type, body style, number of doors, engine size, engine location, horsepower, length, width, height, price, and insurance score. These features are used for the classification of automobiles by predicting their risk factors: six different risk rankings ranging from risky (+3) to pretty safe (-3).

Bike Sharing. The data includes the hourly count of rental bikes between the years 2011 and 2012. It was collected by the Capital Bikeshare system in Washington, D.C., where membership, rental, and bike return are automated via a network of kiosk locations. The dataset is presented for the prediction of hourly bike rental counts based on environmental and seasonal settings such as weather situation (i.e., clear, cloudy, rainy, and snowy), temperature, humidity, wind speed, month, season, the day of the week, and year.


Algorithm EBOC: Ensemble-Based Ordinal Classification

Inputs:
  D: the ordinal dataset, D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}
  m: the number of instances in the dataset D
  X: input feature space, an input vector x ∈ X
  Y: class attribute, an ordinal class label y ∈ {c_1, c_2, ..., c_k} ⊆ Y with an order c_k > c_{k-1} > ... > c_1
  k: the number of class labels
  n: ensemble size

Outputs:
  M*: an ensemble classification model
  M*(x): class label of a new sample x

Begin
  for i = 1 to n do
    if (bagging) D_i = bootstrap samples from D
    else if (boosting) D_i = samples from D according to their weights
    // Construction of binary training sets D_ij
    for j = 1 to k-1 do
      for s = 1 to m do
        if (y_s <= c_j) D_ij.Add(x_s, 0)
        else D_ij.Add(x_s, 1)
      end for
    end for
    // Construction of binary classifiers BC_ij
    for j = 1 to k-1 do
      BC_ij = ClassificationAlgorithm(D_ij)
    end for
    if (boosting) update weight values
  end for
  // Classification of a new sample x
  for i = 1 to n do
    // Construction of ordinal classification models M_i
    P(c_1) = 1 - P(y > c_1)
    for j = 2 to k-1 do
      P(c_j) = P(y > c_{j-1}) - P(y > c_j)
    end for
    P(c_k) = P(y > c_{k-1})
    M_i(x) = arg max P
  end for
  // Majority voting
  if (bagging) M*(x) = arg max_{y ∈ Y} Σ_{i: M_i(x)=y} 1
  else if (boosting) M*(x) = arg max_{y ∈ Y} Σ_{i: M_i(x)=y} weight_i
End Algorithm

Algorithm 1: The pseudocode of the proposed EBOC approach.
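For readers who want to prototype Algorithm 1 outside Weka, the following sketch implements the bagging branch end to end; the use of scikit-learn's DecisionTreeClassifier in place of C4.5/RandomTree/REPTree is an assumption, and the boosting branch with its weight updates is omitted for brevity.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def eboc_bagging_fit(X, y, k, n=100, seed=1):
    """Training loop of Algorithm 1 (bagging branch). y holds ordinal ranks
    0..k-1; each ensemble member is a list of k-1 binary classifiers BC_ij."""
    rng = np.random.RandomState(seed)
    ensemble = []
    for _ in range(n):
        idx = rng.randint(0, len(y), size=len(y))   # D_i: bootstrap sample from D
        Xi, yi = X[idx], y[idx]
        ensemble.append([DecisionTreeClassifier().fit(Xi, (yi > j).astype(int))
                         for j in range(k - 1)])    # binary target: rank > j
    return ensemble

def eboc_bagging_predict(ensemble, X, k):
    """Prediction loop: rebuild ordinal probabilities per member, then vote."""
    votes = []
    for member in ensemble:
        # P(class > c_j | x); assumes both binary labels occur in each bootstrap
        pg = np.column_stack([clf.predict_proba(X)[:, 1] for clf in member])
        probs = np.zeros((len(X), k))
        probs[:, 0] = 1.0 - pg[:, 0]                # first class
        for j in range(1, k - 1):
            probs[:, j] = pg[:, j - 1] - pg[:, j]   # middle classes
        probs[:, -1] = pg[:, -1]                    # last class
        votes.append(probs.argmax(axis=1))          # M_i(x)
    votes = np.stack(votes)
    # M*(x): majority vote over the n ensemble members
    return np.array([np.bincount(votes[:, s], minlength=k).argmax()
                     for s in range(votes.shape[1])])
```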

Car Evaluation. This dataset is useful for evaluating the quality of cars according to various characteristics such as buying price, maintenance price, number of doors, person capacity, size of luggage boot, and estimated safety of the car. The target attribute in the dataset indicates the overall score of the cars as unacceptable, acceptable, good, and very good.


Car Sale Advertisements. The data was collected from private car sale advertisements in Ukraine in 2016. It was proposed for the prediction of the seller's price (denominated in USD) in the advertisement, according to well-known car features such as model, body type, mileage, engine volume, engine type, and drive type.

NYS Air Passenger Traffic. The data was collected monthly by the Port Authority of New York State between the years 1977 and 2015. The aim is to predict the total number of domestic and international passengers for five airports (ACY, EWR, JFK, LGA, and SWF).

Road Traffic Accidents (2017). The dataset contains the records of traffic accidents across the City of Leeds, UK, reported in 2017, with information on location, the number of people and vehicles involved, the type of vehicle, road surface, weather situation, lighting conditions, and the age and sex of the casualty. The aim is to predict the severity of the casualty as slight, serious, or fatal.

SF Air Traffic Landings Statistics and SF Air Traffic Passenger Statistics. These two separate datasets include aircraft landings and passenger statistics of San Francisco International Airport recorded during the period from July 2005 to March 2018. They are used to predict monthly landing and passenger counts, respectively.

Smart City Traffic Patterns. This dataset includes the traffic patterns of four junctions of a city collected between 2015 and 2017. The aim is to manage the traffic of the city better and to provide input on infrastructure planning for the future, improving the efficiency of services for the citizens. To serve this purpose, the number of vehicles in the traffic is predicted, so that a robust traffic system can be implemented for the city by being prepared for traffic peaks.

Statlog (Vehicle Silhouettes). The Statlog dataset contains the features of vehicle silhouettes extracted by the Hierarchical Image Processing System (HIPS). A vehicle may be viewed from one of many different angles. The aim is to classify a given silhouette using a set of features extracted from the image.

Traffic Volume Counts (2012-2013). The dataset includes the hourly traffic volume counts collected by the New York Metropolitan Transportation Council between the years 2012 and 2013. The traffic counts were measured on various roads, from one intersection to another, with a specified direction (NB, SB, WB, and EB). The attribute to be predicted in this dataset is the traffic volume at 11:00-12:00 AM.

Before the application of the nominal and ordinal classification algorithms, the datasets generally passed through data preprocessing steps. In this study, the ID attributes in the transportation datasets were eliminated in the data reduction step. The date attributes were split into three parts (day, month, and year) because they provide more useful information for the prediction task. In addition, continuous class values were discretized into 3 bins using the equal-frequency technique, since ordinal classification algorithms require categorical ordinal data.
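The equal-frequency discretization can be reproduced with pandas; a small sketch follows (the column name and values are illustrative, not taken from the actual datasets):

```python
import pandas as pd

df = pd.DataFrame({"mpg": [9.0, 14.5, 18.0, 21.0, 26.5, 33.0, 39.0, 44.0, 46.6]})
# Split the continuous target into 3 bins with (roughly) equal frequencies,
# yielding an ordered categorical class attribute: low < medium < high
df["mpg_class"] = pd.qcut(df["mpg"], q=3, labels=["low", "medium", "high"])
print(df["mpg_class"].value_counts())
```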

Figure 4: The 10-fold cross-validation process.

The basic characteristics of the investigated transportation datasets are given in Table 1: the number of instances, attributes, and classes, the target attribute, the data preprocessing operations, and the class distribution in each bin.

5.2. Experimental Work. In each experiment, four methods were compared on the transportation datasets: (i) individual tree-based classification algorithms, (ii) the ordinal class classifier, (iii) boosting-based ordinal classification, and (iv) bagging-based ordinal classification. They were compared by using the n-fold cross-validation technique, selecting n as 10. In this validation technique, the entire dataset is divided into n equal-size parts; n-1 of them are chosen for the training phase of the classification model, and the last part is used in the testing phase. This process is repeated n times, with the parts used for training and testing changing each time. When the cross-validation process terminates, the accuracy values obtained in each step are averaged, and a single accuracy value is produced as the success measure of the algorithm. Figure 4 represents the 10-fold cross-validation process.
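In code, the same protocol is a one-liner with scikit-learn; the sketch below uses a stand-in dataset and learner, since the actual experiments were run in Weka:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris  # stand-in dataset for illustration

X, y = load_iris(return_X_y=True)
# 10-fold cross-validation: train on 9 parts, test on the remaining part, 10 times
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10, scoring="accuracy")
print(f"per-fold accuracies: {np.round(scores, 3)}")
print(f"averaged accuracy:   {scores.mean():.3f}")   # single success value
```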

In this study, the alternative tree-based ordinal classification algorithms were compared according to the accuracy, precision, recall, F-measure, and ROC area measures. These metrics are explained with their abbreviations, formulas, and definitions in Table 2.

5.3. Experimental Results. In this study, three different experiments with three different tree-based classification algorithms (C4.5, RandomTree, and REPTree) were performed to compare the classification success of the proposed EBOC approach with the existing individual algorithms and the ordinal class classifier algorithm. To evaluate the classification performance of the algorithms on a specific transportation dataset, accuracy rate values were calculated using the n-fold cross-validation technique, selecting n as 10.

In each experiment, one of the tree-based classification algorithms was used as the base learner. For example, in the first experiment, four methods were applied and compared on the 12 different transportation datasets: (i) C4.5, the individual decision tree algorithm; (ii) OrdC4.5, the ordinal class classifier with C4.5; (iii) AdaOrdC4.5, AdaBoost-based ordinal classification with C4.5 as base learner; and finally (iv) BagOrdC4.5, bagging-based ordinal classification with C4.5 as base learner. In the second and third experiments, the RandomTree and REPTree algorithms were utilized as base learners, respectively. The default parameters of the algorithms in Weka were used in all experiments, except the ensemble size (the number of members) parameter, which was set to 100.


Table 1: The basic characteristics of the transportation datasets (I: number of instances, A: number of attributes, C: number of classes).

| Dataset | I | A | C | Class Distribution | Target Attribute | Data Preprocessing |
|---|---|---|---|---|---|---|
| Auto MPG | 398 | 8 | 3 | 131-134-133 | mpg | Removing the columns with unique values (i.e., car name) |
| Automobile | 205 | 26 | 7 | 0-3-22-67-54-32-27 | symboling | - |
| Bike Sharing | 17379 | 13 | 3 | 5797-5783-5799 | cnt | Removing the columns "instant", "dteday", "casual", and "registered" |
| Car Evaluation | 1728 | 7 | 4 | 1210-384-69-65 | class | - |
| Car Sale Advertisements | 9309 | 10 | 3 | 3112-3080-3117 | price | Removing rows with zero price value |
| NYS Air Passenger Traffic | 1584 | 4 | 3 | 528-528-528 | total passengers | Removing the columns "Domestic" and "International Passengers" |
| Road Traffic Accidents (2017) | 2203 | 13 | 3 | 1879-309-15 | casualty severity | (i) Removing columns that hold reference numbers; (ii) splitting date column into day and month |
| SF Air Traffic Landings Statistics | 21105 | 14 | 3 | 6941-7103-7061 | landing count | - |
| SF Air Traffic Passenger Statistics | 18398 | 12 | 3 | 6132-6133-6133 | passenger count | - |
| Smart City Traffic Patterns | 48120 | 6 | 3 | 15687-16436-15997 | vehicles | (i) Removing ID column; (ii) splitting date column into day, month, and year |
| Statlog (Vehicle Silhouettes) | 846 | 19 | 4 | 212-217-218-199 | class | - |
| Traffic Volume Counts (2012-2013) | 5945 | 31 | 3 | 1983-1989-1973 | traffic volume at 11:00-12:00 AM | (i) Removing the columns with unique values (i.e., ID, Segment ID); (ii) splitting date column into day, month, and year |

Table 2: The detailed information about classifier performance measures (TP: true positives, FP: false positives, TN: true negatives, FN: false negatives).

| Abbreviation | Measure | Formula | Description |
|---|---|---|---|
| Acc | Accuracy | Acc = (TP + TN) / (TP + TN + FP + FN) | The ratio of the number of correctly classified instances to the total number of records |
| Prec | Precision | Prec = TP / (TP + FP) | The ratio of correctly classified positive instances against all positive results |
| Rec | Recall | Rec = TP / (TP + FN) | The ratio of correct answers for a class over all the answers correctly belonging to this class |
| F | F-measure | F = 2 * precision * recall / (precision + recall) | The harmonic mean of precision and recall |
| ROC Area | Receiver Operating Characteristic Area | - | The area under the curve which is generated to compare correctly and incorrectly classified instances |
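The measures in Table 2 map one-to-one onto scikit-learn's metric functions; a binary-label sketch follows (the paper's multi-class scores would additionally need an averaging strategy such as average='weighted'):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred  = [0, 1, 1, 1, 0, 0, 1, 0]
y_score = [0.2, 0.6, 0.9, 0.8, 0.4, 0.1, 0.7, 0.3]  # probabilities for ROC area

print("Acc :", accuracy_score(y_true, y_pred))   # (TP+TN)/(TP+TN+FP+FN)
print("Prec:", precision_score(y_true, y_pred))  # TP/(TP+FP)
print("Rec :", recall_score(y_true, y_pred))     # TP/(TP+FN)
print("F   :", f1_score(y_true, y_pred))         # harmonic mean of Prec and Rec
print("ROC :", roc_auc_score(y_true, y_score))   # area under the ROC curve
```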

As a result of the experiments, the accuracy rates of each algorithm were evaluated. Tables 3, 4, and 5 show the comparative results of the implemented techniques on the twelve benchmark datasets in terms of accuracy rates.


Table 3: The comparison of C4.5-based ordinal and ensemble learning methods in terms of classification accuracy.

| Dataset | C4.5 (%) | OrdC4.5 (%) | AdaOrdC4.5 (%) | BagOrdC4.5 (%) |
|---|---|---|---|---|
| Auto MPG | 80.40 | 80.90* | **83.67*** | 81.91* |
| Automobile | 81.95 | 66.34 | **84.39*** | 73.66 |
| Bike Sharing | 87.75 | 87.72 | 89.29* | **89.79*** |
| Car Evaluation | 92.36 | 92.19 | **98.73*** | 94.27* |
| Car Sale Advertisements | 83.65 | 81.09 | 83.82* | **85.00*** |
| NYS Air Passenger Traffic | 84.91 | 85.42* | 86.05* | **86.81*** |
| Road Traffic Accidents (2017) | **85.29** | **85.29*** | 81.03 | 84.38 |
| SF Air Traffic Landings Statistics | 98.74 | 98.31 | **99.37*** | 98.69 |
| SF Air Traffic Passenger Statistics | 90.68 | 90.82* | 90.41 | **91.43*** |
| Smart City Traffic Patterns | 85.66 | 85.98* | 85.07 | **86.94*** |
| Statlog (Vehicle Silhouettes) | 72.46 | 68.91 | **78.13*** | 74.82* |
| Traffic Volume Counts (2012-2013) | 88.36 | 88.12 | **89.77*** | 89.10* |
| Average | 86.02 | 84.26 | **87.48** | 86.40 |

Table 4: The comparison of RandomTree-based ordinal and ensemble learning methods in terms of classification accuracy.

| Dataset | RandomTree (%) | OrdRandomTree (%) | AdaOrdRandomTree (%) | BagOrdRandomTree (%) |
|---|---|---|---|---|
| Auto MPG | 77.64 | 81.66* | **82.66*** | 81.41* |
| Automobile | 76.59 | 74.63 | 83.41* | **84.88*** |
| Bike Sharing | 80.07 | 80.56* | 86.98* | **87.73*** |
| Car Evaluation | 83.16 | 85.94* | **97.22*** | 95.83* |
| Car Sale Advertisements | 82.38 | 82.12 | 83.90* | **86.34*** |
| NYS Air Passenger Traffic | 86.11 | 86.17* | 85.61 | **86.49*** |
| Road Traffic Accidents (2017) | 78.76 | 78.44 | 82.21* | **84.48*** |
| SF Air Traffic Landings Statistics | 97.38 | 97.43* | **98.93*** | 98.83* |
| SF Air Traffic Passenger Statistics | 89.22 | **89.27*** | 89.05 | 89.17 |
| Smart City Traffic Patterns | 82.26 | 82.79* | 84.72* | **85.91*** |
| Statlog (Vehicle Silhouettes) | 70.92 | 65.60 | **76.71*** | 76.00* |
| Traffic Volume Counts (2012-2013) | 82.96 | 83.45* | **89.77*** | 89.69* |
| Average | 82.29 | 82.34 | 86.76 | **87.23** |

The accuracy rates that are higher than the individual classification algorithm's accuracy rate in the same row are marked with *. In addition, the accuracy rates with the maximum value in each row are shown in bold. The experimental results show that the EBOC approaches (AdaOrdC4.5, BagOrdC4.5, AdaOrdRandomTree, BagOrdRandomTree, AdaOrdREPTree, and BagOrdREPTree) generally provide higher accuracy values than the individual tree-based classification algorithms. For example, over the 12 datasets, BagOrdRandomTree (87.23%) is significantly more accurate than RandomTree (82.29%). Similarly, both AdaOrdREPTree and BagOrdREPTree win against plain REPTree on 11 datasets and lose on only one. The results also show that the performance gap generally increases with the number of classes. When the average accuracy values are considered, the proposed EBOC approach has the best accuracy score in all three experiments: AdaOrdC4.5 is 87.48%, BagOrdRandomTree is 87.23%, and BagOrdREPTree is 85.67%.

The matrices presented in Tables 6, 7, and 8 give all pairwise combinations of the tree-based algorithms with their ordinal and ensemble-learning versions. Each cell in a matrix represents the number of wins, losses, and ties between the approach in that row and the approach in that column. For example, in the pairwise comparison of the C4.5 and BagOrdC4.5 algorithms, 2-10-0 indicates that the standard C4.5 algorithm is better than BagOrdC4.5 on only 2 datasets, while BagOrdC4.5 is better on 10 datasets; their accuracies are not equal on any dataset. When all matrices are examined, it is clearly seen that our proposed EBOC approach outperforms all other methods.

The graphs given in Figures 5, 6, and 7 show the average ranks of the tree-based algorithms (C4.5, RandomTree, and REPTree) with their ordinal and ensemble-learning (AdaBoost and bagging) versions, respectively.


Table 5: The comparison of REPTree-based ordinal and ensemble learning methods in terms of classification accuracy.

| Dataset | REPTree (%) | OrdREPTree (%) | AdaOrdREPTree (%) | BagOrdREPTree (%) |
|---|---|---|---|---|
| Auto MPG | 75.38 | 76.63* | **81.91*** | 81.16* |
| Automobile | 63.41 | 66.83* | **83.90*** | 72.20* |
| Bike Sharing | 84.96 | 85.27* | **88.75*** | 88.12* |
| Car Evaluation | 87.67 | 90.91* | **98.67*** | 93.63* |
| Car Sale Advertisements | 80.70 | 80.42 | **82.52*** | 82.30* |
| NYS Air Passenger Traffic | 84.34 | 84.91* | 86.74* | **87.31*** |
| Road Traffic Accidents (2017) | 84.66 | **85.02*** | 80.07 | 84.57 |
| SF Air Traffic Landings Statistics | 98.04 | 98.10* | **99.12*** | 98.57* |
| SF Air Traffic Passenger Statistics | 90.39 | 90.49* | 90.97* | **91.25*** |
| Smart City Traffic Patterns | 84.66 | 85.06* | 85.33* | **86.38*** |
| Statlog (Vehicle Silhouettes) | 72.34 | 69.62 | **77.54*** | 73.17* |
| Traffic Volume Counts (2012-2013) | 80.57 | 88.75* | 86.39* | **89.37*** |
| Average | 82.26 | 83.50 | **86.83** | 85.67 |

Figure 5: The average ranks of the C4.5 algorithm with ordinal and ensemble-learning versions.

In the ranking method, each approach used in this study is rated according to its accuracy score on each dataset. This process starts by giving rank 1 to the classifier with the highest classification accuracy and continues to increase the rank value until rank m is assigned to the worst of the m classifiers. In case of a tie, the average of the tied classifiers' rankings is assigned to each of them. Then the mean values of the ranks per classifier across the datasets are computed as average ranks. According to the comparative results, the EBOC approach with the bagging method has the best performance among the others in Figures 5 and 6, because it gives the lowest rank value.
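The average-rank computation is easy to reproduce with scipy's rankdata, which assigns average ranks on ties exactly as described; the sketch below uses only the first three rows of Table 3 as illustrative input:

```python
import numpy as np
from scipy.stats import rankdata

# rows = datasets, columns = [C4.5, OrdC4.5, AdaOrdC4.5, BagOrdC4.5] accuracies
acc = np.array([[80.40, 80.90, 83.67, 81.91],
                [81.95, 66.34, 84.39, 73.66],
                [87.75, 87.72, 89.29, 89.79]])   # first three rows of Table 3
# Rank 1 = highest accuracy; ties receive the average rank
ranks = np.vstack([rankdata(-row) for row in acc])
print("average ranks:", ranks.mean(axis=0))
```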

Figure 6: The average ranks of the RandomTree algorithm with ordinal and ensemble-learning versions.

The proposed algorithms (AdaOrd* and BagOrd*) were applied on the transportation datasets and compared with the ordinal and nominal (standard) classification algorithms in terms of the accuracy, precision, recall, F-measure, and ROC area metrics. Additional measures besides accuracy are required to evaluate all aspects of the classifiers, and they are useful for providing additional insight into their performance. Table 9 shows the average results obtained from the 12 transportation datasets in each experiment. Except for accuracy, the metrics listed in Table 9 give a degree ranging from 0 to 1; an algorithm with a higher rate is more successful than the others.


Table 6: The pairwise combinations of the C4.5 algorithm with ordinal and ensemble learning versions.

| | BagOrdC4.5 | AdaOrdC4.5 | OrdC4.5 |
|---|---|---|---|
| C4.5 | 3-9-0 | 3-9-0 | 7-4-1 |
| OrdC4.5 | 1-11-0 | 3-9-0 | |
| AdaOrdC4.5 | 6-6-0 | | |

Table 7: The pairwise combinations of the RandomTree algorithm with ordinal and ensemble learning versions.

| | BagOrdRandomTree | AdaOrdRandomTree | OrdRandomTree |
|---|---|---|---|
| RandomTree | 1-11-0 | 2-10-0 | 4-8-0 |
| OrdRandomTree | 2-10-0 | 2-10-0 | |
| AdaOrdRandomTree | 5-7-0 | | |

Figure 7: The average ranks of the REPTree algorithm with ordinal and ensemble-learning versions.

Among the C4.5 decision tree-based algorithms, it is clearly seen that the AdaOrdC4.5 algorithm has the best accuracy (87.467%), precision (0.874), recall (0.875), and F-measure (0.874) results. While bagging has the highest values for the RandomTree algorithm, boosting is the best among the REPTree-based algorithms. When the ROC area is considered, the BagOrd* algorithms have the highest scores according to the experimental results. As a result, it is possible to say that the ensemble-based ordinal classification (EBOC) approach provides more accurate and robust classification results than the traditional methods.

5.4. Statistical Significance Tests. The field of inferential statistics offers special procedures for testing the significance of differences between multiple classifiers. Although the ensemble-based ordinal classification (EBOC) algorithms (AdaOrd* and BagOrd*) have lower ranking values, we should statistically verify that these algorithms are significantly different from the others. For verification, we used three well-known statistical methods [35]: multiple comparisons (the Friedman test and the Quade test) and pairwise comparisons (the Wilcoxon signed rank test).

The null hypothesis (H0) for a statistical test is one in which the provided classification models are equivalent; the alternative hypothesis (H1) holds when not all classifiers are equivalent:

H0: There are no performance differences among the classifiers on the datasets.

H1: There are performance differences among the classifiers on the datasets.

The p-value is defined as a probability, a decimal number between 0 and 1, which can also be expressed as a percentage (i.e., 0.1 = 10%). With a small p-value, we reject the null hypothesis (H0), so the relationship between the classification results is significantly different.

5.4.1. Statistical Tests for Comparing Multiple Classifiers. Multiple comparison tests (MCT) are used to establish a statistical comparison of the results reported among various classification algorithms. We performed two well-known MCT, the Friedman test and the Quade test, to determine whether the classifiers have significant differences or not.

Friedman Test. The Friedman test is a nonparametric statistical test that aims to detect significant differences between the behaviours of two or more algorithms. Initially, it ranks the algorithms for each dataset separately, between 1 (smallest) and 4 (largest). In case of ties, the average rank is assigned. After that, the Friedman test statistic ($\chi_r^2$) is calculated according to (5):

$$\chi_r^2 = \frac{12}{d \, t \, (t+1)} \sum_{i=1}^{t} R_i^2 \; - \; 3 \, d \, (t+1) \tag{5}$$


Table 8: The pairwise combinations of the REPTree algorithm with ordinal and ensemble learning versions.

| | BagOrdREPTree | AdaOrdREPTree | OrdREPTree |
|---|---|---|---|
| REPTree | 1-11-0 | 1-11-0 | 2-10-0 |
| OrdREPTree | 1-11-0 | 2-10-0 | |
| AdaOrdREPTree | 7-5-0 | | |

Table 9: The comparison of algorithms in terms of accuracy, precision, recall, F-measure, and ROC area.

| Compared Algorithms | Accuracy (%) | Precision | Recall | F-measure | ROC Area |
|---|---|---|---|---|---|
| C4.5 | 86.02 | 0.861 | 0.860 | 0.866 | 0.897 |
| OrdC4.5 | 84.26 | 0.843 | 0.842 | 0.848 | 0.884 |
| AdaOrdC4.5 | 87.48 | 0.874 | 0.875 | 0.874 | 0.933 |
| BagOrdC4.5 | 86.40 | 0.859 | 0.864 | 0.860 | 0.947 |
| RandomTree | 82.29 | 0.823 | 0.823 | 0.823 | 0.860 |
| OrdRandomTree | 82.34 | 0.825 | 0.823 | 0.824 | 0.861 |
| AdaOrdRandomTree | 86.76 | 0.866 | 0.868 | 0.866 | 0.928 |
| BagOrdRandomTree | 87.23 | 0.868 | 0.872 | 0.869 | 0.947 |
| REPTree | 82.26 | 0.817 | 0.823 | 0.817 | 0.903 |
| OrdREPTree | 83.50 | 0.831 | 0.835 | 0.829 | 0.908 |
| AdaOrdREPTree | 86.83 | 0.868 | 0.868 | 0.867 | 0.925 |
| BagOrdREPTree | 85.67 | 0.852 | 0.857 | 0.852 | 0.939 |

where $d$ is the number of datasets, $t$ is the number of classifiers, and $R_i$ is the total of the ranks of the $i$th classifier over all datasets. The Friedman statistic is approximately chi-square ($\chi^2$) distributed with $t-1$ degrees of freedom (df), and the null hypothesis is rejected if $\chi_r^2 > \chi_{t-1,\alpha}^2$ at the corresponding significance level $\alpha$.

The Friedman test value obtained for the four C4.5-based classifiers is 10.525. Since this value is greater than the critical value ($\chi_{df=3,\alpha=0.05}^2 = 7.81$), the null hypothesis (H0) is rejected, so it can be concluded that the four classifiers are significantly different. The same holds for the RandomTree-based and REPTree-based algorithms, which have test values of 15.3 and 20.1, respectively. The p-values obtained for the C4.5-, RandomTree-, and REPTree-related EBOC results (Tables 3, 4, and 5) are 0.01459, 0.00158, and 0.00016, all well below the 0.05 level of significance. Thus, the differences between their performances are unlikely to occur by chance.
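The same statistic is available off the shelf in scipy; feeding it the accuracy columns of Table 3 should closely reproduce the reported value (small deviations are possible because scipy applies a tie correction):

```python
from scipy.stats import friedmanchisquare

# each list = one classifier's accuracies over the 12 datasets (Table 3 columns)
c45     = [80.40, 81.95, 87.75, 92.36, 83.65, 84.91, 85.29, 98.74, 90.68, 85.66, 72.46, 88.36]
ord_c45 = [80.90, 66.34, 87.72, 92.19, 81.09, 85.42, 85.29, 98.31, 90.82, 85.98, 68.91, 88.12]
ada_c45 = [83.67, 84.39, 89.29, 98.73, 83.82, 86.05, 81.03, 99.37, 90.41, 85.07, 78.13, 89.77]
bag_c45 = [81.91, 73.66, 89.79, 94.27, 85.00, 86.81, 84.38, 98.69, 91.43, 86.94, 74.82, 89.10]

stat, p = friedmanchisquare(c45, ord_c45, ada_c45, bag_c45)
print(f"Friedman chi-square = {stat:.3f}, p-value = {p:.5f}")  # reject H0 if p < 0.05
```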

Quade Test. Like the Friedman test, the Quade test is a nonparametric test used to prove that the differences among classifiers are significant. First, the performance results of each classifier are ranked within each dataset to yield $R_{ij}$, in the same way as the Friedman test does, where $d$ is the number of datasets and $t$ is the number of classifiers, for $i = 1, 2, \ldots, d$ and $j = 1, 2, \ldots, t$. Then the range for each dataset is calculated by finding the difference between the largest and the smallest observations within that dataset; the rank obtained for dataset $i$ is denoted by $Q_i$. The weighted average adjusted rank for dataset $i$ with classifier $j$ is then computed as $S_{ij} = Q_i \, (R_{ij} - (t+1)/2)$. The Quade test statistic is then given by (6):

$$F = \frac{(d-1) \, (1/d) \sum_{j=1}^{t} S_j^2}{\sum_{i=1}^{d} \sum_{j=1}^{t} S_{ij}^2 - (1/d) \sum_{j=1}^{t} S_j^2} \tag{6}$$

where $S_j = \sum_{i=1}^{d} S_{ij}$ is the sum of the weighted ranks for each classifier. The $F$ value is tested against the F-distribution for a given $\alpha$ with $df_1 = (t-1)$ and $df_2 = (d-1)(t-1)$ degrees of freedom. If $F > F_{(t-1),(d-1)(t-1),\alpha}$, then the null hypothesis is rejected. Moreover, the p-value can be computed through normal approximations.

The p-values computed through the Quade statistical tests for the C4.5-, RandomTree-, and REPTree-related EBOC algorithms are 0.013398, 0.000018, and 0.000522, respectively. The test results strongly suggest the existence of significant differences among the considered algorithms at the significance level $\alpha = 0.05$. Hence, we reject the null hypothesis, which states that all classification algorithms have the same performance; thus, at least one of them behaves differently.
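Since the Quade test is not bundled with common Python libraries, equation (6) can be implemented directly; a sketch following the notation above:

```python
import numpy as np
from scipy.stats import rankdata, f as f_dist

def quade_test(acc):
    """acc: (d datasets x t classifiers) matrix of performance scores.
    Returns the Quade F statistic and its p-value."""
    d, t = acc.shape
    R = np.vstack([rankdata(row) for row in acc])   # within-dataset ranks R_ij
    ranges = acc.max(axis=1) - acc.min(axis=1)      # range of scores per dataset
    Q = rankdata(ranges)                            # Q_i: rank of each range
    S = Q[:, None] * (R - (t + 1) / 2.0)            # S_ij = Q_i * (R_ij - (t+1)/2)
    B = (S.sum(axis=0) ** 2).sum() / d              # (1/d) * sum_j S_j^2
    A = (S ** 2).sum()                              # sum_ij S_ij^2
    F = (d - 1) * B / (A - B)
    return F, f_dist.sf(F, t - 1, (d - 1) * (t - 1))

# e.g. quade_test(np.column_stack([c45, ord_c45, ada_c45, bag_c45]))
```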

5.4.2. Statistical Tests for Comparing Paired Classifiers. In addition to the multiple comparison tests, we also conducted pairwise comparison tests to demonstrate that the proposed algorithms (AdaOrd* and BagOrd*) have significant differences from the others when compared individually. The Wilcoxon signed rank test was applied to assess pairwise performance differences.


Table 10: Wilcoxon signed ranks test results.

| Compared Algorithms | W+ | W- | W | z-score | p-value | Significance Level |
|---|---|---|---|---|---|---|
| AdaOrdC4.5 vs. C4.5 | 63 | 15 | 15 | -1.882715 | 0.059739 | moderate |
| AdaOrdC4.5 vs. OrdC4.5 | 65 | 13 | 13 | -2.039608 | 0.041389 | strong |
| BagOrdC4.5 vs. C4.5 | 61 | 17 | 17 | -1.725822 | 0.084379 | moderate |
| BagOrdC4.5 vs. OrdC4.5 | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong |
| AdaOrdRandomTree vs. RandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong |
| AdaOrdRandomTree vs. OrdRandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong |
| BagOrdRandomTree vs. RandomTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong |
| BagOrdRandomTree vs. OrdRandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong |
| AdaOrdREPTree vs. REPTree | 71 | 7 | 7 | -2.510287 | 0.012063 | strong |
| AdaOrdREPTree vs. OrdREPTree | 64 | 14 | 14 | -1.961161 | 0.049860 | strong |
| BagOrdREPTree vs. REPTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong |
| BagOrdREPTree vs. OrdREPTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong |
| Average | 71.25 | 6.75 | 6.75 | -2.602996 | 0.022918 | |

Wilcoxon Signed Rank Test. This statistical test consists of the following steps to check the validity of the null hypothesis: (i) calculate the performance differences between the two algorithms for each dataset; (ii) order the differences according to their absolute values, from smallest to largest; (iii) rank the absolute values of the differences, starting with the smallest as 1 and assigning average ranks in case of ties; (iv) calculate W+ and W- by summing the ranks of the positive and negative differences separately; (v) find W as the smaller of the two sums, W = min(W+, W-); (vi) calculate the z-score as $z = W / \sigma_W$ and find the p-value from the distribution table; and (vii) determine the significance level according to the computed p-value [36], rejecting the null hypothesis if the p-value is less than a certain risk threshold (i.e., 0.1 = 10%), as given in (7):

$$
\text{Significance level} =
\begin{cases}
\text{weak or none}, & p > 0.1 \\
\text{moderate}, & 0.05 < p \le 0.1 \\
\text{strong}, & 0.01 < p \le 0.05 \\
\text{very strong}, & p \le 0.01
\end{cases}
\tag{7}
$$
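The pairwise test itself is available in scipy; a sketch for the first row of Table 10 follows (scipy may report a slightly different p-value because, for small samples, it defaults to the exact distribution rather than the normal approximation used here):

```python
from scipy.stats import wilcoxon

# accuracies of AdaOrdC4.5 and C4.5 over the 12 datasets (from Table 3)
ada_c45 = [83.67, 84.39, 89.29, 98.73, 83.82, 86.05, 81.03, 99.37, 90.41, 85.07, 78.13, 89.77]
c45     = [80.40, 81.95, 87.75, 92.36, 83.65, 84.91, 85.29, 98.74, 90.68, 85.66, 72.46, 88.36]

# Two-sided test on the paired differences; the statistic is W = min(W+, W-)
stat, p = wilcoxon(ada_c45, c45)
print(f"W = {stat}, p-value = {p:.6f}")  # compare against Table 10's first row
```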

Table 10 demonstrates the W, z-score, and p-values computed through the Wilcoxon signed rank tests. Based on the reported p-values, most of the tests strongly or very strongly prove the existence of significant differences among the considered algorithms. As the table states, for each test the ensemble-based ordinal classification algorithms (BagOrd* and AdaOrd*) always show a significant improvement over the traditional versions at a significance level of $\alpha = 0.1$. In general, it can be observed that the average p-value of the proposed BagOrd* variant algorithms (0.0171) is a little lower than that of the AdaOrd* variants (0.0288).

Based on all statistical tests (the Friedman, Quade, and Wilcoxon signed rank tests), we can safely reject the null hypothesis (i.e., there are no performance differences among the classifiers on the datasets), since the p-values computed through the statistics are less than the critical value (0.05). Thus, the proposed approach (EBOC) has a statistically significant effect on the response at the 95.0% confidence level.

6. Conclusions and Future Works

As a result of developments in the field of transportation, enormous amounts of raw data are generated every day. This situation creates the potential to discover knowledge patterns or rules from these data and to model the behaviour of a transportation system using machine learning techniques. To serve this purpose, this study focuses on the application of ordinal classification algorithms on real-world transportation datasets. It proposes a novel ensemble-based ordinal classification (EBOC) approach, which converts the original ordinal class problem into a series of binary class problems and uses the ensemble-learning paradigm (boosting and bagging) at the same time. To the best of our knowledge, this is the first study in which ordinal classification methods and our proposed approach have been applied to the transportation sector. In the experimental studies, the proposed model with tree-based learners (C4.5, RandomTree, and REPTree) was implemented on twelve benchmark transportation datasets that are available for public use. The proposed EBOC method was compared with the ordinal class classifier and traditional tree-based classification algorithms in terms of accuracy, precision, recall, F-measure, and ROC area. The results indicate that the EBOC approach provides more accurate classification results. Moreover, statistical test methods (the Friedman, Quade, and Wilcoxon signed rank tests) were used to prove that the classification accuracies obtained from the proposed algorithms (AdaOrd* and BagOrd*) are significantly different from those of the traditional methods. Therefore, the proposed EBOC method can assist in making right decisions in transportation. Our findings demonstrate that the improvement in performance is a result of exploiting ordering information and applying an ensemble strategy at the same time.

As future work, different types of ensemble-based ordinal classifiers can be developed using different ensemble methods


such as stacking and voting, and using different classification algorithms such as Naive Bayes, k-nearest neighbour, neural network, and support vector machine. In addition, ensemble clustering models can be developed to cluster transportation data. Furthermore, a comparative study which implements ensemble-learning and deep learning paradigms can be performed in transportation fields.

Data Availability

The "Auto MPG", "Automobile", "Bike Sharing", "Car Evaluation", and "Statlog (Vehicle Silhouettes)" datasets used to support the findings of this study are freely available at https://archive.ics.uci.edu/ml/datasets. The "Car Sale Advertisements", "NYS Air Passenger Traffic", "Smart City Traffic Patterns", "SF Air Traffic Landings Statistics", and "SF Air Traffic Passenger Statistics" datasets used to support the findings of this study are freely available at https://www.kaggle.com/datasets. The "Road Traffic Accidents (2017)" dataset used to support the findings of this study is freely available at https://datamillnorth.org/dataset. The "Traffic Volume Counts (2012-2013)" dataset used to support the findings of this study is freely available at https://opendata.cityofnewyork.us/data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

[1] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, "Traffic flow prediction with big data: a deep learning approach," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865-873, 2015.

[2] X. Wen, L. Shao, Y. Xue, and W. Fang, "A rapid learning algorithm for vehicle classification," Information Sciences, vol. 295, pp. 395-406, 2015.

[3] S. Ballı and E. A. Sağbaş, "Diagnosis of transportation modes on mobile phone using logistic regression classification," IET Software, vol. 12, no. 2, pp. 142-151, 2018.

[4] H. Nguyen, C. Cai, and F. Chen, "Automatic classification of traffic incident's severity using machine learning approaches," IET Intelligent Transport Systems, vol. 11, no. 10, pp. 615-623, 2017.

[5] G. Kedar-Dongarkar and M. Das, "Driver classification for optimization of energy usage in a vehicle," Procedia Computer Science, vol. 8, pp. 388-393, 2012.

[6] N. Deepika and V. V. Sajith Variyar, "Obstacle classification and detection for vision based navigation for autonomous driving," in Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI 2017), pp. 2092-2097, Udupi, India, September 2017.

[7] E. Frank and M. Hall, "A simple approach to ordinal classification," in Proceedings of the European Conference on Machine Learning (ECML), vol. 2167 of Lecture Notes in Computer Science, pp. 145-156, Springer, 2001.

[8] E. Alpaydin, Introduction to Machine Learning, The MIT Press, 3rd edition, 2014.

[9] S. Destercke and G. Yang, "Cautious ordinal classification by binary decomposition," in Proceedings of Machine Learning and Knowledge Discovery in Databases - European Conference (ECML/PKDD), pp. 323-337, Nancy, France, 2014.

[10] J. S. Cardoso and J. F. Pinto da Costa, "Learning to classify ordinal data: the data replication method," Journal of Machine Learning Research (JMLR), vol. 8, pp. 1393-1429, 2007.

[11] L. Li and H. T. Lin, "Ordinal regression by extended binary classification," in Proceedings of the 19th International Conference on Neural Information Processing Systems, pp. 865-872, Canada, 2006.

[12] F. E. Harrell, "Ordinal logistic regression," in Regression Modeling Strategies, Springer Series in Statistics, pp. 311-325, Springer International Publishing, Cham, Switzerland, 2015.

[13] J. D. Rennie and N. Srebro, "Loss functions for preference levels: regression with discrete ordered labels," in Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference Handling, pp. 180-186, 2005.

[14] B. Keith, "Barycentric coordinates for ordinal sentiment classification," in Proceedings of the 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Halifax, Canada, 2017.

[15] E. Fernandez, J. R. Figueira, J. Navarro, and B. Roy, "ELECTRE TRI-nB: A new multiple criteria ordinal classification method," European Journal of Operational Research, vol. 263, no. 1, pp. 214-224, 2017.

[16] M. M. Stenina, M. P. Kuznetsov, and V. V. Strijov, "Ordinal classification using Pareto fronts," Expert Systems with Applications, vol. 42, no. 14, pp. 5947-5953, 2015.

[17] W. Verbeke, D. Martens, and B. Baesens, "RULEM: A novel heuristic rule learning approach for ordinal classification with monotonicity constraints," Applied Soft Computing, vol. 60, pp. 858-873, 2017.

[18] W. Kotlowski and R. Slowinski, "On nonparametric ordinal classification with monotonicity constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 11, pp. 2576-2589, 2013.

[19] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ordinal classification with decision rules," in Proceedings of the Third International Conference on Mining Complex Data (MCD'07), pp. 169-181, Warsaw, Poland, 2007.

[20] K. Hechenbichler and K. Schliep, "Weighted k-nearest-neighbor techniques and ordinal classification," Collaborative Research Center, vol. 399, pp. 1-16, 2004.

[21] W. Waegeman and L. Boullart, "An ensemble of weighted support vector machines for ordinal regression," International Journal of Computer and Information Engineering, vol. 1, no. 12, pp. 599-603, 2007.

[22] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ensemble of decision rules for ordinal classification with monotonicity constraints," in Proceedings of the International Conference on Rough Sets and Knowledge Technology, vol. 5009 of Lecture Notes in Computer Science, pp. 260-267, 2008.

[23] H. T. Lin, From Ordinal Ranking to Binary Classification [Ph.D. thesis], California Institute of Technology, 2008.

[24] C. K. A. Cuartas, A. J. P. Anzola, and B. G. M. Tarazona, "Classification methodology of research topics based in decision trees: J48 and RandomTree," International Journal of Applied Engineering Research, vol. 10, no. 8, pp. 19413-19424, 2015.

[25] C. Parimala and R. Porkodi, "Classification algorithms in data mining: a survey," International Journal of Scientific Research in Computer Science, vol. 3, pp. 349-355, 2018.

[26] P. Yildirim, D. Birant, and T. Alpyildiz, "Improving prediction performance using ensemble neural networks in textile sector," in Proceedings of the 2017 International Conference on Computer Science and Engineering (UBMK), pp. 639-644, Antalya, Turkey, October 2017.

[27] H. Yu and J. Ni, "An improved ensemble learning method for classifying high-dimensional and imbalanced biomedicine data," IEEE Transactions on Computational Biology and Bioinformatics, vol. 11, no. 4, pp. 657-666, 2014.

[28] D. Che, Q. Liu, K. Rasheed, and X. Tao, "Decision tree and ensemble learning algorithms with their applications in bioinformatics," in Software Tools and Algorithms for Biological Systems, H. R. Arabnia and Q.-N. Tran, Eds., vol. 696, pp. 191-199, Springer, 2011.

[29] UCI Machine Learning Repository: Car Evaluation Data Set, 2018, https://archive.ics.uci.edu/ml/datasets/car+evaluation.

[30] Weka 3 - Data Mining with Open Source Machine Learning Software in Java, 2018, https://www.cs.waikato.ac.nz/ml/weka/.

[31] UCI Machine Learning Repository, 2018, https://archive.ics.uci.edu/ml/index.php.

[32] Datasets | Kaggle, 2018, https://www.kaggle.com/datasets.

[33] Data Mill North, 2018, https://datamillnorth.org/dataset.

[34] City of New York, "NYC Open Data," 2018, https://opendata.cityofnewyork.us.

[35] J. Derrac, S. García, D. Molina, and F. Herrera, "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms," Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 3-18, 2011.

[36] N. A. Weiss, Introductory Statistics, Addison-Wesley, 9th edition, 2012.

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 4: EBOC: Ensemble-Based Ordinal Classification in …downloads.hindawi.com/journals/jat/2019/7482138.pdfJournalofAdvancedTransportation the size of decision trees byremoving sections

4 Journal of Advanced Transportation

label of an unseen instance the probability for each ordinalclass P(119888119909) is estimated by using k-1 models The estimationof the probability for the first ordinal class label depends ona single classifier 1-P(target gt 1198881) The probability of the lastordinal class is given by P(target gt 119888119896minus1) In the middle ofthe range the probability is calculated by a pair of classifiersP(target gt 119888119894minus1) - P(target gt 119888119894) where 1 lt i lt k Finally wechoose the class label with the highest probability

The core ideas of using decision tree algorithms forordinal classification problems can be described in threefolds

First Frank and Hall [7] reported that decision tree-based ordinal classification algorithm resulted in a significantimprovement over the standard version on 29 UCI bench-mark datasets They also experimentally confirmed that theperformance gap increases with the number of classes Mostimportantly their results showed that decision tree-basedordinal classification was able to generally produce a muchsimpler model with better ordinal prediction accuracy

Second to consider the inherent order of class labelsother standard classification algorithms such as KNN [20]SVM [21] and NN require modification (nontrivial changes)in the training methods Traditional regression techniquescan also be applied on ordinal valued data due to their abilityto classify interval or ratio quantity values However theirapplication to truly ordinal problems is necessarily ad hoc [7]In contrast the key feature of using decision tree for ordinalclassification is that there is noneed tomake anymodificationon the underlying learning algorithm The core idea is simplyto transform the ordinal classification problem into a seriesof binary-class problems

Third owing to the advantages of fast speed high preci-sion and ease of understanding decision tree algorithms arewidely used in classification as well as ordinal classificationOrdinal classification is sensitive to noise in data Even a fewnoisy samples exist in the ordinal dataset they can changethe classification results of the overall system However in thepresence of noisy andmissing data a pruned decision tree canreflect the order structure information very well with goodgeneralization ability The benefits of implementing decisiontree-based ordinal classification are not limited to these Itbuilds white box models so the trees constructed from thealgorithm can be visualized and the classification results canbe easily explained by Boolean logic In addition the mostimportant features which are near to the root node areemphasized via the construction of the tree for the ordinalprediction task

32 Tree-Based Classification Algorithms In this work threetree-based classification algorithms (C45 decision tree Ran-domTree and REPTree) are used as base learners to imple-ment ordinal classification process on transportation datasetsthat consists of ordered class values

321 C45 Decision Tree Decision tree is one of the mostsuccessful classification algorithms which predicts unknownclass attributes using a tree structure grown with depth-first strategy The structure of decision tree consists of

nodes branches and leaves that represent attributes attributevalues and class labels respectively In the literature there areseveral decision tree algorithms such as C45 ID3 (iterativedichotomiser) CART (classification and regression trees)and CHAID (chi-squared automatic interaction detector)In this study C45 decision tree algorithm was used dueto its popularity for the ordinal classification problem intransportation sector

The first step in C45 decision tree algorithm is to specifyroot of the tree The attribute which gives the most determi-nant information for the prediction process is selected for theroot node To determine the order of features in the decisiontree information gain formula is evaluated for each attributeas defined in

119866119886119894119899 (119878 119860) = 119864119899119905119903119900119901119910 (119878)

minus sumVisinV119886119897119906119890119904(119860)

10038161003816100381610038161198781198811003816100381610038161003816|119878| 119864119899119905119903119900119901119910 (119878119881)

(1)

where 119878V is subset of states 119878 for attribute 119860 with value VTheentropy indicates the impurity of a particular attribute in thedataset as defined in

119864119899119905119903119900119901119910 (119878) =119899

sum119894=1

minus 119901119894 log2 119901119894 (2)

where 119894 is a state 119901119894 is the possibility of outcome being instate 119894 for the set 119878 and 119899 is the number of possible outcomesThe attribute that has the maximum information gain value isselected for the root of the tree Likewise all attributes in thedataset are placed to the tree according to their informationgain values

322 RandomTree Assume a training set 119879 with 119911 attributesand m instances and RandomTree is an algorithm forconstructing a tree that considers 119909 randomly chosen features(xlez) with119898 instances at each node [24]While in a standarddecision tree each node is split using the best split amongall features in a random tree each node is split using thebest among the subset of attributes randomly chosen at thatnodeThe tree is built to its maximum depth in other wordsno pruning procedure is applied after the tree has been fullybuilt The algorithm can deal with both classification andregression problems In classification task the predicted classfor a sample is determined by traversing the tree from theroot node to a leaf according to the question posed about anindicator value at that node When a leaf node is reached itslabel determines the classification decision The RandomTreealgorithm does not need any accuracy estimation metricdifferent from the other classification algorithms

323 REPTree The reduced error pruning tree (REPTree)algorithm builds a decision regression tree using informa-tion gain variance and prunes it using a simple and fasttechnique [25]Thepruning technique starts from the bottomlevel of the tree (from leaf nodes) and replaces the nodewith most famous class This change is accepted only if theprediction accuracy is good By this way it helps to minimize

Journal of Advanced Transportation 5

the size of decision trees by removing sections of the treethat gives little capacity to classify instances The values ofnumeric attributes are sorted once Missing values are alsodealt with by splitting the corresponding instances into parts

33 Ensemble Learning Ensemble learning is a machinelearning technique that combines a set of individual learnersand predicts a final output [26] First each learner in theensemble structure is trained separately and multiple classi-fication models are constructed Then the obtained outputsfrom each model are compounded by a voting mechanismThe commonly used voting method available for categoricaltarget values is major class labelling and the voting methodsfor numerical target values are average weighted averagemedian minimum and maximum

Ensemble-learning approach constructs a strong classifierfrommultiple individual learners and aims to improve classi-fication performance by reducing the risk of an unfortunateselection of these learners Many ensemble-based studies [2728] proved that ensemble learners givemore successful resultsthan classical individual learners

In the literature the ensemble methods are generallycategorized under four techniques bagging boosting stack-ing and voting In one of our studies we implementedbagging approach by combining multiple neural networkswhich includes different parameter values for the predictiontask in textile sector [26] In the current research bagging andboosting (AdaBoost algorithm) methods with ordinal classclassifier were applied on transportation datasets For eachmethod (bagging and boosting) tree-based classificationalgorithms (C45 decision tree RandomTree and REPTreealgorithms) were used as base learners separately

331 Bagging Bagging (bootstrap aggregating) is a com-monly applied ensemble techniquewhich constitutes trainingsubsets by selecting random instances from original datasetusing bootstrap method Each classifier in the ensemblestructure is trained by different training sets and so multipleclassification models are produced Then a new sample isgiven to each model and these models predict an output Theobtained outputs are aggregated and a single final output isachieved

General Process of Bagging Let 119863 = (1199091 1199101) (1199092 1199102) (119909119898 119910119898) be a set of 119898 items and let 119910119894 120598 119884 = 11988811198882 119888119896 be a set of 119896 class labels C denotes classificationalgorithm and 119899 is number of learners

(1) Draw 119898 items randomly with replacement from thedataset D so generate bootstrap samples 1198631 1198632 119863119899

(2) Each dataset119863119894 is trained by a base learner andmulti-ple classification models are constructed119872119894=C(119863119894)

(3) Consensus of classification models is tested to calcu-late out-of-bag error

(4) Give a new sample x to the classifiers as input and obtain the output $y_i$ from each model: $y_i = M_i(x)$.

(5) Combine the outputs of the models $M_1, M_2, \dots, M_n$ by majority voting, as shown in (3); a code sketch of the whole process follows:

$$M^{*}(x) = \arg\max_{y \in Y} \sum_{i:\,M_i(x)=y} 1 \quad (3)$$
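A minimal Python sketch of steps (1)-(5) is given below. scikit-learn's DecisionTreeClassifier stands in for the paper's tree-based base learners, and the out-of-bag estimation of step (3) is only indicated in a comment.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X, y, x_new, n_learners=10, seed=42):
    """Bagging sketch: bootstrap sampling, one base learner per sample,
    and the majority vote of equation (3)."""
    rng = np.random.default_rng(seed)
    m = len(y)
    models = []
    for _ in range(n_learners):
        idx = rng.integers(0, m, size=m)   # (1) draw m items with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))  # (2) Mi = C(Di)
        # (3) the out-of-bag error could be estimated on samples not in idx
    votes = [model.predict(x_new.reshape(1, -1))[0] for model in models]  # (4)
    return Counter(votes).most_common(1)[0][0]   # (5) arg max of the votes
```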

3.3.2. Boosting. In the boosting method, classifiers are trained consecutively to convert weak learners into strong ones. A weight value is assigned to each instance in the training set. Then, in each iteration, the weights of misclassified samples are increased while those of correctly classified ones are decreased. In this way, the misclassified samples' chances of being selected for the training set increase. In this study, the AdaBoost algorithm was utilized to implement the boosting method.

AdaBoost. AdaBoost, also known as adaptive boosting, is the most popular boosting algorithm; it trains learners by iteratively reweighting the instances in the training set. Then, the outputs produced by each learning model are aggregated using a weighted voting mechanism.
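One reweighting iteration can be sketched as follows. The update rule shown is the standard AdaBoost one, sketched by us rather than taken from the paper, and the depth-1 tree (decision stump) as weak learner is an assumption.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_round(X, y, weights):
    """One AdaBoost iteration: fit a weak learner on the weighted sample,
    then raise the weights of misclassified instances and lower the rest."""
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=weights)
    miss = stump.predict(X) != y
    err = weights[miss].sum() / weights.sum()          # weighted error rate
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))  # learner's vote weight
    weights = weights * np.exp(np.where(miss, alpha, -alpha))
    return stump, alpha, weights / weights.sum()       # renormalized weights
```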

4. Proposed Method: Ensemble-Based Ordinal Classification (EBOC)

The proposed method, named ensemble-based ordinal classification (EBOC), combines the ensemble-learning paradigm, including its popular methods AdaBoost and bagging, with the traditional ordinal class classifier algorithm to improve prediction performance. The ordinal class classifier algorithm, which was proposed in [7], is used as the base learner in the ensemble structure of the proposed approach. In addition, tree-based algorithms such as C4.5 decision tree, RandomTree, and REPTree are used as the classifier inside the ordinal class classifier algorithm.

The first step of the ordinal class classifier algorithm is to reduce the multiclass ordinal classification problem to binary classification problems. To realize this approach, the ordinal classification problem with k different class values is converted to k-1 two-class classification problems. Assume that there is a dataset with four classes (k = 4); here the task is to find two sides: (i) class C1 against classes C2, C3, and C4; (ii) classes C1 and C2 against classes C3 and C4; and finally (iii) classes C1, C2, and C3 against class C4. A sketch of this decomposition follows.
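The sketch below derives the k-1 binary targets of Frank and Hall's decomposition [7] in Python; the class order and the six example labels are those of the car evaluation example and Figure 2 below.

```python
import numpy as np

ORDER = ['unacc', 'acc', 'good', 'vgood']   # k = 4 ordinal values, low to high

def binary_targets(y):
    """One binary target per threshold class: a sample is labelled 1 when
    its class is strictly above that threshold, 0 otherwise."""
    ranks = np.array([ORDER.index(label) for label in y])
    return {f"class > {c}": (ranks > i).astype(int)
            for i, c in enumerate(ORDER[:-1])}

# The six example rows of Figure 2:
y = ['unacc', 'acc', 'acc', 'good', 'good', 'vgood']
for rule, target in binary_targets(y).items():
    print(rule, target.tolist())
# class > unacc [0, 1, 1, 1, 1, 1]
# class > acc   [0, 0, 0, 1, 1, 1]
# class > good  [0, 0, 0, 0, 0, 1]
```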

For example, suppose we have the car evaluation dataset [29], which evaluates vehicles according to buying price, maintenance price, and technical characteristics such as comfort, number of doors, person capacity, luggage boot size, and safety of the car. This dataset has an ordinal class attribute with four values: unacc, acc, good, and vgood. First, the four class values of the original dataset are converted to binary values according to the rules Classes > "unacc", Classes > "acc", and Classes > "good". If we consider the "Classes > 'unacc'" rule, class values higher than "unacc" are labelled as 1, and the others are labelled as 0. In this way, three different transformed datasets that contain binary class values are obtained. In the next stage, a classification algorithm (i.e., C4.5, RandomTree, or REPTree) is applied on each obtained dataset separately.


Figure 2: Application of the ordinal class classifier algorithm on the "car evaluation" dataset. The original dataset, with class labels unacc/acc/good/vgood, is transformed into three binary-labelled datasets under the rules Classes > 'unacc', Classes > 'acc', and Classes > 'good', from which P(class > 'unacc' | sample), P(class > 'acc' | sample), and P(class > 'good' | sample) are estimated.

Figure 2 demonstrates how the ordinal class classifier algorithm works on the "car evaluation" dataset.

When predicting a new sample, the probabilities are computed for each of the k class values using the k-1 binary classification models. For example, the probability of the "unacc" class value, P('unacc' | sample), in the sample car dataset is evaluated by 1 − P(class > 'unacc' | sample). Similarly, the other three ordinal class value probabilities are computed. Finally, the class label which has the maximum probability value is assigned to the sample. For the "car evaluation" dataset, the probabilities of the class attribute values are calculated as follows:

$$\begin{aligned}
P(\text{'unacc'} \mid \text{sample}) &= 1 - P(\text{class} > \text{'unacc'} \mid \text{sample}) \\
P(\text{'acc'} \mid \text{sample}) &= P(\text{class} > \text{'unacc'} \mid \text{sample}) - P(\text{class} > \text{'acc'} \mid \text{sample}) \\
P(\text{'good'} \mid \text{sample}) &= P(\text{class} > \text{'acc'} \mid \text{sample}) - P(\text{class} > \text{'good'} \mid \text{sample}) \\
P(\text{'vgood'} \mid \text{sample}) &= P(\text{class} > \text{'good'} \mid \text{sample})
\end{aligned} \quad (4)$$
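The recombination in (4) can be sketched in Python as below; the three binary-model outputs used in the example are assumed values for illustration, not results from the paper.

```python
import numpy as np

CLASSES = ['unacc', 'acc', 'good', 'vgood']

def class_probabilities(p_greater):
    """Recombine the k-1 binary estimates P(class > cj | sample), ordered
    from the lowest threshold to the highest, into the k class
    probabilities of (4)."""
    p = [1.0 - p_greater[0]]                               # P(c1)
    p += [p_greater[j - 1] - p_greater[j] for j in range(1, len(p_greater))]
    p.append(p_greater[-1])                                # P(ck)
    return np.array(p)

# Illustrative binary-model outputs (assumed values):
probs = class_probabilities([0.9, 0.4, 0.1])
print(dict(zip(CLASSES, probs)))       # {'unacc': 0.1, 'acc': 0.5, ...}
print(CLASSES[int(np.argmax(probs))])  # 'acc' has the maximum probability
```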

The proposed approach (EBOC) utilizes the ordinal classification algorithm as the base learner for the bagging and boosting ensemble methods. This novel ensemble-based approach aims to obtain successful classification results by benefiting from the high prediction ability of the ensemble-learning paradigm. The general structure of the proposed approach is presented in Figure 3. When bagging is considered, random samples are selected from the original dataset to produce multiple training sets; when boosting is utilized, samples are selected with probabilities based on their weights. After that, EBOC derives new datasets from the original dataset, each one with a new binary class attribute. Each dataset is given to the ordinal class classifier algorithm as an input. Then, a classification algorithm (i.e., C4.5, RandomTree, or REPTree) is applied on the datasets, and multiple ensemble-based ordinal classification models are produced. Each classification model in this system gives an output label, and the majority vote of these outputs is selected as the final class value.

The pseudocode of the proposed approach (EBOC) is presented in Algorithm 1. First, multiple training sets are generated according to the preferred ensemble method (bagging or boosting). Second, the algorithm converts the ordinal data to binary data. After that, a tree-based classification algorithm (i.e., C4.5, RandomTree, or REPTree) is applied on the binary training sets, and in this way various classifiers are produced. After this training phase, if boosting is chosen as the ensemble method, the weight of each sample in the dataset is updated dynamically according to the performance (error rate) of the classifiers in the current iteration. To predict the class label of a new input sample, the probability of each ordinal class is estimated by using the constructed models. The algorithm chooses the class label with the highest


Figure 3: The general structure of the proposed approach (EBOC). The original dataset is turned into n datasets (by bagging or boosting); each is decomposed into binary classifiers that form an ordinal classifier over the labels unacc/acc/good/vgood, and the n ordinal classifiers' output labels are combined by majority voting, $M^{*}(x) = \arg\max_{y \in Y} \sum_{i:\,M_i(x)=y} 1$, to give the predicted class label.

probability. Lastly, the majority vote of the outputs obtained from each ordinal classification model is selected as the final output.

5. Experimental Studies

In the experimental studies, the proposed approach (EBOC) was implemented on twelve different benchmark transportation datasets to demonstrate its advantages over the standard nominal classification methods. The application was developed by using the Weka open source data mining library [30]. The individual tree-based classification algorithms (C4.5, RandomTree, and REPTree), the ordinal classification algorithm [7], and the EBOC approach were applied on the transportation datasets separately, and they were compared in terms of accuracy, precision, recall, F-measure, and ROC area. In the EBOC approach, the C4.5, RandomTree, and REPTree algorithms were also used as base learners in the ordinal class classifiers separately. The experimental results obtained in this study are presented with the help of tables and graphs.

5.1. Dataset Description. In this study, 12 different transportation datasets, which are available in several repositories for public use, were selected to demonstrate the capabilities of the proposed EBOC method. The datasets were obtained from the following data archives: UCI Machine Learning Repository [31], Kaggle [32], Data Mill North [33], and NYC

Open Data [34]. The detailed descriptions of the datasets (i.e., what they are and how they are used) are given as follows.

Auto MPG. The Auto MPG dataset, which was used in the American Statistical Association Exposition, is presented by the StatLib library maintained at Carnegie Mellon University. This dataset is utilized for the prediction of the city-cycle fuel consumption in miles per gallon of cars according to their characteristics such as model year, number of cylinders, horsepower, engine size (displacement), weight, and acceleration.

Automobile. The dataset was obtained from the 1985 Ward's Automotive Yearbook. It consists of various automobile characteristics such as fuel type, body style, number of doors, engine size, engine location, horsepower, length, width, height, price, and insurance score. These features are used for the classification of automobiles by predicting their risk factors: six different risk rankings ranging from risky (+3) to pretty safe (-3).

Bike Sharing. The data includes the hourly count of rental bikes between the years 2011 and 2012. It was collected by the Capital bike share system in Washington, D.C., where membership, rental, and bike return are automated via a network of kiosk locations. The dataset is presented for the prediction of hourly bike rental counts based on environmental and seasonal settings such as weather situation (i.e., clear, cloudy, rainy, and snowy), temperature, humidity, wind speed, month, season, the day of the week, and year.


Algorithm EBOC: Ensemble-Based Ordinal Classification

Inputs:
    D: the ordinal dataset, D = {(x1, y1), (x2, y2), ..., (xm, ym)}
    m: the number of instances in the dataset D
    X: input feature space, an input vector x ∈ X
    Y: class attribute, an ordinal class label y ∈ {c1, c2, ..., ck} = Y with an order ck > ck-1 > ... > c1
    k: the number of class labels
    n: ensemble size

Outputs:
    M*: an ensemble classification model
    M*(x): class label of a new sample x

Begin
    for i = 1 to n do
        if (bagging) then Di = bootstrap samples from D
        else if (boosting) then Di = samples from D according to their weights
        // Construction of binary training sets Dij
        for j = 1 to k-1 do
            for s = 1 to m do
                if (ys <= cj) then Dij.Add(xs, 0)
                else Dij.Add(xs, 1)
            end for
        end for
        // Construction of binary classifiers BCij
        for j = 1 to k-1 do
            BCij = ClassificationAlgorithm(Dij)
        end for
        if (boosting) then update the weight values
    end for
    // Classification of a new sample x
    for i = 1 to n do
        // Construction of ordinal classification model Mi
        P(c1) = 1 - P(y > c1)
        for j = 2 to k-1 do
            P(cj) = P(y > c(j-1)) - P(y > cj)
        end for
        P(ck) = P(y > c(k-1))
        Mi(x) = the class label with the maximum P
    end for
    // Majority voting
    if (bagging) then M*(x) = arg max_{y ∈ Y} Σ_{i: Mi(x)=y} 1
    else if (boosting) then M*(x) = arg max_{y ∈ Y} Σ_{i: Mi(x)=y} weight_i
End

Algorithm 1: The pseudocode of the proposed EBOC approach.
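For concreteness, the bagging variant of Algorithm 1 can be condensed into the following Python sketch. It is an illustration rather than the authors' Weka implementation: scikit-learn's DecisionTreeClassifier stands in for C4.5/RandomTree/REPTree, and class labels are assumed to be integer ranks 0..k-1 in NumPy arrays.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

class EBOCBagging:
    """Bagging-based EBOC sketch: n bootstrap rounds, each training k-1
    threshold classifiers, combined by the majority vote of Algorithm 1."""

    def __init__(self, n_estimators=100, seed=1):
        self.n_estimators = n_estimators
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.k = int(y.max()) + 1
        self.rounds = []
        m = len(y)
        for _ in range(self.n_estimators):
            idx = self.rng.integers(0, m, size=m)       # bootstrap sample Di
            Xi, yi = X[idx], y[idx]
            # Binary classifiers BCij for the thresholds c1 .. c(k-1);
            # assumes both binary labels occur in every bootstrap sample.
            self.rounds.append([DecisionTreeClassifier()
                                .fit(Xi, (yi > j).astype(int))
                                for j in range(self.k - 1)])
        return self

    def predict_one(self, x):
        votes = []
        for classifiers in self.rounds:
            pg = [clf.predict_proba(x.reshape(1, -1))[0][1]   # P(y > cj)
                  for clf in classifiers]
            p = ([1 - pg[0]]
                 + [pg[j - 1] - pg[j] for j in range(1, self.k - 1)]
                 + [pg[-1]])
            votes.append(int(np.argmax(p)))        # ordinal model's label Mi(x)
        return Counter(votes).most_common(1)[0][0]  # majority voting M*(x)
```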


Car Evaluation. This dataset is useful for evaluating the quality of cars according to various characteristics such as buying price, maintenance price, number of doors, person capacity, size of luggage boot, and estimated safety of the car. The target attribute in the dataset indicates the overall scores of the cars as unacceptable, acceptable, good, and very good.


Car Sale Advertisements. The data was collected from private car sale advertisements in Ukraine in 2016. It was proposed for the prediction of the seller's price (denominated in USD) in the advertisement according to well-known car features such as model, body type, mileage, engine volume, engine type, and drive type.

NYS Air Passenger Traffic. The data was collected monthly by the Port Authority of New York State between the years 1977 and 2015. The aim is to predict the total number of domestic and international passengers for five airports (ACY, EWR, JFK, LGA, and SWF).

Road Traffic Accidents (2017). The dataset contains the records of traffic accidents across the City of Leeds, UK, reported in 2017, with information on the location, the number of people and vehicles involved, the type of vehicle, the road surface, weather situations, lighting conditions, and the age and sex of the casualty. The aim is to predict the severity of the casualty as slight, serious, or fatal.

SF Air Traffic Landings Statistics and SF Air Traffic Passenger Statistics. These two separate datasets include aircraft landings and passenger statistics of San Francisco International Airport recorded during the period from July 2005 to March 2018. These datasets are used to predict monthly landing and passenger counts, respectively.

Smart City Traffic Patterns. This dataset includes the traffic patterns of four junctions of a city collected between 2015 and 2017. The aim is to manage the traffic of the city better and to provide input on infrastructure planning for the future to improve the efficiency of services for the citizens. To serve this purpose, the number of vehicles in the traffic is predicted in order to implement a robust traffic system for the city by being prepared for traffic peaks.

Statlog (Vehicle Silhouettes). The Statlog dataset contains the features of vehicle silhouettes extracted by the Hierarchical Image Processing System (HIPS). A vehicle may be viewed from one of many different angles. The aim is to classify a given silhouette using a set of features extracted from the image.

Traffic Volume Counts (2012-2013). The dataset includes the hourly traffic volume counts collected by the New York Metropolitan Transportation Council between the years 2012 and 2013. The traffic counts were measured on various roads from one intersection to another with a specified direction (NB, SB, WB, and EB). The attribute to be predicted in this dataset is the traffic volume at 11:00-12:00 AM.

Before the application of the nominal and ordinal classification algorithms, the datasets generally passed through data preprocessing steps. In this study, the ID attributes in the transportation datasets were eliminated in the data reduction step. The date attributes were split into three parts, day, month, and year, because they provide more useful information for the prediction task. In addition, continuous class values were discretized into 3 bins using the equal-frequency technique, since ordinal classification algorithms require categorical ordinal data (a sketch of this step follows Figure 4).

Figure 4: The 10-fold cross-validation process. The entire dataset is split into ten parts; in each of steps 1-10, a different part serves as the test set and the remaining parts as the training set.

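The equal-frequency discretization step referred to above can be sketched with pandas as follows; the series values are illustrative, not taken from the datasets.

```python
import pandas as pd

# Equal-frequency discretization of a continuous target into 3 ordinal bins.
cnt = pd.Series([12, 45, 3, 88, 60, 25, 70, 9, 33])    # e.g. hourly counts
bins = pd.qcut(cnt, q=3, labels=['low', 'medium', 'high'])
print(bins.value_counts())   # each ordinal bin receives about a third
```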

The basic characteristics of the investigated transportation datasets are given in Table 1: the number of instances, attributes, and classes, the target attribute, the data preprocessing operations, and the class distribution in each bin.

5.2. Experimental Work. In each experiment, four methods were compared on the transportation datasets: (i) individual tree-based classification algorithms, (ii) the ordinal class classifier, (iii) boosting-based ordinal classification, and (iv) bagging-based ordinal classification. They were compared by using the n-fold cross-validation technique, selecting n as 10. In this validation technique, the entire dataset is divided into n equal-size parts; n-1 of them are used in the training phase of the classification model, and the last part is used in the testing phase. This process is repeated n times with changing parts for the training and testing phases. When the cross-validation process terminates, the accuracy values obtained in each step are averaged, and a single accuracy value is produced as the success measure of the algorithm. Figure 4 illustrates the 10-fold cross-validation process; a minimal sketch follows.
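A minimal sketch of this protocol with scikit-learn is given below (the study itself used Weka; the iris data merely stands in for a transportation dataset).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)             # placeholder dataset
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10,
                         scoring='accuracy')   # one accuracy per fold
print(scores.mean())   # the single averaged accuracy reported per algorithm
```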

In this study, the alternative tree-based ordinal classification algorithms were compared according to the accuracy, precision, recall, F-measure, and ROC area measures. The metrics are explained with their abbreviations, formulas, and definitions in Table 2.

5.3. Experimental Results. In this study, three different experiments with three different tree-based classification algorithms (C4.5, RandomTree, and REPTree) were performed to compare the classification success of the proposed EBOC approach with the existing individual algorithms and the ordinal class classifier algorithm. To evaluate the classification performance of the algorithms on a specific transportation dataset, accuracy rate values were calculated using the n-fold cross-validation technique, selecting n as 10.

In each experiment, one of the tree-based classification algorithms was used as the base learner. For example, in the first experiment, four methods were applied and compared on 12 different transportation datasets: (i) C4.5, the individual decision tree algorithm; (ii) OrdC4.5, the ordinal class classifier with C4.5; (iii) AdaOrdC4.5, AdaBoost-based ordinal classification with C4.5 as base learner; and finally (iv) BagOrdC4.5, bagging-based ordinal classification with C4.5 as base learner. In the second and third experiments, the RandomTree and REPTree algorithms were utilized as base learners, respectively.


Table 1: The basic characteristics of the transportation datasets (I: number of instances, A: number of attributes, C: number of classes).

| Dataset | I | A | C | Class Distribution | Target Attribute | Data Preprocessing |
|---|---|---|---|---|---|---|
| Auto MPG | 398 | 8 | 3 | 131-134-133 | mpg | Removing the columns with unique values (i.e., car name) |
| Automobile | 205 | 26 | 7 | 0-3-22-67-54-32-27 | symboling | - |
| Bike Sharing | 17379 | 13 | 3 | 5797-5783-5799 | cnt | Removing the columns "instant", "dteday", "casual", and "registered" |
| Car Evaluation | 1728 | 7 | 4 | 1210-384-69-65 | class | - |
| Car Sale Advertisements | 9309 | 10 | 3 | 3112-3080-3117 | price | Removing rows with zero price value |
| NYS Air Passenger Traffic | 1584 | 4 | 3 | 528-528-528 | total passengers | Removing the columns "Domestic" and "International Passengers" |
| Road Traffic Accidents (2017) | 2203 | 13 | 3 | 1879-309-15 | casualty severity | (i) Removing columns that hold reference numbers; (ii) Splitting date column into day and month |
| SF Air Traffic Landings Statistics | 21105 | 14 | 3 | 6941-7103-7061 | landing count | - |
| SF Air Traffic Passenger Statistics | 18398 | 12 | 3 | 6132-6133-6133 | passenger count | - |
| Smart City Traffic Patterns | 48120 | 6 | 3 | 15687-16436-15997 | vehicles | (i) Removing ID column; (ii) Splitting date column into day, month, and year |
| Statlog (Vehicle Silhouettes) | 846 | 19 | 4 | 212-217-218-199 | class | - |
| Traffic Volume Counts (2012-2013) | 5945 | 31 | 3 | 1983-1989-1973 | traffic volume at 11:00-12:00 AM | (i) Removing the columns with unique values (i.e., ID, Segment ID); (ii) Splitting date column into day, month, and year |

Table 2: The detailed information about classifier performance measures (TP: true positives, FP: false positives, TN: true negatives, and FN: false negatives).

| Abbreviation | Measure | Formula | Description |
|---|---|---|---|
| Acc | Accuracy | Acc = (TP + TN) / (TP + TN + FP + FN) | The ratio of the number of correctly classified instances to the total number of records |
| Prec | Precision | Prec = TP / (TP + FP) | The ratio of correctly classified positive instances against all positive results |
| Rec | Recall | Rec = TP / (TP + FN) | The ratio of correct answers for a class over all the answers correctly belonging to this class |
| F | F-measure | F = 2 · precision · recall / (precision + recall) | The harmonic mean of precision and recall |
| ROC Area | Receiver Operating Characteristic Area | - | The area under the curve which is generated to compare correctly and incorrectly classified instances |

The default parameters of the algorithms in Weka were used in all experiments, except the ensemble size (number of members) parameter, which was set to 100. As a result of the experiments, the accuracy rates of each algorithm were evaluated. Tables 3, 4, and 5 show the comparative results of the implemented techniques on the twelve benchmark datasets in terms of accuracy rates. The accuracy rates which are higher than the individual classification algorithm's accuracy rate in the same row are marked with *, and the maximum value in each row is shown in bold.


Table 3: The comparison of C4.5 based ordinal and ensemble learning methods in terms of classification accuracy (%).

| Dataset | C4.5 | OrdC4.5 | AdaOrdC4.5 | BagOrdC4.5 |
|---|---|---|---|---|
| Auto MPG | 80.40 | 80.90* | **83.67*** | 81.91* |
| Automobile | 81.95 | 66.34 | **84.39*** | 73.66 |
| Bike Sharing | 87.75 | 87.72 | 89.29* | **89.79*** |
| Car Evaluation | 92.36 | 92.19 | **98.73*** | 94.27* |
| Car Sale Advertisements | 83.65 | 81.09 | 83.82* | **85.00*** |
| NYS Air Passenger Traffic | 84.91 | 85.42* | 86.05* | **86.81*** |
| Road Traffic Accidents (2017) | **85.29** | **85.29*** | 81.03 | 84.38 |
| SF Air Traffic Landings Statistics | 98.74 | 98.31 | **99.37*** | 98.69 |
| SF Air Traffic Passenger Statistics | 90.68 | 90.82* | 90.41 | **91.43*** |
| Smart City Traffic Patterns | 85.66 | 85.98* | 85.07 | **86.94*** |
| Statlog (Vehicle Silhouettes) | 72.46 | 68.91 | **78.13*** | 74.82* |
| Traffic Volume Counts (2012-2013) | 88.36 | 88.12 | **89.77*** | 89.10* |
| Average | 86.02 | 84.26 | **87.48** | 86.40 |

Table 4: The comparison of RandomTree based ordinal and ensemble learning methods in terms of classification accuracy (%).

| Dataset | RandomTree | OrdRandomTree | AdaOrdRandomTree | BagOrdRandomTree |
|---|---|---|---|---|
| Auto MPG | 77.64 | 81.66* | **82.66*** | 81.41* |
| Automobile | 76.59 | 74.63 | 83.41* | **84.88*** |
| Bike Sharing | 80.07 | 80.56* | 86.98* | **87.73*** |
| Car Evaluation | 83.16 | 85.94* | **97.22*** | 95.83* |
| Car Sale Advertisements | 82.38 | 82.12 | 83.90* | **86.34*** |
| NYS Air Passenger Traffic | 86.11 | 86.17* | 85.61 | **86.49*** |
| Road Traffic Accidents (2017) | 78.76 | 78.44 | 82.21* | **84.48*** |
| SF Air Traffic Landings Statistics | 97.38 | 97.43* | **98.93*** | 98.83* |
| SF Air Traffic Passenger Statistics | 89.22 | **89.27*** | 89.05 | 89.17 |
| Smart City Traffic Patterns | 82.26 | 82.79* | 84.72* | **85.91*** |
| Statlog (Vehicle Silhouettes) | 70.92 | 65.60 | **76.71*** | 76.00* |
| Traffic Volume Counts (2012-2013) | 82.96 | 83.45* | **89.77*** | 89.69* |
| Average | 82.29 | 82.34 | 86.76 | **87.23** |

The experimental results show that the EBOC approaches (AdaOrdC4.5, BagOrdC4.5, AdaOrdRandomTree, BagOrdRandomTree, AdaOrdREPTree, and BagOrdREPTree) generally provide higher accuracy values than the individual tree-based classification algorithms. For example, over the 12 datasets, BagOrdRandomTree (87.23%) is significantly more accurate than RandomTree (82.29%). Similarly, both AdaOrdREPTree and BagOrdREPTree win against plain REPTree on 11 datasets and lose on only one. The results also show that the performance gap generally increases with the number of classes. When the average accuracy values are considered, the proposed EBOC approach has the best accuracy score in all three

experiments: AdaOrdC4.5 reaches 87.48%, BagOrdRandomTree 87.23%, and BagOrdREPTree 85.67%.

The matrices presented in Tables 6, 7, and 8 give all pairwise combinations of the tree-based algorithms with their ordinal and ensemble-learning versions. Each cell in the matrix represents the number of wins, losses, and ties between the approach in that row and the approach in that column. For example, in the pairwise comparison of the C4.5 and BagOrdC4.5 algorithms, 3-9-0 indicates that the standard C4.5 algorithm is better than BagOrdC4.5 on only 3 datasets, while BagOrdC4.5 is better on the other 9 datasets; their accuracies are not equal on any dataset. When all matrices are examined, it is clearly seen that our proposed EBOC approach outperforms all other methods.

The graphs given in Figures 5, 6, and 7 show the average ranks of the tree-based algorithms (C4.5, RandomTree, and REPTree) with their ordinal and ensemble-learning (AdaBoost and bagging) versions, respectively.


Table 5: The comparison of REPTree based ordinal and ensemble learning methods in terms of classification accuracy (%).

| Dataset | REPTree | OrdREPTree | AdaOrdREPTree | BagOrdREPTree |
|---|---|---|---|---|
| Auto MPG | 75.38 | 76.63* | **81.91*** | 81.16* |
| Automobile | 63.41 | 66.83* | **83.90*** | 72.20* |
| Bike Sharing | 84.96 | 85.27* | **88.75*** | 88.12* |
| Car Evaluation | 87.67 | 90.91* | **98.67*** | 93.63* |
| Car Sale Advertisements | 80.70 | 80.42 | **82.52*** | 82.30* |
| NYS Air Passenger Traffic | 84.34 | 84.91* | 86.74* | **87.31*** |
| Road Traffic Accidents (2017) | 84.66 | **85.02*** | 80.07 | 84.57 |
| SF Air Traffic Landings Statistics | 98.04 | 98.10* | **99.12*** | 98.57* |
| SF Air Traffic Passenger Statistics | 90.39 | 90.49* | 90.97* | **91.25*** |
| Smart City Traffic Patterns | 84.66 | 85.06* | 85.33* | **86.38*** |
| Statlog (Vehicle Silhouettes) | 72.34 | 69.62 | **77.54*** | 73.17* |
| Traffic Volume Counts (2012-2013) | 80.57 | 88.75* | 86.39* | **89.37*** |
| Average | 82.26 | 83.50 | **86.83** | 85.67 |

Figure 5: The average ranks of the C4.5 algorithm with ordinal and ensemble-learning versions (C4.5: 2.88, OrdC4.5: 3.29, AdaOrdC4.5: 2.00, BagOrdC4.5: 1.83).

In the ranking method, each approach used in this study is rated according to its accuracy score on each dataset. This process starts by giving rank 1 to the classifier with the highest classification accuracy and continues increasing the rank value until rank m is assigned to the worst of the m classifiers. In the case of a tie, the average of the tied classifiers' rankings is assigned to each of them. Then, the mean values of the ranks per classifier over all datasets are computed as the average ranks (a sketch follows). According to the comparative results, the EBOC approach with the bagging method has the best performance among the others in Figures 5 and 6, because it gives the lowest rank value.
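A sketch of this ranking computation is given below; SciPy's rankdata assigns average ranks to ties, and negating the accuracies gives rank 1 to the best classifier. The example numbers are illustrative only.

```python
import numpy as np
from scipy.stats import rankdata

def average_ranks(accuracies):
    """accuracies: (datasets x classifiers) matrix; returns the mean rank per
    classifier, rank 1 for the most accurate on a dataset, ties averaged."""
    acc = np.asarray(accuracies, dtype=float)
    ranks = np.apply_along_axis(lambda row: rankdata(-row), 1, acc)
    return ranks.mean(axis=0)

# e.g. two datasets, three classifiers (illustrative numbers):
print(average_ranks([[80.0, 85.0, 90.0],
                     [70.0, 75.0, 72.0]]))   # [3.0, 1.5, 1.5]
```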

Figure 6: The average ranks of the RandomTree algorithm with ordinal and ensemble-learning versions (RandomTree: 3.42, OrdRandomTree: 3.00, AdaOrdRandomTree: 1.92, BagOrdRandomTree: 1.67).

The proposed algorithms (AdaOrd* and BagOrd*) were applied on the transportation datasets and compared with the ordinal and nominal (standard) classification algorithms in terms of the accuracy, precision, recall, F-measure, and ROC area metrics. Measures beyond accuracy are required to evaluate all aspects of the classifiers, and they provide additional insight into their performance. Table 9 shows the average results obtained over the 12 transportation datasets in each experiment. Except for accuracy, the metrics listed in Table 9 give a degree ranging from 0 to 1; an algorithm with a higher value is more successful than the others.


Table 6: The pairwise combinations of the C4.5 algorithm with ordinal and ensemble learning versions (wins-losses-ties of the row approach against the column approach).

| | BagOrdC4.5 | AdaOrdC4.5 | OrdC4.5 |
|---|---|---|---|
| C4.5 | 3-9-0 | 3-9-0 | 7-4-1 |
| OrdC4.5 | 1-11-0 | 3-9-0 | |
| AdaOrdC4.5 | 6-6-0 | | |

Table 7: The pairwise combinations of the RandomTree algorithm with ordinal and ensemble learning versions (wins-losses-ties of the row approach against the column approach).

| | BagOrdRandomTree | AdaOrdRandomTree | OrdRandomTree |
|---|---|---|---|
| RandomTree | 1-11-0 | 2-10-0 | 4-8-0 |
| OrdRandomTree | 2-10-0 | 2-10-0 | |
| AdaOrdRandomTree | 5-7-0 | | |

Figure 7: The average ranks of the REPTree algorithm with ordinal and ensemble-learning versions (REPTree: 3.67, OrdREPTree: 2.92, AdaOrdREPTree: 1.67, BagOrdREPTree: 1.75).

Among the C4.5 decision-tree-based algorithms, it is clearly seen that AdaOrdC4.5 has the best accuracy (87.48%), precision (0.874), recall (0.875), and F-measure (0.874) results. While bagging has the highest values for the RandomTree algorithm, boosting is the best among the REPTree-based algorithms. When the ROC area is considered, the BagOrd* algorithms have the highest scores according to the experimental results. As a result, it is possible to say that the ensemble-based ordinal classification (EBOC) approach provides more accurate and robust classification results than the traditional methods.

5.4. Statistical Significance Tests. The field of inferential statistics offers special procedures for testing the significance of differences between multiple classifiers. Although the ensemble-based ordinal classification (EBOC) algorithms (AdaOrd* and BagOrd*) have lower ranking values, we should statistically verify that these algorithms are significantly different from the others. For verification, we used three well-known statistical methods [35]: multiple comparisons (the Friedman test and the Quade test) and pairwise comparisons (the Wilcoxon signed rank test).

The null hypothesis (H0) of a statistical test states that the compared classification models are equivalent; the alternative hypothesis (H1) holds when not all classifiers are equivalent.

H0: There are no performance differences among the classifiers on the datasets.

H1: There are performance differences among the classifiers on the datasets.

The p-value is defined as a probability, a decimal number between 0 and 1, and it can be expressed as a percentage (i.e., 0.1 = 10%). With a small p-value, we reject the null hypothesis (H0), so the relationship between the classification results is significantly different.

5.4.1. Statistical Tests for Comparing Multiple Classifiers. Multiple comparison tests (MCT) are used to establish a statistical comparison of the results reported among various classification algorithms. We performed two well-known MCTs to determine whether the classifiers have significant differences or not: the Friedman test and the Quade test.

Friedman Test. The Friedman test is a nonparametric statistical test that aims to detect significant differences between the behaviours of two or more algorithms. Initially, it ranks the algorithms for each dataset separately, between 1 (smallest) and 4 (largest); in case of ties, the average rank is assigned. After that, the Friedman test statistic $\chi_r^2$ is calculated according to the following:

$$\chi_r^2 = \frac{12}{d \cdot t \cdot (t+1)} \sum_{i=1}^{t} R_i^2 - 3 \cdot d \cdot (t+1) \quad (5)$$


Table 8: The pairwise combinations of the REPTree algorithm with ordinal and ensemble learning versions (wins-losses-ties of the row approach against the column approach).

| | BagOrdREPTree | AdaOrdREPTree | OrdREPTree |
|---|---|---|---|
| REPTree | 1-11-0 | 1-11-0 | 2-10-0 |
| OrdREPTree | 1-11-0 | 2-10-0 | |
| AdaOrdREPTree | 7-5-0 | | |

Table 9: The comparison of algorithms in terms of accuracy, precision, recall, F-measure, and ROC area.

| Compared Algorithms | Accuracy (%) | Precision | Recall | F-measure | ROC Area |
|---|---|---|---|---|---|
| C4.5 | 86.02 | 0.861 | 0.860 | 0.866 | 0.897 |
| OrdC4.5 | 84.26 | 0.843 | 0.842 | 0.848 | 0.884 |
| AdaOrdC4.5 | 87.48 | 0.874 | 0.875 | 0.874 | 0.933 |
| BagOrdC4.5 | 86.40 | 0.859 | 0.864 | 0.860 | 0.947 |
| RandomTree | 82.29 | 0.823 | 0.823 | 0.823 | 0.860 |
| OrdRandomTree | 82.34 | 0.825 | 0.823 | 0.824 | 0.861 |
| AdaOrdRandomTree | 86.76 | 0.866 | 0.868 | 0.866 | 0.928 |
| BagOrdRandomTree | 87.23 | 0.868 | 0.872 | 0.869 | 0.947 |
| REPTree | 82.26 | 0.817 | 0.823 | 0.817 | 0.903 |
| OrdREPTree | 83.50 | 0.831 | 0.835 | 0.829 | 0.908 |
| AdaOrdREPTree | 86.83 | 0.868 | 0.868 | 0.867 | 0.925 |
| BagOrdREPTree | 85.67 | 0.852 | 0.857 | 0.852 | 0.939 |

where d is the number of datasets, t is the number of classifiers, and $R_i$ is the total of the ranks of the i-th classifier over all datasets. The Friedman statistic is approximately chi-square ($\chi^2$) distributed with t-1 degrees of freedom (df), and the null hypothesis is rejected if $\chi_r^2 > \chi^2_{t-1,\alpha}$ at the corresponding significance level $\alpha$.

The obtained Friedman test value for the four C4.5-based classifiers is 10.525. Since this value is greater than the critical value ($\chi^2_{df=3,\alpha=0.05} = 7.81$), the null hypothesis (H0) is rejected, so it can be concluded that the four classifiers are significantly different. The same holds for the RandomTree-based and REPTree-based algorithms, which have test values of 15.3 and 20.1, respectively. The p-values obtained for the C4.5, RandomTree, and REPTree related EBOC results (Tables 3, 4, and 5) are 0.01459, 0.00158, and 0.00016, which are far below the 0.05 level of significance. Thus, it is possible to say that the differences between their performances are unlikely to occur by chance. A sketch with SciPy follows.
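The same test can be reproduced with SciPy on the per-dataset accuracies; below, the four columns of Table 3 are used, and the resulting statistic should be close to the chi-square value reported above (tie handling may cause small differences).

```python
from scipy.stats import friedmanchisquare

# Accuracy columns of Table 3 (C4.5, OrdC4.5, AdaOrdC4.5, BagOrdC4.5):
c45 = [80.40, 81.95, 87.75, 92.36, 83.65, 84.91,
       85.29, 98.74, 90.68, 85.66, 72.46, 88.36]
ordc45 = [80.90, 66.34, 87.72, 92.19, 81.09, 85.42,
          85.29, 98.31, 90.82, 85.98, 68.91, 88.12]
adaordc45 = [83.67, 84.39, 89.29, 98.73, 83.82, 86.05,
             81.03, 99.37, 90.41, 85.07, 78.13, 89.77]
bagordc45 = [81.91, 73.66, 89.79, 94.27, 85.00, 86.81,
             84.38, 98.69, 91.43, 86.94, 74.82, 89.10]

stat, p = friedmanchisquare(c45, ordc45, adaordc45, bagordc45)
print(stat, p)   # H0 is rejected at alpha = 0.05 when p < 0.05
```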

Quade Test. Like the Friedman test, the Quade test is a nonparametric test which is used to prove that the differences among classifiers are significant. First, the performance results of each classifier are ranked within each dataset to yield $R_{ij}$, in the same way as the Friedman test does, where d is the number of datasets and t is the number of classifiers, for i = 1, 2, ..., d and j = 1, 2, ..., t. Then, the range for each dataset is calculated by finding the difference between the largest and the smallest observations within that dataset; the rank of this range for dataset i is denoted by $Q_i$. The weighted adjusted rank for dataset i with classifier j is then computed as $S_{ij} = Q_i \cdot (R_{ij} - (t+1)/2)$. The Quade test statistic is then given by

$$F = \frac{(d-1)\,\frac{1}{d}\sum_{j=1}^{t} S_j^2}{\sum_{i=1}^{d}\sum_{j=1}^{t} S_{ij}^2 - \frac{1}{d}\sum_{j=1}^{t} S_j^2} \quad (6)$$

where $S_j = \sum_{i=1}^{d} S_{ij}$ is the sum of the weighted ranks for each classifier. The F value is tested against the F-distribution for a given $\alpha$ with df1 = (t − 1) and df2 = (d − 1)(t − 1) degrees of freedom. If $F > F_{(t-1),(d-1)(t-1),\alpha}$, then the null hypothesis is rejected. Moreover, the p-value can be computed through normal approximations.

The p-values computed through the Quade statistical tests for the C4.5, RandomTree, and REPTree related EBOC algorithms are 0.013398, 0.000018, and 0.000522, respectively. The test results strongly suggest the existence of significant differences among the considered algorithms at the significance level $\alpha$ = 0.05. Hence, we reject the null hypothesis, which states that all classification algorithms have the same performance; thus, it is possible to say that at least one of them behaves differently.
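Since SciPy does not ship a Quade test, the statistic of (6) can be computed directly, as in this sketch (ranks are assigned so that higher accuracy gets the higher rank, matching the ranking convention above).

```python
import numpy as np
from scipy.stats import rankdata, f as f_dist

def quade_test(results):
    """results: (d datasets x t classifiers) accuracy matrix; returns the
    Quade F statistic of (6) and its p-value from the F-distribution."""
    results = np.asarray(results, dtype=float)
    d, t = results.shape
    R = np.apply_along_axis(rankdata, 1, results)      # within-dataset ranks Rij
    sample_range = results.max(axis=1) - results.min(axis=1)
    Q = rankdata(sample_range)                         # ranks Qi of the ranges
    S = Q[:, None] * (R - (t + 1) / 2)                 # Sij
    Sj = S.sum(axis=0)                                 # per-classifier sums
    A, B = (S ** 2).sum(), (Sj ** 2).sum() / d
    F = (d - 1) * B / (A - B)
    return F, f_dist.sf(F, t - 1, (d - 1) * (t - 1))
```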

5.4.2. Statistical Tests for Comparing Paired Classifiers. In addition to the multiple comparison tests, we also conducted pairwise comparison tests to demonstrate that the proposed algorithms (AdaOrd* and BagOrd*) have significant differences from the others when compared individually. The Wilcoxon signed rank test was applied to examine the pairwise performance differences.


Table 10: Wilcoxon signed ranks test results.

| Compared Algorithms | W+ | W− | W | z-score | p-value | Significance Level |
|---|---|---|---|---|---|---|
| AdaOrdC4.5 vs. C4.5 | 63 | 15 | 15 | -1.882715 | 0.059739 | moderate |
| AdaOrdC4.5 vs. OrdC4.5 | 65 | 13 | 13 | -2.039608 | 0.041389 | strong |
| BagOrdC4.5 vs. C4.5 | 61 | 17 | 17 | -1.725822 | 0.084379 | moderate |
| BagOrdC4.5 vs. OrdC4.5 | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong |
| AdaOrdRandomTree vs. RandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong |
| AdaOrdRandomTree vs. OrdRandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong |
| BagOrdRandomTree vs. RandomTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong |
| BagOrdRandomTree vs. OrdRandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong |
| AdaOrdREPTree vs. REPTree | 71 | 7 | 7 | -2.510287 | 0.012063 | strong |
| AdaOrdREPTree vs. OrdREPTree | 64 | 14 | 14 | -1.961161 | 0.049860 | strong |
| BagOrdREPTree vs. REPTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong |
| BagOrdREPTree vs. OrdREPTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong |
| Average | 71.25 | 6.75 | 6.75 | -2.602996 | 0.022918 | |

Wilcoxon Signed Rank Test. This statistical test consists of the following steps to check the validity of the null hypothesis: (i) calculate the performance differences between the two algorithms for each dataset; (ii) order the differences according to their absolute values, from smallest to largest; (iii) rank the absolute values of the differences, starting with the smallest as 1 and assigning average ranks in case of ties; (iv) calculate W+ and W− by summing the ranks of the positive and negative differences separately; (v) find W as the smaller of the two sums, W = min(W+, W−); (vi) calculate the z-score as $z = W/\sigma_W$ and find the p-value from the distribution table; and (vii) determine the significance level according to the computed p-value [36], rejecting the null hypothesis if the p-value is less than a certain risk threshold (i.e., 0.1 = 10%):

$$\text{Significance level} = \begin{cases} \text{weak or none}, & p > 0.1 \\ \text{moderate}, & 0.05 < p \le 0.1 \\ \text{strong}, & 0.01 < p \le 0.05 \\ \text{very strong}, & p \le 0.01 \end{cases} \quad (7)$$
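One pairwise comparison can be sketched with SciPy as follows (accuracies of AdaOrdC4.5 and C4.5 taken from Table 3; SciPy's exact method for n = 12 may give a p-value slightly different from the normal approximation behind Table 10).

```python
from scipy.stats import wilcoxon

adaordc45 = [83.67, 84.39, 89.29, 98.73, 83.82, 86.05,
             81.03, 99.37, 90.41, 85.07, 78.13, 89.77]
c45 = [80.40, 81.95, 87.75, 92.36, 83.65, 84.91,
       85.29, 98.74, 90.68, 85.66, 72.46, 88.36]

W, p = wilcoxon(adaordc45, c45)   # W = min(W+, W-) over the signed ranks
print(W, p)   # read the significance level off the cases in (7)
```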

Table 10 presents the W, z-score, and p-values computed through the Wilcoxon signed rank tests. Based on the reported p-values, most of the tests strongly or very strongly prove the existence of significant differences among the considered algorithms. As the table states, for each test the ensemble-based ordinal classification algorithms (BagOrd* and AdaOrd*) always show a significant improvement over the traditional versions at a significance level of $\alpha$ = 0.1. In general, it can be observed that the average p-value of the proposed BagOrd* variants (0.0171) is a little less than that of the AdaOrd* variants (0.0288).

Based on all the statistical tests (Friedman, Quade, and Wilcoxon signed rank tests), we can safely reject the null hypothesis (i.e., that there are no performance differences among the classifiers on the datasets), since the p-values computed through the statistics are less than the critical value (0.05). Thus, the proposed approach (EBOC) has a statistically significant effect on the response at the 95.0% confidence level.

6. Conclusions and Future Works

As a result of developments in the field of transportation, enormous amounts of raw data are generated every day. This situation creates the potential to discover knowledge patterns or rules from it and to model the behaviour of a transportation system using machine learning techniques. To serve this purpose, this study focuses on the application of ordinal classification algorithms on real-world transportation datasets. It proposes a novel ensemble-based ordinal classification (EBOC) approach. This approach converts the original ordinal class problem into a series of binary class problems and uses the ensemble-learning paradigm (boosting and bagging) at the same time. To the best of our knowledge, this is the first study in which ordinal classification methods and our proposed approach were applied to the transportation sector. In the experimental studies, the proposed model with the tree-based learners (C4.5, RandomTree, and REPTree) was implemented on twelve benchmark transportation datasets that are available for public use. The proposed EBOC method was compared with the ordinal class classifier and traditional tree-based classification algorithms in terms of accuracy, precision, recall, F-measure, and ROC area. The results indicate that the EBOC approach provides more accurate classification results. Moreover, statistical test methods (Friedman, Quade, and Wilcoxon signed rank tests) were used to prove that the classification accuracies obtained from the proposed algorithms (AdaOrd* and BagOrd*) are significantly different from those of the traditional methods. Therefore, the proposed EBOC method can assist in making right decisions in transportation. Our findings demonstrate that the improvement in performance is a result of exploiting ordering information and applying an ensemble strategy at the same time.

As future work, different types of ensemble-based ordinal classifiers can be developed using different ensemble methods


such as stacking and voting, and using different classification algorithms such as Naive Bayes, k-nearest neighbour, neural network, and support vector machine. In addition, ensemble clustering models can be developed to cluster transportation data. Furthermore, a comparative study which implements ensemble-learning and deep learning paradigms can be performed in transportation fields.

Data Availability

The "Auto MPG", "Automobile", "Bike Sharing", "Car Evaluation", and "Statlog (Vehicle Silhouettes)" datasets used to support the findings of this study are freely available at https://archive.ics.uci.edu/ml/datasets. The "Car Sale Advertisements", "NYS Air Passenger Traffic", "Smart City Traffic Patterns", "SF Air Traffic Landings Statistics", and "SF Air Traffic Passenger Statistics" datasets used to support the findings of this study are freely available at https://www.kaggle.com/datasets. The "Road Traffic Accidents (2017)" dataset used to support the findings of this study is freely available at https://datamillnorth.org/dataset. The "Traffic Volume Counts (2012-2013)" dataset used to support the findings of this study is freely available at https://opendata.cityofnewyork.us/data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

[1] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, "Traffic flow prediction with big data: a deep learning approach," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865-873, 2015.
[2] X. Wen, L. Shao, Y. Xue, and W. Fang, "A rapid learning algorithm for vehicle classification," Information Sciences, vol. 295, pp. 395-406, 2015.
[3] S. Ballı and E. A. Sagbas, "Diagnosis of transportation modes on mobile phone using logistic regression classification," IET Software, vol. 12, no. 2, pp. 142-151, 2018.
[4] H. Nguyen, C. Cai, and F. Chen, "Automatic classification of traffic incident's severity using machine learning approaches," IET Intelligent Transport Systems, vol. 11, no. 10, pp. 615-623, 2017.
[5] G. Kedar-Dongarkar and M. Das, "Driver classification for optimization of energy usage in a vehicle," Procedia Computer Science, vol. 8, pp. 388-393, 2012.
[6] N. Deepika and V. V. Sajith Variyar, "Obstacle classification and detection for vision based navigation for autonomous driving," in Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI 2017), pp. 2092-2097, Udupi, India, September 2017.
[7] E. Frank and M. Hall, "A simple approach to ordinal classification," in Proceedings of the European Conference on Machine Learning (ECML), vol. 2167 of Lecture Notes in Computer Science, pp. 145-156, Springer, 2001.
[8] E. Alpaydin, Introduction to Machine Learning, The MIT Press, 3rd edition, 2014.
[9] S. Destercke and G. Yang, "Cautious ordinal classification by binary decomposition," in Proceedings of Machine Learning and Knowledge Discovery in Databases - European Conference, ECML/PKDD, pp. 323-337, Nancy, France, 2014.
[10] J. S. Cardoso and J. F. Pinto da Costa, "Learning to classify ordinal data: the data replication method," Journal of Machine Learning Research (JMLR), vol. 8, pp. 1393-1429, 2007.
[11] L. Li and H. T. Lin, "Ordinal regression by extended binary classification," in Proceedings of the 19th International Conference on Neural Information Processing Systems, pp. 865-872, Canada, 2006.
[12] F. E. Harrell, "Ordinal logistic regression," in Regression Modeling Strategies, Springer Series in Statistics, pp. 311-325, Springer International Publishing, Cham, Switzerland, 2015.
[13] J. D. Rennie and N. Srebro, "Loss functions for preference levels: regression with discrete ordered labels," in Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference Handling, pp. 180-186, 2005.
[14] B. Keith, "Barycentric coordinates for ordinal sentiment classification," in Proceedings of the 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Halifax, Canada, 2017.
[15] E. Fernandez, J. R. Figueira, J. Navarro, and B. Roy, "ELECTRE TRI-nB: a new multiple criteria ordinal classification method," European Journal of Operational Research, vol. 263, no. 1, pp. 214-224, 2017.
[16] M. M. Stenina, M. P. Kuznetsov, and V. V. Strijov, "Ordinal classification using Pareto fronts," Expert Systems with Applications, vol. 42, no. 14, pp. 5947-5953, 2015.
[17] W. Verbeke, D. Martens, and B. Baesens, "RULEM: a novel heuristic rule learning approach for ordinal classification with monotonicity constraints," Applied Soft Computing, vol. 60, pp. 858-873, 2017.
[18] W. Kotlowski and R. Slowinski, "On nonparametric ordinal classification with monotonicity constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 11, pp. 2576-2589, 2013.
[19] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ordinal classification with decision rules," in Proceedings of the Third International Conference on Mining Complex Data (MCD'07), pp. 169-181, Warsaw, Poland, 2007.
[20] K. Hechenbichler and K. Schliep, "Weighted k-nearest-neighbor techniques and ordinal classification," Collaborative Research Center, vol. 399, pp. 1-16, 2004.
[21] W. Waegeman and L. Boullart, "An ensemble of weighted support vector machines for ordinal regression," International Journal of Computer and Information Engineering, vol. 1, no. 12, pp. 599-603, 2007.
[22] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ensemble of decision rules for ordinal classification with monotonicity constraints," in Proceedings of the International Conference on Rough Sets and Knowledge Technology, vol. 5009 of Lecture Notes in Computer Science, pp. 260-267, 2008.
[23] H. T. Lin, From Ordinal Ranking to Binary Classification, Ph.D. thesis, California Institute of Technology, 2008.
[24] C. K. A. Cuartas, A. J. P. Anzola, and B. G. M. Tarazona, "Classification methodology of research topics based in decision trees: J48 and RandomTree," International Journal of Applied Engineering Research, vol. 10, no. 8, pp. 19413-19424, 2015.
[25] C. Parimala and R. Porkodi, "Classification algorithms in data mining: a survey," International Journal of Scientific Research in Computer Science, vol. 3, pp. 349-355, 2018.
[26] P. Yildirim, D. Birant, and T. Alpyildiz, "Improving prediction performance using ensemble neural networks in textile sector," in Proceedings of the 2017 International Conference on Computer Science and Engineering (UBMK), pp. 639-644, Antalya, Turkey, October 2017.
[27] H. Yu and J. Ni, "An improved ensemble learning method for classifying high-dimensional and imbalanced biomedicine data," IEEE Transactions on Computational Biology and Bioinformatics, vol. 11, no. 4, pp. 657-666, 2014.
[28] D. Che, Q. Liu, K. Rasheed, and X. Tao, "Decision tree and ensemble learning algorithms with their applications in bioinformatics," in Software Tools and Algorithms for Biological Systems, H. R. Arabnia and Q.-N. Tran, Eds., vol. 696, pp. 191-199, Springer, 2011.
[29] "UCI Machine Learning Repository: Car Evaluation Data Set," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/datasets/car+evaluation.
[30] "Weka 3 - Data Mining with Open Source Machine Learning Software in Java," cs.waikato.ac.nz, 2018, https://www.cs.waikato.ac.nz/ml/weka/.
[31] "UCI Machine Learning Repository," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/index.php.
[32] "Datasets | Kaggle," kaggle.com, 2018, https://www.kaggle.com/datasets.
[33] "Data Mill North," datamillnorth.org, 2018, https://datamillnorth.org/dataset.
[34] City of New York, "NYC Open Data," opendata.cityofnewyork.us, 2018, https://opendata.cityofnewyork.us.
[35] J. Derrac, S. García, D. Molina, and F. Herrera, "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms," Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 3-18, 2011.
[36] N. A. Weiss, Introductory Statistics, Addison-Wesley, 9th edition, 2012.

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 5: EBOC: Ensemble-Based Ordinal Classification in …downloads.hindawi.com/journals/jat/2019/7482138.pdfJournalofAdvancedTransportation the size of decision trees byremoving sections

Journal of Advanced Transportation 5

the size of decision trees by removing sections of the treethat gives little capacity to classify instances The values ofnumeric attributes are sorted once Missing values are alsodealt with by splitting the corresponding instances into parts

33 Ensemble Learning Ensemble learning is a machinelearning technique that combines a set of individual learnersand predicts a final output [26] First each learner in theensemble structure is trained separately and multiple classi-fication models are constructed Then the obtained outputsfrom each model are compounded by a voting mechanismThe commonly used voting method available for categoricaltarget values is major class labelling and the voting methodsfor numerical target values are average weighted averagemedian minimum and maximum

Ensemble-learning approach constructs a strong classifierfrommultiple individual learners and aims to improve classi-fication performance by reducing the risk of an unfortunateselection of these learners Many ensemble-based studies [2728] proved that ensemble learners givemore successful resultsthan classical individual learners

In the literature the ensemble methods are generallycategorized under four techniques bagging boosting stack-ing and voting In one of our studies we implementedbagging approach by combining multiple neural networkswhich includes different parameter values for the predictiontask in textile sector [26] In the current research bagging andboosting (AdaBoost algorithm) methods with ordinal classclassifier were applied on transportation datasets For eachmethod (bagging and boosting) tree-based classificationalgorithms (C45 decision tree RandomTree and REPTreealgorithms) were used as base learners separately

331 Bagging Bagging (bootstrap aggregating) is a com-monly applied ensemble techniquewhich constitutes trainingsubsets by selecting random instances from original datasetusing bootstrap method Each classifier in the ensemblestructure is trained by different training sets and so multipleclassification models are produced Then a new sample isgiven to each model and these models predict an output Theobtained outputs are aggregated and a single final output isachieved

General Process of Bagging Let 119863 = (1199091 1199101) (1199092 1199102) (119909119898 119910119898) be a set of 119898 items and let 119910119894 120598 119884 = 11988811198882 119888119896 be a set of 119896 class labels C denotes classificationalgorithm and 119899 is number of learners

(1) Draw 119898 items randomly with replacement from thedataset D so generate bootstrap samples 1198631 1198632 119863119899

(2) Each dataset119863119894 is trained by a base learner andmulti-ple classification models are constructed119872119894=C(119863119894)

(3) Consensus of classification models is tested to calcu-late out-of-bag error

(4) New sample 119909 is given to classifiers as input and theoutputs 119910119894 are obtained from each model 119910119894 = 119872119894(119909)

(5) The outputs of models 11987211198722 119872t are com-bined as in

119872lowast (119909) = arg max119910isin119884

119899

sum119894119872119894(119909)=119910

1 (3)

332 Boosting In boosting method classifiers are trainedconsecutively to convert weak learners to strong ones Aweight value is assigned to each instance in the training setThen in each iteration while the weights of misclassifiedsamples are increased correctly classified ones are decreasedIn this way the misclassified samplesrsquo chances of being in thetraining set increase In this study AdaBoost algorithm wasutilized to implement boosting method

AdaBoost AdaBoost also known as adaptive boosting is themost popular boosting algorithm which trains learners byreweighting instances in the training set iteratively Thenthe outputs produced by each learning model are aggregatedusing a weighted voting mechanism

4 Proposed Method Ensemble-Based OrdinalClassification (EBOC)

The proposed method named ensemble-based ordinal clas-sification (EBOC) combines ensemble-learning paradigmincluding its popular methods such as AdaBoost and baggingwith the traditional ordinal class classifier algorithm toimprove prediction performance The ordinal class classifieralgorithm which was proposed in [7] is used as a base learnerfor the ensemble structure of the proposed approach Inaddition tree-based algorithms such as C45 decision treeRandomTree and REPTree are also used as a classifier inordinal class classifier algorithm

The first step of the ordinal class classifier algorithmis to reduce multiclass ordinal classification problem to abinary classification problem To realize this approach theordinal classification problem with 119896 different class values isconverted to k-1 two-class classification problems Assumethat there is a dataset with four classes (k=4) here the taskis to find two sides (i) class C1 against classes C2 C3 and C4(ii) classes C1 and C2 against classes C3 and C4 and finally(iii) classes C1 C2 and C3 against class C4

For example suppose we have car evaluation dataset[29] which evaluates vehicles according to buying pricemaintenance price and technical characteristics such ascomfort number of doors person capacity luggage boot sizeand safety of the carThis dataset has an ordinal class attributewith four values unacc acc good and vgood First fourdifferent class values of the original dataset are converted tobinary values according to these rules Classes gt ldquounaccrdquoClasses gt ldquoaccrdquo and Classes gt ldquogoodrdquo class valued attribute Ifwe consider ldquoClasses gt lsquounaccrsquordquo rule class values higher thanldquounaccrdquo are labelled as 1 and the others are labelled as 0 In thisway three different transformed datasets that contain binaryclass values are obtained In the next stage a classificationalgorithm (ie C45 RandomTree or REPTree) is appliedon each obtained dataset separately Figure 2 shows the

6 Journal of Advanced Transportation

Figure 2: Application of the ordinal class classifier algorithm on the "car evaluation" dataset (the original dataset and the three binary datasets derived for Classes > 'unacc', Classes > 'acc', and Classes > 'good').

When predicting a new sample, the probabilities are computed for each of the $k$ class values using the $k-1$ binary classification models. For example, the probability of the "unacc" class value, $P('unacc' \mid sample)$, in the sample car dataset is evaluated by $1 - P(class > 'unacc' \mid sample)$. Similarly, the other three ordinal class value probabilities are computed. Finally, the class label which has the maximum probability value is assigned to the sample. The probabilities of the class attribute values for the "car evaluation" dataset are calculated as follows:

$$
\begin{aligned}
P('unacc' \mid sample) &= 1 - P(class > 'unacc' \mid sample)\\
P('acc' \mid sample)   &= P(class > 'unacc' \mid sample) - P(class > 'acc' \mid sample)\\
P('good' \mid sample)  &= P(class > 'acc' \mid sample) - P(class > 'good' \mid sample)\\
P('vgood' \mid sample) &= P(class > 'good' \mid sample)
\end{aligned}
\qquad (4)
$$
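A sketch of this reconstruction step, assuming k-1 already-trained Weka binary classifiers whose second class value ("1") encodes "class > c_j", might look as follows; the method name and the ordering assumption are illustrative, not part of the Weka API:

import weka.classifiers.Classifier;
import weka.core.Instance;

public class OrdinalProbabilities {
    // Rebuilds the k ordinal class probabilities of equation (4) from the
    // k-1 binary models, where binaryModels[j] estimates P(class > c_{j+1}).
    static double[] ordinalDistribution(Classifier[] binaryModels, Instance x)
            throws Exception {
        int k = binaryModels.length + 1;
        double[] pGreater = new double[k - 1];
        for (int j = 0; j < k - 1; j++) {
            // distributionForInstance returns [P(class = 0), P(class = 1)];
            // index 1 is assumed to mean "class > c_j".
            pGreater[j] = binaryModels[j].distributionForInstance(x)[1];
        }
        double[] p = new double[k];
        p[0] = 1.0 - pGreater[0];                 // P(c_1)
        for (int j = 1; j < k - 1; j++) {
            p[j] = pGreater[j - 1] - pGreater[j]; // P(c_j)
        }
        p[k - 1] = pGreater[k - 2];               // P(c_k)
        return p;                                 // the prediction is arg max of p
    }
}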

The proposed approach (EBOC) utilizes the ordinal classification algorithm as the base learner for the bagging and boosting ensemble methods. This novel ensemble-based approach aims to obtain successful classification results by exploiting the high prediction ability of the ensemble-learning paradigm. The general structure of the proposed approach is presented in Figure 3. When bagging is considered, random samples are selected from the original dataset to produce multiple training sets; when boosting is utilized, samples are selected with probabilities based on their weights. After that, EBOC derives new datasets from each training set, each one with a new binary class attribute. Each dataset is given to the ordinal class classifier algorithm as an input. Then a classification algorithm (i.e., C4.5, RandomTree, or REPTree) is applied on the datasets, and multiple ensemble-based ordinal classification models are produced. Each classification model in this system gives an output label, and the majority vote of these outputs is selected as the final class value.

The pseudocode of the proposed approach (EBOC) is presented in Algorithm 1. First, multiple training sets are generated according to the preferred ensemble method (bagging or boosting). Second, the algorithm converts the ordinal data to binary data. After that, a tree-based classification algorithm (i.e., C4.5, RandomTree, or REPTree) is applied on the binary training sets, and in this way various classifiers are produced. After this training phase, if boosting is chosen as the ensemble method, the weight of each sample in the dataset is updated dynamically according to the performance (error rate) of the classifiers in the current iteration. To predict the class label of a new input sample, the probability for each ordinal class is estimated by using the constructed models. The algorithm chooses the class label with the highest probability. Lastly, the majority vote of the outputs obtained from each ordinal classification model is selected as the final output.


Figure 3: The general structure of the proposed approach (EBOC).


5. Experimental Studies

In the experimental studies, the proposed approach (EBOC) was implemented on twelve different benchmark transportation datasets to demonstrate its advantages over the standard nominal classification methods. The application was developed by using the Weka open source data mining library [30]. Individual tree-based classification algorithms (C4.5, RandomTree, and REPTree), the ordinal classification algorithm [7], and the EBOC approach were applied on the transportation datasets separately, and they were compared in terms of accuracy, precision, recall, F-measure, and ROC area. In the EBOC approach, the C4.5, RandomTree, and REPTree algorithms were also used as base learners in the ordinal class classifiers separately. The experimental results obtained in this study are presented with the help of tables and graphs.

5.1. Dataset Description. In this study, 12 different transportation datasets, which are available in several repositories for public use, were selected to demonstrate the capabilities of the proposed EBOC method. The datasets were obtained from the following data archives: UCI Machine Learning Repository [31], Kaggle [32], Data Mill North [33], and NYC Open Data [34].

Detailed descriptions of the datasets (i.e., what they are and how they are used) are given as follows.

Auto MPG. The Auto MPG dataset, which was used in the American Statistical Association Exposition, is provided by the StatLib library maintained at Carnegie Mellon University. This dataset is utilized for the prediction of the city-cycle fuel consumption (in miles per gallon) of cars according to their characteristics, such as model year, number of cylinders, horsepower, engine size (displacement), weight, and acceleration.

Automobile. The dataset was obtained from the 1985 Ward's Automotive Yearbook. It consists of various automobile characteristics, such as fuel type, body style, number of doors, engine size, engine location, horsepower, length, width, height, price, and insurance score. These features are used for the classification of automobiles by predicting their risk factors: six different risk rankings ranging from risky (+3) to pretty safe (-3).

Bike Sharing. The data includes the hourly count of rental bikes between the years 2011 and 2012. It was collected by the Capital Bikeshare system in Washington, D.C., where membership, rental, and bike return are automated via a network of kiosk locations. The dataset is presented for the prediction of hourly bike rental counts based on environmental and seasonal settings, such as weather situation (i.e., clear, cloudy, rainy, and snowy), temperature, humidity, wind speed, month, season, and the day of the week and year.


Algorithm: EBOC: Ensemble-Based Ordinal Classification
Inputs:
    D: the ordinal dataset, $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$
    m: the number of instances in the dataset D
    X: input feature space; an input vector $x \in X$
    Y: class attribute; an ordinal class label $y \in \{c_1, c_2, \ldots, c_k\}$ of $Y$ with an order $c_k > c_{k-1} > \cdots > c_1$
    k: the number of class labels
    n: ensemble size
Outputs:
    M*: an ensemble classification model
    M*(x): class label of a new sample x

Begin
    for i = 1 to n do
        if (bagging)
            D_i = bootstrap samples from D
        else if (boosting)
            D_i = samples from D according to their weights
        // Construction of binary training sets D_ij
        for j = 1 to k-1 do
            for s = 1 to m do
                if (y_s <= c_j) D_ij.Add(x_s, 0)
                else D_ij.Add(x_s, 1)
            end for
        end for
        // Construction of binary classifiers BC_ij
        for j = 1 to k-1 do
            BC_ij = ClassificationAlgorithm(D_ij)
        end for
        if (boosting)
            update weight values
    end for
    // Classification of a new sample x
    for i = 1 to n do
        // Construction of ordinal classification models M_i
        P(c_1) = 1 - P(y > c_1)
        for j = 2 to k-1 do
            P(c_j) = P(y > c_{j-1}) - P(y > c_j)
        end for
        P(c_k) = P(y > c_{k-1})
        M_i(x) = arg max_c P(c)
    end for
    // Majority voting
    if (bagging)
        M*(x) = arg max_{y in Y} sum over {i : M_i(x) = y} of 1
    else if (boosting)
        M*(x) = arg max_{y in Y} sum over {i : M_i(x) = y} of weight_i
End

Algorithm 1: The pseudocode of the proposed EBOC approach.
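In terms of the Weka API, the EBOC variants of Algorithm 1 can be sketched by nesting meta-classifiers. This is only an assumption-laden illustration: it relies on the OrdinalClassClassifier meta-learner (shipped with older Weka releases and as an optional package in newer ones, and assumed here to expose setClassifier like other Weka meta-learners), and the dataset path is a placeholder.

import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.meta.Bagging;
import weka.classifiers.meta.OrdinalClassClassifier; // optional Weka package
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EbocSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("car-evaluation.arff"); // placeholder
        data.setClassIndex(data.numAttributes() - 1);

        // BagOrdC4.5: bagging over ordinal class classifiers with C4.5 inside.
        OrdinalClassClassifier ord1 = new OrdinalClassClassifier();
        ord1.setClassifier(new J48());
        Bagging bagOrd = new Bagging();
        bagOrd.setClassifier(ord1);
        bagOrd.setNumIterations(100);
        bagOrd.buildClassifier(data);

        // AdaOrdC4.5: boosting over ordinal class classifiers with C4.5 inside.
        OrdinalClassClassifier ord2 = new OrdinalClassClassifier();
        ord2.setClassifier(new J48());
        AdaBoostM1 adaOrd = new AdaBoostM1();
        adaOrd.setClassifier(ord2);
        adaOrd.setNumIterations(100);
        adaOrd.buildClassifier(data);
    }
}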


Car Evaluation. This dataset is useful for evaluating the quality of cars according to various characteristics, such as buying price, maintenance price, number of doors, person capacity, size of luggage boot, and estimated safety of the car. The target attribute in the dataset indicates the overall scores of the cars as unacceptable, acceptable, good, and very good.


Car Sale Advertisements. The data was collected from private car sale advertisements in Ukraine in 2016. It was proposed for the prediction of the seller's price (denominated in USD) in the advertisement according to well-known car features, such as model, body type, mileage, engine volume, engine type, and drive type.

NYS Air Passenger Traffic. The data was collected monthly by the Port Authority of New York State between the years 1977 and 2015. The aim is to predict the total number of domestic and international passengers for five airports (ACY, EWR, JFK, LGA, and SWF).

Road Traffic Accidents (2017). The dataset contains the records of traffic accidents across the City of Leeds, UK, reported in 2017, with information on location, number of people and vehicles involved, type of vehicle, road surface, weather situations, lighting conditions, and the age and sex of the casualty. The aim is to predict the severity of the casualty as slight, serious, or fatal.

SF Air Traffic Landings Statistics and SF Air Traffic Passenger Statistics. These two separate datasets include aircraft landings and passenger statistics of San Francisco International Airport recorded during the period from July 2005 to March 2018. These datasets are used to predict monthly landing and passenger counts, respectively.

Smart City Traffic Patterns. This dataset includes the traffic patterns of four junctions of a city collected between 2015 and 2017. The aim is to manage the traffic of the city better and to provide input on infrastructure planning to improve the efficiency of services for the citizens. To serve this purpose, the number of vehicles in the traffic is predicted, so that a robust traffic system can be implemented for the city by being prepared for traffic peaks.

Statlog (Vehicle Silhouettes). The Statlog dataset contains the features of vehicle silhouettes extracted by the Hierarchical Image Processing System (HIPS). The vehicle may be viewed from one of many different angles. The aim is to classify a given silhouette using a set of features extracted from the image.

Traffic Volume Counts (2012-2013). The dataset includes the hourly traffic volume counts collected by the New York Metropolitan Transportation Council between the years 2012 and 2013. The traffic counts were measured on various roads from one intersection to another with a specified direction (NB, SB, WB, and EB). The attribute to be predicted in this dataset is the traffic volume at 11:00-12:00 AM.

Before the application of the nominal and ordinal classification algorithms, the datasets passed through data preprocessing steps. In this study, the ID attributes in the transportation datasets were eliminated in the data reduction step. The date attributes were split into triplets (day, month, and year) because they provide more useful information for the prediction task. In addition, continuous class values were discretized into 3 bins using the equal-frequency technique, since ordinal classification algorithms require categorical ordinal data.
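A minimal sketch of this preprocessing in Weka could look as follows; the file path and the removed attribute index are placeholders, not values from the paper.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;
import weka.filters.unsupervised.attribute.Remove;

public class PreprocessSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("auto-mpg.arff"); // placeholder path

        // Data reduction: drop an ID-like attribute (index is hypothetical).
        Remove remove = new Remove();
        remove.setAttributeIndices("1");
        remove.setInputFormat(data);
        data = Filter.useFilter(data, remove);

        // Equal-frequency discretization of the continuous class into 3 bins
        // (applied before the class index is set, so the filter may touch it).
        Discretize disc = new Discretize();
        disc.setAttributeIndices("last");
        disc.setBins(3);
        disc.setUseEqualFrequency(true);
        disc.setInputFormat(data);
        data = Filter.useFilter(data, disc);

        data.setClassIndex(data.numAttributes() - 1);
    }
}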

Figure 4: The 10-fold cross-validation process.

Basic characteristics of the investigated transportation datasets are given in Table 1: the number of instances, attributes, and classes, the target attribute, the data preprocessing operations, and the class distributions in each bin.

5.2. Experimental Work. In each experiment, four methods were compared on the transportation datasets: (i) individual tree-based classification algorithms, (ii) the ordinal class classifier, (iii) boosting-based ordinal classification, and (iv) bagging-based ordinal classification. They were compared by using the n-fold cross-validation technique, selecting $n$ as 10. In this validation technique, the entire dataset is divided into $n$ equal-size parts; $n-1$ of them are chosen for the training phase of the classification model, and the last part is used in the testing phase. This process is repeated $n$ times with changing parts for the training and testing phases. When the cross-validation process terminates, the accuracy values obtained in each step are averaged, and a single accuracy value is produced as a measure of the algorithm's success. Figure 4 represents the 10-fold cross-validation process.
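In Weka, the whole validation described above reduces to a few calls. A minimal sketch (placeholder path, with J48 standing in for any of the compared methods) is given below; the printed metrics correspond to Table 2.

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidationSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("car-evaluation.arff"); // placeholder
        data.setClassIndex(data.numAttributes() - 1);

        Classifier classifier = new J48(); // or any EBOC variant from Section 4
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(classifier, data, 10, new Random(1)); // 10 folds

        System.out.printf("Accuracy : %.2f%%%n", eval.pctCorrect());
        System.out.printf("Precision: %.3f%n", eval.weightedPrecision());
        System.out.printf("Recall   : %.3f%n", eval.weightedRecall());
        System.out.printf("F-measure: %.3f%n", eval.weightedFMeasure());
        System.out.printf("ROC area : %.3f%n", eval.weightedAreaUnderROC());
    }
}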

In this study, the alternative tree-based ordinal classification algorithms were compared according to the accuracy, precision, recall, F-measure, and ROC area measures. These metrics are explained with their abbreviations, formulas, and definitions in Table 2.

5.3. Experimental Results. In this study, three different experiments with three different tree-based classification algorithms (C4.5, RandomTree, and REPTree) were performed to compare the classification success of the proposed EBOC approach with the existing individual algorithms and the ordinal class classifier algorithm. To evaluate the classification performances of the algorithms on a specific transportation dataset, accuracy rate values were calculated using the n-fold cross-validation technique, selecting $n$ as 10.

In each experiment, one of the tree-based classification algorithms was used as the base learner. For example, in the first experiment, four methods were applied and compared on 12 different transportation datasets: (i) C4.5, the individual decision tree algorithm; (ii) OrdC4.5, the ordinal class classifier with C4.5; (iii) AdaOrdC4.5, AdaBoost-based ordinal classification with C4.5 as the base learner; and finally (iv) BagOrdC4.5, bagging-based ordinal classification with C4.5 as the base learner.


Table 1: The basic characteristics of the transportation datasets (I: number of instances, A: number of attributes, C: number of classes).

Dataset | I | A | C | Class Distribution | Target Attribute | Data Preprocessing
Auto MPG | 398 | 8 | 3 | 131-134-133 | mpg | Removing the columns with unique values (i.e., car name)
Automobile | 205 | 26 | 7 | 0-3-22-67-54-32-27 | symboling | -
Bike Sharing | 17379 | 13 | 3 | 5797-5783-5799 | cnt | Removing the columns "instant", "dteday", "casual", and "registered"
Car Evaluation | 1728 | 7 | 4 | 1210-384-69-65 | class | -
Car Sale Advertisements | 9309 | 10 | 3 | 3112-3080-3117 | price | Removing rows with zero price value
NYS Air Passenger Traffic | 1584 | 4 | 3 | 528-528-528 | total passengers | Removing the columns "Domestic" and "International Passengers"
Road Traffic Accidents (2017) | 2203 | 13 | 3 | 1879-309-15 | casualty severity | (i) Removing columns that hold reference numbers; (ii) splitting date column into day and month
SF Air Traffic Landings Statistics | 21105 | 14 | 3 | 6941-7103-7061 | landing count | -
SF Air Traffic Passenger Statistics | 18398 | 12 | 3 | 6132-6133-6133 | passenger count | -
Smart City Traffic Patterns | 48120 | 6 | 3 | 15687-16436-15997 | vehicles | (i) Removing ID column; (ii) splitting date column into day, month, and year
Statlog (Vehicle Silhouettes) | 846 | 19 | 4 | 212-217-218-199 | class | -
Traffic Volume Counts (2012-2013) | 5945 | 31 | 3 | 1983-1989-1973 | traffic volume at 11:00-12:00 AM | (i) Removing the columns with unique values (i.e., ID, Segment ID); (ii) splitting date column into day, month, and year

Table 2: The detailed information about classifier performance measures (TP: true positives, FP: false positives, TN: true negatives, and FN: false negatives).

Abbreviation | Measure | Formula | Description
Acc | Accuracy | Acc = (TP + TN) / (TP + TN + FP + FN) | The ratio of the number of correctly classified instances to the total number of records
Prec | Precision | Prec = TP / (TP + FP) | The ratio of correctly classified positive instances against all positive results
Rec | Recall | Rec = TP / (TP + FN) | The ratio of correct answers for a class over all the answers correctly belonging to this class
F | F-measure | F = 2 * (precision * recall) / (precision + recall) | The harmonic mean of precision and recall
ROC Area | Receiver Operating Characteristic Area | - | The area under the curve generated to compare correctly and incorrectly classified instances

In the second and third experiments, the RandomTree and REPTree algorithms were utilized as base learners, respectively. The default parameters of the algorithms in Weka were used in all experiments, except the ensemble size (number of members) parameter, which was set to 100. As a result of the experiments, the accuracy rates of each algorithm were evaluated. Tables 3, 4, and 5 show the comparative results of the implemented techniques on the twelve benchmark datasets in terms of accuracy rates. The accuracy rates which are higher than the individual classification algorithm's accuracy rate in the same row are marked with an asterisk (*); the maximum accuracy in each row indicates the best-performing method for that dataset.


Table 3: The comparison of C4.5 based ordinal and ensemble learning methods in terms of classification accuracy (%).

Dataset | C4.5 | OrdC4.5 | AdaOrdC4.5 | BagOrdC4.5
Auto MPG | 80.40 | 80.90* | 83.67* | 81.91*
Automobile | 81.95 | 66.34 | 84.39* | 73.66
Bike Sharing | 87.75 | 87.72 | 89.29* | 89.79*
Car Evaluation | 92.36 | 92.19 | 98.73* | 94.27*
Car Sale Advertisements | 83.65 | 81.09 | 83.82* | 85.00*
NYS Air Passenger Traffic | 84.91 | 85.42* | 86.05* | 86.81*
Road Traffic Accidents (2017) | 85.29 | 85.29* | 81.03 | 84.38
SF Air Traffic Landings Statistics | 98.74 | 98.31 | 99.37* | 98.69
SF Air Traffic Passenger Statistics | 90.68 | 90.82* | 90.41 | 91.43*
Smart City Traffic Patterns | 85.66 | 85.98* | 85.07 | 86.94*
Statlog (Vehicle Silhouettes) | 72.46 | 68.91 | 78.13* | 74.82*
Traffic Volume Counts (2012-2013) | 88.36 | 88.12 | 89.77* | 89.10*
Average | 86.02 | 84.26 | 87.48 | 86.40

Table 4: The comparison of RandomTree based ordinal and ensemble learning methods in terms of classification accuracy (%).

Dataset | RandomTree | OrdRandomTree | AdaOrdRandomTree | BagOrdRandomTree
Auto MPG | 77.64 | 81.66* | 82.66* | 81.41*
Automobile | 76.59 | 74.63 | 83.41* | 84.88*
Bike Sharing | 80.07 | 80.56* | 86.98* | 87.73*
Car Evaluation | 83.16 | 85.94* | 97.22* | 95.83*
Car Sale Advertisements | 82.38 | 82.12 | 83.90* | 86.34*
NYS Air Passenger Traffic | 86.11 | 86.17* | 85.61 | 86.49*
Road Traffic Accidents (2017) | 78.76 | 78.44 | 82.21* | 84.48*
SF Air Traffic Landings Statistics | 97.38 | 97.43* | 98.93* | 98.83*
SF Air Traffic Passenger Statistics | 89.22 | 89.27* | 89.05 | 89.17
Smart City Traffic Patterns | 82.26 | 82.79* | 84.72* | 85.91*
Statlog (Vehicle Silhouettes) | 70.92 | 65.60 | 76.71* | 76.00*
Traffic Volume Counts (2012-2013) | 82.96 | 83.45* | 89.77* | 89.69*
Average | 82.29 | 82.34 | 86.76 | 87.23

The experimental results show that the EBOC approaches (AdaOrdC4.5, BagOrdC4.5, AdaOrdRandomTree, BagOrdRandomTree, AdaOrdREPTree, and BagOrdREPTree) generally provide higher accuracy values than the individual tree-based classification algorithms. For example, averaged over the 12 datasets, BagOrdRandomTree (87.23%) is significantly more accurate than RandomTree (82.29%). Similarly, both AdaOrdREPTree and BagOrdREPTree win against plain REPTree on 11 datasets and lose on only one. The results also show that the performance gap generally increases with the number of classes. When the average accuracy values are considered, it is possible to say that the proposed EBOC approach has the best accuracy score in all three experiments: AdaOrdC4.5 with 87.48%, BagOrdRandomTree with 87.23%, and BagOrdREPTree with 85.67%.

The matrices presented in Tables 6, 7, and 8 give all pairwise combinations of the tree-based algorithms with their ordinal and ensemble-learning versions. Each cell in a matrix represents the number of wins, losses, and ties between the approach in that row and the approach in that column. For example, for the pair C4.5 and BagOrdC4.5, the entry 3-9-0 indicates that the standard C4.5 algorithm is better than BagOrdC4.5 on only 3 datasets, while BagOrdC4.5 is better on the other 9 datasets; their accuracies are not equal on any dataset. When all matrices are examined, it is clearly seen that our proposed EBOC approach outperforms all other methods.

The graphs given in Figures 5, 6, and 7 show the average ranks of the tree-based algorithms (C4.5, RandomTree, and REPTree) with their ordinal and ensemble-learning (AdaBoost and bagging) versions, respectively.


Table 5: The comparison of REPTree based ordinal and ensemble learning methods in terms of classification accuracy (%).

Dataset | REPTree | OrdREPTree | AdaOrdREPTree | BagOrdREPTree
Auto MPG | 75.38 | 76.63* | 81.91* | 81.16*
Automobile | 63.41 | 66.83* | 83.90* | 72.20*
Bike Sharing | 84.96 | 85.27* | 88.75* | 88.12*
Car Evaluation | 87.67 | 90.91* | 98.67* | 93.63*
Car Sale Advertisements | 80.70 | 80.42 | 82.52* | 82.30*
NYS Air Passenger Traffic | 84.34 | 84.91* | 86.74* | 87.31*
Road Traffic Accidents (2017) | 84.66 | 85.02* | 80.07 | 84.57
SF Air Traffic Landings Statistics | 98.04 | 98.10* | 99.12* | 98.57*
SF Air Traffic Passenger Statistics | 90.39 | 90.49* | 90.97* | 91.25*
Smart City Traffic Patterns | 84.66 | 85.06* | 85.33* | 86.38*
Statlog (Vehicle Silhouettes) | 72.34 | 69.62 | 77.54* | 73.17*
Traffic Volume Counts (2012-2013) | 80.57 | 88.75* | 86.39* | 89.37*
Average | 82.26 | 83.50 | 86.83 | 85.67

Figure 5: The average ranks of the C4.5 algorithm with ordinal and ensemble-learning versions.

In the ranking method, each approach used in this study is rated according to its accuracy score on each dataset. This process starts by giving rank 1 to the classifier with the highest classification accuracy and continues to increase the rank value until rank $m$ is assigned to the worst of the $m$ classifiers. In the case of a tie, the average of the tied classifiers' rankings is assigned to each of them. Then the mean values of the ranks per classifier over all datasets are computed as average ranks. According to the comparative results, the EBOC approach with the bagging method has the best performance among the others in Figures 5 and 6, because it gives the lowest rank value.

Figure 6: The average ranks of the RandomTree algorithm with ordinal and ensemble-learning versions.

The proposed algorithms (AdaOrd* and BagOrd*) were applied on the transportation datasets and were compared with the ordinal and nominal (standard) classification algorithms in terms of accuracy, precision, recall, F-measure, and ROC area metrics. Additional measures besides accuracy are required to evaluate all aspects of the classifiers, and they are useful for providing additional insight into their performance. Table 9 shows the average results obtained from the 12 transportation datasets in each experiment. Except for accuracy, the metrics listed in Table 9 give a degree ranging from 0 to 1; a higher rate means that the algorithm is more successful than the others.


Table 6: The pairwise combinations of the C4.5 algorithm with ordinal and ensemble learning versions (wins-losses-ties of the row approach against the column approach).

 | BagOrdC4.5 | AdaOrdC4.5 | OrdC4.5
C4.5 | 3-9-0 | 3-9-0 | 7-4-1
OrdC4.5 | 1-11-0 | 3-9-0 |
AdaOrdC4.5 | 6-6-0 | |

Table 7: The pairwise combinations of the RandomTree algorithm with ordinal and ensemble learning versions.

 | BagOrdRandomTree | AdaOrdRandomTree | OrdRandomTree
RandomTree | 1-11-0 | 2-10-0 | 4-8-0
OrdRandomTree | 2-10-0 | 2-10-0 |
AdaOrdRandomTree | 5-7-0 | |

Figure 7: The average ranks of the REPTree algorithm with ordinal and ensemble-learning versions.

Among the C4.5 decision-tree-based algorithms, it is clearly seen that the AdaOrdC4.5 algorithm has the best accuracy (87.467%), precision (0.874), recall (0.875), and F-measure (0.874) results. While bagging has the highest values for the RandomTree algorithm, boosting is the best among the REPTree-based algorithms. When the ROC area is considered, the BagOrd* algorithms have the highest scores according to the experimental results. As a result, it is possible to say that the ensemble-based ordinal classification (EBOC) approach provides more accurate and robust classification results than the traditional methods.

5.4. Statistical Significance Tests. The field of inferential statistics offers special procedures for testing the significance of differences between multiple classifiers. Although the ensemble-based ordinal classification (EBOC) algorithms (AdaOrd* and BagOrd*) have lower ranking values, we should statistically verify that these algorithms are significantly different from the others. For verification, we used three well-known statistical methods [35]: multiple comparisons (the Friedman test and the Quade test) and pairwise comparisons (the Wilcoxon signed rank test).

The null hypothesis (H0) for a statistical test is one in which the provided classification models are equivalent; otherwise, the alternative hypothesis (H1) holds when not all classifiers are equivalent.

H0: There are no performance differences among the classifiers on the datasets.

H1: There are performance differences among the classifiers on the datasets.

The p-value is defined as a probability, a decimal number between 0 and 1, and it can be expressed as a percentage (i.e., 0.1 = 10%). With a small p-value, we reject the null hypothesis (H0), so the differences between the classification results are significant.

5.4.1. Statistical Tests for Comparing Multiple Classifiers. Multiple comparison tests (MCT) are used to establish a statistical comparison of the results reported among various classification algorithms. We performed two well-known MCTs to determine whether the classifiers have significant differences or not: the Friedman test and the Quade test.

Friedman Test. The Friedman test is a nonparametric statistical test that aims to detect significant differences between the behaviours of two or more algorithms. Initially, it ranks the algorithms for each dataset separately, between 1 (smallest) and 4 (largest) in our case. In the case of ties, the average rank is assigned. After that, the Friedman test statistic ($\chi_r^2$) is calculated according to the following:

$$\chi_r^2 = \frac{12}{d\,t\,(t+1)} \sum_{i=1}^{t} R_i^2 \;-\; 3\,d\,(t+1) \qquad (5)$$


Table 8: The pairwise combinations of the REPTree algorithm with ordinal and ensemble learning versions.

 | BagOrdREPTree | AdaOrdREPTree | OrdREPTree
REPTree | 1-11-0 | 1-11-0 | 2-10-0
OrdREPTree | 1-11-0 | 2-10-0 |
AdaOrdREPTree | 7-5-0 | |

Table 9: The comparison of algorithms in terms of accuracy, precision, recall, F-measure, and ROC area.

Compared Algorithms | Accuracy (%) | Precision | Recall | F-measure | ROC Area
C4.5 | 86.02 | 0.861 | 0.860 | 0.866 | 0.897
OrdC4.5 | 84.26 | 0.843 | 0.842 | 0.848 | 0.884
AdaOrdC4.5 | 87.48 | 0.874 | 0.875 | 0.874 | 0.933
BagOrdC4.5 | 86.40 | 0.859 | 0.864 | 0.860 | 0.947
RandomTree | 82.29 | 0.823 | 0.823 | 0.823 | 0.860
OrdRandomTree | 82.34 | 0.825 | 0.823 | 0.824 | 0.861
AdaOrdRandomTree | 86.76 | 0.866 | 0.868 | 0.866 | 0.928
BagOrdRandomTree | 87.23 | 0.868 | 0.872 | 0.869 | 0.947
REPTree | 82.26 | 0.817 | 0.823 | 0.817 | 0.903
OrdREPTree | 83.50 | 0.831 | 0.835 | 0.829 | 0.908
AdaOrdREPTree | 86.83 | 0.868 | 0.868 | 0.867 | 0.925
BagOrdREPTree | 85.67 | 0.852 | 0.857 | 0.852 | 0.939

where $d$ is the number of datasets, $t$ is the number of classifiers, and $R_i$ is the total of the ranks for the $i$th classifier over all datasets. The Friedman statistic is approximately chi-square ($\chi^2$) distributed with $t-1$ degrees of freedom (df), and the null hypothesis is rejected if $\chi_r^2 > \chi_{t-1,\alpha}^2$ at the corresponding significance level $\alpha$.

The obtained Friedman test value for the four C4.5-based classifiers is 10.525. Since the obtained value is greater than the critical value ($\chi_{df=3,\,\alpha=0.05}^2 = 7.81$), the null hypothesis (H0) is rejected, so it can be concluded that the four classifiers are significantly different. The same holds for the RandomTree-based and REPTree-based algorithms, which have test values of 15.3 and 20.1, respectively. The p-values obtained for the C4.5-, RandomTree-, and REPTree-related EBOC results (Tables 3, 4, and 5) are 0.01459, 0.00158, and 0.00016, which are far below the 0.05 level of significance. Thus, it is possible to say that the differences between their performances are unlikely to have occurred by chance.
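For reference, equation (5) is straightforward to compute from a rank matrix. A small from-scratch sketch (not a library call; the rank matrix is whatever the ranking procedure of Section 5.3 produces) is given below:

public class FriedmanTest {
    // ranks[i][j] is the rank of classifier j on dataset i
    // (average ranks in case of ties), d datasets, t classifiers.
    static double friedmanStatistic(double[][] ranks) {
        int d = ranks.length;     // number of datasets
        int t = ranks[0].length;  // number of classifiers
        double sumSquaredRankTotals = 0.0;
        for (int j = 0; j < t; j++) {
            double rj = 0.0;      // R_j: total rank of classifier j
            for (int i = 0; i < d; i++) {
                rj += ranks[i][j];
            }
            sumSquaredRankTotals += rj * rj;
        }
        // Equation (5); compare the result against the chi-square
        // critical value with t-1 degrees of freedom.
        return 12.0 / (d * t * (t + 1)) * sumSquaredRankTotals
                - 3.0 * d * (t + 1);
    }
}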

Quade Test. Like the Friedman test, the Quade test is a nonparametric test which is used to prove that the differences among classifiers are significant. First, the performance results of each classifier are ranked within each dataset to yield $R_{ij}$, in the same way as the Friedman test does, where $d$ is the number of datasets and $t$ is the number of classifiers, for $i = 1, 2, \ldots, d$ and $j = 1, 2, \ldots, t$. Then the range for each dataset is calculated by finding the difference between the largest and the smallest observations within that dataset, and the rank obtained for dataset $i$ by ranking these ranges is denoted by $Q_i$. The weighted, average-adjusted rank for dataset $i$ with classifier $j$ is then computed as $S_{ij} = Q_i \cdot (R_{ij} - (t+1)/2)$. The Quade test statistic is then given by

$$F = \frac{(d-1)\,\frac{1}{d}\sum_{j=1}^{t} S_j^2}{\sum_{i=1}^{d}\sum_{j=1}^{t} S_{ij}^2 \;-\; \frac{1}{d}\sum_{j=1}^{t} S_j^2} \qquad (6)$$

where $S_j = \sum_{i=1}^{d} S_{ij}$ is the sum of the weighted ranks for each classifier. The $F$ value is tested against the F-distribution for a given $\alpha$ with $df_1 = (t-1)$ and $df_2 = (d-1)(t-1)$ degrees of freedom. If $F > F_{(t-1),(d-1)(t-1),\alpha}$, then the null hypothesis is rejected. Moreover, the p-value can be computed through normal approximations.

The p-values computed through the Quade statistical tests for the C4.5-, RandomTree-, and REPTree-related EBOC algorithms are 0.013398, 0.000018, and 0.000522, respectively. The test results strongly suggest the existence of significant differences among the considered algorithms at the significance level $\alpha = 0.05$. Hence, we reject the null hypothesis, which states that all classification algorithms have the same performance. Thus, it is possible to say that at least one of them behaves differently.

5.4.2. Statistical Tests for Comparing Paired Classifiers. In addition to the multiple comparison tests, we also conducted a pairwise comparison test to demonstrate that the proposed algorithms (AdaOrd* and BagOrd*) have significant differences from the others when compared individually. The Wilcoxon signed rank test was applied to examine pairwise performance differences.


Table 10: Wilcoxon signed ranks test results.

Compared Algorithms | W+ | W- | W | z-score | p-value | Significance Level
AdaOrdC4.5 vs. C4.5 | 63 | 15 | 15 | -1.882715 | 0.059739 | moderate
AdaOrdC4.5 vs. OrdC4.5 | 65 | 13 | 13 | -2.039608 | 0.041389 | strong
BagOrdC4.5 vs. C4.5 | 61 | 17 | 17 | -1.725822 | 0.084379 | moderate
BagOrdC4.5 vs. OrdC4.5 | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong
AdaOrdRandomTree vs. RandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong
AdaOrdRandomTree vs. OrdRandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong
BagOrdRandomTree vs. RandomTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong
BagOrdRandomTree vs. OrdRandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong
AdaOrdREPTree vs. REPTree | 71 | 7 | 7 | -2.510287 | 0.012063 | strong
AdaOrdREPTree vs. OrdREPTree | 64 | 14 | 14 | -1.961161 | 0.049860 | strong
BagOrdREPTree vs. REPTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong
BagOrdREPTree vs. OrdREPTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong
Average | 71.25 | 6.75 | 6.75 | -2.602996 | 0.022918 | -

Wilcoxon Signed Rank Test. This statistical test consists of the following steps to check the validity of the null hypothesis: (i) calculate the performance differences between the two algorithms for each dataset; (ii) order the differences according to their absolute values from smallest to largest; (iii) rank the absolute values of the differences, starting with the smallest as 1 and assigning average ranks in case of ties; (iv) calculate $W^+$ and $W^-$ by summing the ranks of the positive and negative differences separately; (v) find $W$ as the smaller of the two sums, $W = \min(W^+, W^-)$; (vi) calculate the z-score as $z = (W - \mu_W)/\sigma_W$, where $\mu_W = n(n+1)/4$ and $\sigma_W = \sqrt{n(n+1)(2n+1)/24}$ for $n$ datasets, and find the p-value from the distribution table; and (vii) determine the significance level according to the computed p-value [36] and reject the null hypothesis if the p-value is less than a certain risk threshold (i.e., 0.1 = 10%):

$$\text{Significance level} = \begin{cases} \text{weak or none}, & p > 0.1\\ \text{moderate}, & 0.05 < p \le 0.1\\ \text{strong}, & 0.01 < p \le 0.05\\ \text{very strong}, & p \le 0.01 \end{cases} \qquad (7)$$
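The procedure of steps (i)-(vi) can be sketched from scratch as follows (a normal approximation; zero differences are assumed absent, which holds for the accuracy pairs in Table 10):

import java.util.Arrays;

public class WilcoxonTest {
    // a and b hold the paired accuracies of two classifiers over n datasets.
    static double zScore(double[] a, double[] b) {
        int n = a.length;
        final double[] diff = new double[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) { diff[i] = a[i] - b[i]; order[i] = i; }
        // (ii) order by absolute difference, smallest first
        Arrays.sort(order, (p, q) ->
                Double.compare(Math.abs(diff[p]), Math.abs(diff[q])));
        // (iii) assign ranks, averaging over ties in |difference|
        double[] rank = new double[n];
        for (int i = 0; i < n; ) {
            int j = i;
            while (j < n && Math.abs(diff[order[j]]) == Math.abs(diff[order[i]])) j++;
            double avg = (i + 1 + j) / 2.0;          // average of ranks i+1 .. j
            for (int s = i; s < j; s++) rank[order[s]] = avg;
            i = j;
        }
        // (iv)-(v) W = min(W+, W-)
        double wPlus = 0, wMinus = 0;
        for (int i = 0; i < n; i++) {
            if (diff[i] > 0) wPlus += rank[i]; else wMinus += rank[i];
        }
        double w = Math.min(wPlus, wMinus);
        // (vi) z-score via the normal approximation
        double mean = n * (n + 1) / 4.0;
        double sd = Math.sqrt(n * (n + 1) * (2.0 * n + 1) / 24.0);
        return (w - mean) / sd;  // e.g. n = 12, W = 15 gives z = -1.88 as in Table 10
    }
}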

Table 10 presents the $W$, z-score, and p-values computed through the Wilcoxon signed rank tests. Based on the reported p-values, most of the tests strongly or very strongly prove the existence of significant differences among the considered algorithms. As the table states, in each test the ensemble-based ordinal classification algorithms (BagOrd* and AdaOrd*) always show a significant improvement over the traditional versions at a significance level of $\alpha = 0.1$. In general, it can be observed that the average p-value of the proposed BagOrd* variants (0.0171) is slightly lower than that of the AdaOrd* variants (0.0288).

Based on all the statistical tests (the Friedman, Quade, and Wilcoxon signed rank tests), we can safely reject the null hypothesis (i.e., that there are no performance differences among the classifiers on the datasets), since the p-values computed through the statistics are less than the critical value (0.05). Thus, the proposed approach (EBOC) has a statistically significant effect on the response at the 95.0% confidence level.

6. Conclusions and Future Works

As a result of developments in the field of transportation, enormous amounts of raw data are generated every day. This situation creates the potential to discover knowledge patterns or rules from the data and to model the behaviour of a transportation system using machine learning techniques. To serve this purpose, this study focuses on the application of ordinal classification algorithms on real-world transportation datasets. It proposes a novel ensemble-based ordinal classification (EBOC) approach, which converts the original ordinal class problem into a series of binary class problems and uses the ensemble-learning paradigm (boosting and bagging) at the same time. To the best of our knowledge, this is the first study in which ordinal classification methods and our proposed approach have been applied to the transportation sector. In the experimental studies, the proposed model with the tree-based learners (C4.5, RandomTree, and REPTree) was implemented on twelve benchmark transportation datasets that are available for public use. The proposed EBOC method was compared with the ordinal class classifier and traditional tree-based classification algorithms in terms of accuracy, precision, recall, F-measure, and ROC area. The results indicate that the EBOC approach provides more accurate classification results. Moreover, statistical test methods (the Friedman, Quade, and Wilcoxon signed rank tests) were used to prove that the classification accuracies obtained from the proposed algorithms (AdaOrd* and BagOrd*) are significantly different from those of the traditional methods. Therefore, the proposed EBOC method can assist in making right decisions in transportation. Our findings demonstrate that the improvement in performance is a result of exploiting ordering information and applying an ensemble strategy at the same time.

As future work, different types of ensemble-based ordinal classifiers can be developed using different ensemble methods, such as stacking and voting, and using different classification algorithms, such as Naive Bayes, k-nearest neighbour, neural networks, and support vector machines. In addition, ensemble clustering models can be developed to cluster transportation data. Furthermore, a comparative study which implements ensemble-learning and deep learning paradigms can be performed in transportation fields.



Data Availability

The "Auto MPG", "Automobile", "Bike Sharing", "Car Evaluation", and "Statlog (Vehicle Silhouettes)" datasets used to support the findings of this study are freely available at https://archive.ics.uci.edu/ml/datasets. The "Car Sale Advertisements", "NYS Air Passenger Traffic", "Smart City Traffic Patterns", "SF Air Traffic Landings Statistics", and "SF Air Traffic Passenger Statistics" datasets used to support the findings of this study are freely available at https://www.kaggle.com/datasets. The "Road Traffic Accidents (2017)" dataset used to support the findings of this study is freely available at https://datamillnorth.org/dataset. The "Traffic Volume Counts (2012-2013)" dataset used to support the findings of this study is freely available at https://opendata.cityofnewyork.us/data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

[1] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, "Traffic flow prediction with big data: a deep learning approach," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865-873, 2015.

[2] X. Wen, L. Shao, Y. Xue, and W. Fang, "A rapid learning algorithm for vehicle classification," Information Sciences, vol. 295, pp. 395-406, 2015.

[3] S. Ballı and E. A. Sagbas, "Diagnosis of transportation modes on mobile phone using logistic regression classification," IET Software, vol. 12, no. 2, pp. 142-151, 2018.

[4] H. Nguyen, C. Cai, and F. Chen, "Automatic classification of traffic incident's severity using machine learning approaches," IET Intelligent Transport Systems, vol. 11, no. 10, pp. 615-623, 2017.

[5] G. Kedar-Dongarkar and M. Das, "Driver classification for optimization of energy usage in a vehicle," Procedia Computer Science, vol. 8, pp. 388-393, 2012.

[6] N. Deepika and V. V. Sajith Variyar, "Obstacle classification and detection for vision based navigation for autonomous driving," in Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI 2017), pp. 2092-2097, Udupi, India, September 2017.

[7] E. Frank and M. Hall, "A simple approach to ordinal classification," in Proceedings of the European Conference on Machine Learning (ECML), vol. 2167 of Lecture Notes in Computer Science, pp. 145-156, Springer, 2001.

[8] E. Alpaydin, Introduction to Machine Learning, The MIT Press, 3rd edition, 2014.

[9] S. Destercke and G. Yang, "Cautious ordinal classification by binary decomposition," in Proceedings of Machine Learning and Knowledge Discovery in Databases - European Conference, ECML/PKDD, pp. 323-337, Nancy, France, 2014.

[10] J. S. Cardoso and J. F. Pinto da Costa, "Learning to classify ordinal data: the data replication method," Journal of Machine Learning Research (JMLR), vol. 8, pp. 1393-1429, 2007.

[11] L. Li and H. T. Lin, "Ordinal regression by extended binary classification," in Proceedings of the 19th International Conference on Neural Information Processing Systems, pp. 865-872, Canada, 2006.

[12] F. E. Harrell, "Ordinal logistic regression," in Regression Modeling Strategies, Springer Series in Statistics, pp. 311-325, Springer International Publishing, Cham, Switzerland, 2015.

[13] J. D. Rennie and N. Srebro, "Loss functions for preference levels: regression with discrete ordered labels," in Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference Handling, pp. 180-186, 2005.

[14] B. Keith, "Barycentric coordinates for ordinal sentiment classification," in Proceedings of the 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Halifax, Canada, 2017.

[15] E. Fernandez, J. R. Figueira, J. Navarro, and B. Roy, "ELECTRE TRI-nB: a new multiple criteria ordinal classification method," European Journal of Operational Research, vol. 263, no. 1, pp. 214-224, 2017.

[16] M. M. Stenina, M. P. Kuznetsov, and V. V. Strijov, "Ordinal classification using Pareto fronts," Expert Systems with Applications, vol. 42, no. 14, pp. 5947-5953, 2015.

[17] W. Verbeke, D. Martens, and B. Baesens, "RULEM: a novel heuristic rule learning approach for ordinal classification with monotonicity constraints," Applied Soft Computing, vol. 60, pp. 858-873, 2017.

[18] W. Kotlowski and R. Slowinski, "On nonparametric ordinal classification with monotonicity constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 11, pp. 2576-2589, 2013.

[19] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ordinal classification with decision rules," in Proceedings of the Third International Conference on Mining Complex Data (MCD '07), pp. 169-181, Warsaw, Poland, 2007.

[20] K. Hechenbichler and K. Schliep, "Weighted k-nearest-neighbor techniques and ordinal classification," Collaborative Research Center, vol. 399, pp. 1-16, 2004.

[21] W. Waegeman and L. Boullart, "An ensemble of weighted support vector machines for ordinal regression," International Journal of Computer and Information Engineering, vol. 1, no. 12, pp. 599-603, 2007.

[22] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ensemble of decision rules for ordinal classification with monotonicity constraints," in Proceedings of the International Conference on Rough Sets and Knowledge Technology, vol. 5009 of Lecture Notes in Computer Science, pp. 260-267, 2008.

[23] H. T. Lin, From Ordinal Ranking to Binary Classification, Ph.D. thesis, California Institute of Technology, 2008.

[24] C. K. A. Cuartas, A. J. P. Anzola, and B. G. M. Tarazona, "Classification methodology of research topics based in decision trees J48 and RandomTree," International Journal of Applied Engineering Research, vol. 10, no. 8, pp. 19413-19424, 2015.

[25] C. Parimala and R. Porkodi, "Classification algorithms in data mining: a survey," International Journal of Scientific Research in Computer Science, vol. 3, pp. 349-355, 2018.

[26] P. Yildirim, D. Birant, and T. Alpyildiz, "Improving prediction performance using ensemble neural networks in textile sector," in Proceedings of the 2017 International Conference on Computer Science and Engineering (UBMK), pp. 639-644, Antalya, Turkey, October 2017.

[27] H. Yu and J. Ni, "An improved ensemble learning method for classifying high-dimensional and imbalanced biomedicine data," IEEE Transactions on Computational Biology and Bioinformatics, vol. 11, no. 4, pp. 657-666, 2014.

[28] D. Che, Q. Liu, K. Rasheed, and X. Tao, "Decision tree and ensemble learning algorithms with their applications in bioinformatics," in Software Tools and Algorithms for Biological Systems, H. R. Arabnia and Q.-N. Tran, Eds., vol. 696, pp. 191-199, Springer, 2011.

[29] "UCI Machine Learning Repository: Car Evaluation Data Set," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/datasets/car+evaluation.

[30] "Weka 3 - Data Mining with Open Source Machine Learning Software in Java," cs.waikato.ac.nz, 2018, https://www.cs.waikato.ac.nz/ml/weka/.

[31] "UCI Machine Learning Repository," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/index.php.

[32] "Datasets | Kaggle," kaggle.com, 2018, https://www.kaggle.com/datasets.

[33] "Data Mill North," datamillnorth.org, 2018, https://datamillnorth.org/dataset.

[34] City of New York, "NYC Open Data," opendata.cityofnewyork.us, 2018, https://opendata.cityofnewyork.us.

[35] J. Derrac, S. García, D. Molina, and F. Herrera, "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms," Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 3-18, 2011.

[36] N. A. Weiss, Introductory Statistics, Addison-Wesley, 9th edition, 2012.

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 6: EBOC: Ensemble-Based Ordinal Classification in …downloads.hindawi.com/journals/jat/2019/7482138.pdfJournalofAdvancedTransportation the size of decision trees byremoving sections

6 Journal of Advanced Transportation

Classes gt lsquounaccrsquo Classes gt lsquoaccrsquo Classes gt lsquogoodrsquo

Orig

inal

Dat

aset

Dat

aset

1

P(classgtlsquounaccrsquo |sample) P(classgtlsquoaccrsquo |sample) P(classgtlsquogoodrsquo|sample)

lsquounaccrsquo lsquoaccrsquo lsquogoodrsquo lsquovgoodrsquolsquounaccrsquo lsquoaccrsquo lsquogoodrsquo lsquovgoodrsquo

Dat

aset

2

Dat

aset

3

lsquounaccrsquo lsquoaccrsquo lsquogoodrsquo lsquovgoodrsquo

FeaturesClassLabels

vhigh vhigh 2 2 hellip unacclow vhigh 2 4 hellip acclow vhigh 4 4 hellip accmed low 2 4 hellip goodlow med 2 4 hellip goodlow high 3 4 hellip vgood

FeaturesClassLabels

vhigh vhigh 2 2 hellip 0low vhigh 2 4 hellip 0low vhigh 4 4 hellip 0med low 2 4 hellip 1low med 2 4 hellip 1low high 3 4 hellip 1

FeaturesClassLabels

vhigh vhigh 2 2 hellip 0low vhigh 2 4 hellip 1low vhigh 4 4 hellip 1med low 2 4 hellip 1low med 2 4 hellip 1low high 3 4 hellip 1

FeaturesClassLabels

vhigh vhigh 2 2 hellip 0low vhigh 2 4 hellip 0low vhigh 4 4 hellip 0med low 2 4 hellip 0low med 2 4 hellip 0low high 3 4 hellip 1

Figure 2 Application of ordinal class classifier algorithm on ldquocar evaluationrdquo dataset

demonstration of how the ordinal class classifier algorithmworks on the ldquocar evaluationrdquo dataset

When predicting a new sample the probabilities arecomputed for each 119896 class value using k-1 binary classificationmodels For example the probability of the ldquounaccrdquo classvalue 119875(10158401199061198991198861198881198881015840 | 119904119886119898119901119897119890) in the sample car dataset is evalu-ated by 1 minus 119875(119888119897119886119904119904 gt 10158401199061198991198861198881198881015840 | 119904119886119898119901119897119890) Similarly the otherthree ordinal class value probabilities are computed Andfinally the class label which has the maximum probabilityvalue is assigned to the sample In general the probabilities ofthe class attribute values depend on ldquocar evaluationrdquo datasetwhich are calculated as follows

119875 (10158401199061198991198861198881198881015840 | 119904119886119898119901119897119890)

= 1 minus 119875 (119888119897119886119904119904 gt 10158401199061198991198861198881198881015840 | 119904119886119898119901119897119890)

119875 (10158401198861198881198881015840 | 119904119886119898119901119897119890)

= 119875 (119888119897119886119904119904 gt 10158401199061198991198861198881198881015840 | 119904119886119898119901119897119890)

minus 119875 (119888119897119886119904119904 gt 10158401198861198881198881015840 | 119904119886119898119901119897119890)

119875 (10158401198921199001199001198891015840 | 119904119886119898119901119897119890)

= 119875 (119888119897119886119904119904 gt 10158401198861198881198881015840 | 119904119886119898119901119897119890)

minus 119875 (119888119897119886119904119904 gt 10158401198921199001199001198891015840 | 119904119886119898119901119897119890)

119875 (1015840V1198921199001199001198891015840 | 119904119886119898119901119897119890) = 119875 (119888119897119886119904119904 gt 10158401198921199001199001198891015840 | 119904119886119898119901119897119890)

(4)

The proposed approach (EBOC) utilizes the ordinal classi-fication algorithm as base learner for bagging and boostingensemble methods This novel ensemble-based approachaims to obtain successful classification results under favourof high prediction ability of ensemble-learning paradigmThe general structure of the proposed approach is presentedin Figure 3 When bagging is considered random samplesare selected from the original dataset to produce multipletraining sets or when boosting is utilized samples areselected with specified probabilities based on their weightsAfter that the EBOC derives new datasets from the originaldataset each onewith new binary class attribute Each datasetis given to ordinal class classifier algorithm as an inputThen a classification algorithm (ie C45 RandomTree orREPTree) is applied on the datasets and multiple ensemble-based ordinal classification models are produced Each clas-sification model in this system gives an output label and themajority vote of these outputs is selected as a final class value

The pseudocode of the proposed approach (EBOC) ispresented in Algorithm 1 First multiple training sets aregenerated according to the preferred ensemble method (bag-ging or boosting) Second the algorithm converts ordinaldata to binary data After that a tree-based classificationalgorithm (ie C45 RandomTree or REPTree) is applied onthe binary training sets and by this way various classifiers areproduced After this training phase if boosting is chosen asensemble method the weight of each sample in the datasetis updated dynamically according to the performance (errorrate) of the classifiers in the current iteration To predictthe class label of a new input sample the probability foreach ordinal class is estimated by using the constructedmodelsThe algorithmchooses the class label with the highest

Journal of Advanced Transportation 7

Predicted Class Label

Dataset

hellip

C

Dataset 1

Dataset 2

Dataset n

Binary Classifiers 1

Binary Classifiers 2

Binary Classifiers n

OrdinalClassifier 1

Ordinal Classifier 2

Ordinal Classifier n

Majority VotingOrdinal

Classifier 2

Ordinal Classifier n

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

unacc acc good vgood

C

CL1

CL2

CL3

C

CL1

CL2

CL3

C

CL1

CL2

CL3

Mlowast (x) = LAGRyisinY

n

sumiM(x)=y

1

Figure 3 The general structure of the proposed approach (EBOC)

probability Lastly the majority vote of the outputs obtainedfrom each ordinal classification model is selected as finaloutput

5 Experimental Studies

In the experimental studies the proposed approach (EBOC)was implemented on twelve different benchmark trans-portation datasets to demonstrate its advantages over thestandard nominal classification methods The applicationwas developed by using Weka open source data mininglibrary [30] Individual tree-based classification algorithms(C45 RandomTree and REPTree) ordinal classificationalgorithm [7] and the EBOC approach were applied on thetransportation datasets separately and they were comparedin terms of accuracy precision recall F-measure and ROCarea In the EBOC approach C45 RandomTree and REPTreealgorithms were also used as a base learner in ordinal classclassifiers separately The obtained experimental results inthis study are presented with the help of tables and graphs

51 Dataset Description In this study 12 different transporta-tion datasets which are available in several repositories forpublic use were selected to demonstrate the capabilities ofthe proposed EBOC method The datasets were obtainedfrom the following data archives UCI Machine LearningRepository [31] Kaggle [32] data mill north [33] and NYC

open data [34] The detailed descriptions about the datasets(ie what are they and how to use them) are given as follows

Auto MPG Auto MPG dataset which was used in theAmerican Statistical Association Exposition is presented bythe StatLib library which is maintained at Carnegie MellonUniversity This dataset is utilized for the prediction of thecity-cycle fuel consumption in miles per gallon of the carsaccording to their characteristics such as model year thenumber of cylinders horsepower engine size (displacement)weight and acceleration

Automobile The dataset was obtained from 1985 WardrsquosAutomotive Yearbook It consists of various automobilecharacteristics such as fuel type body style number of doorsengine size engine location horsepower length widthheight price and insurance score These features are usedfor the classification of automobiles by predicting their riskfactors six different risk ranking ranging from risky (+3) topretty safe (-3)

Bike Sharing The data includes the hourly count of rentalbikes between years 2011 and 2012 It was collected by Capitalbike share system fromWashingtonDC wheremembershiprental and bike return are automated via a network of kiosklocationsThe dataset is presented for the prediction of hourlybike rental counts based on the environmental and seasonalsettings such asweather situation (ie clear cloudy rainy and

8 Journal of Advanced Transportation

Algorithm EBOC: Ensemble-Based Ordinal Classification
Inputs:
  D: the ordinal dataset, D = {(x1, y1), (x2, y2), ..., (xm, ym)}
  m: the number of instances in the dataset D
  X: input feature space, an input vector x ∈ X
  Y: class attribute, an ordinal class label y ∈ {c1, c2, ..., ck} with an order ck > ck−1 > ... > c1
  k: the number of class labels
  n: ensemble size
Outputs:
  M*: an ensemble classification model
  M*(x): class label of a new sample x
Begin
  for i = 1 to n do
    if (bagging)
      Di = bootstrap samples from D
    else if (boosting)
      Di = samples from D according to their weights
    // Construction of binary training sets Dij
    for j = 1 to k−1 do
      for s = 1 to m do
        if (ys <= cj) Dij.Add(xs, 0)
        else Dij.Add(xs, 1)
      end for
    end for
    // Construction of binary classifiers BCij
    for j = 1 to k−1 do
      BCij = ClassificationAlgorithm(Dij)
    end for
    if (boosting) update weight values
  end for
  // Classification of a new sample x
  for i = 1 to n do
    // Construction of ordinal classification models Mi
    P(c1) = 1 − P(y > c1)
    for j = 2 to k−1 do
      P(cj) = P(y > cj−1) − P(y > cj)
    end for
    P(ck) = P(y > ck−1)
    Mi = max(P)
  end for
  // Majority voting
  if (bagging)
    M*(x) = arg max_{y∈Y} Σ_{i: Mi(x)=y} 1
  else if (boosting)
    M*(x) = arg max_{y∈Y} Σ_{i: Mi(x)=y} weight_i
End Algorithm

Algorithm 1: The pseudocode of the proposed EBOC approach.
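For readers who prefer executable code, the following is a minimal Python sketch of the bagging variant of Algorithm 1. It is an illustration rather than the authors' implementation: the experiments in this article were run in Weka with C4.5, RandomTree, and REPTree as base learners, whereas scikit-learn's DecisionTreeClassifier stands in as the base learner here, and class labels are assumed to be encoded as ordered integers (e.g., 0 < 1 < 2).

```python
# A sketch of the bagging variant of Algorithm 1 (Frank & Hall ordinal
# decomposition inside a bagged ensemble); not the authors' Weka code.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

class BaggedOrdinalClassifier:
    def __init__(self, base=None, n_estimators=100, seed=0):
        self.base = base if base is not None else DecisionTreeClassifier()
        self.n_estimators = n_estimators
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.classes_ = np.sort(np.unique(y))
        self.ensemble_ = []
        for _ in range(self.n_estimators):
            idx = self.rng.integers(0, len(X), len(X))   # bootstrap sample D_i
            Xi, yi = X[idx], y[idx]
            # One binary classifier per threshold: BC_ij models P(y > c_j).
            member = [clone(self.base).fit(Xi, (yi > c).astype(int))
                      for c in self.classes_[:-1]]
            self.ensemble_.append(member)
        return self

    def predict(self, X):
        X = np.asarray(X)
        k = len(self.classes_)
        votes = np.zeros((len(X), k))
        for member in self.ensemble_:
            cols = []
            for bc in member:
                proba = bc.predict_proba(X)
                # A bootstrap sample may contain a single binary class; in that
                # case the classifier outputs that class with probability 1.
                if proba.shape[1] == 1:
                    cols.append(np.full(len(X), float(bc.classes_[0])))
                else:
                    cols.append(proba[:, 1])
            gt = np.column_stack(cols)               # gt[:, j] = P(y > c_j)
            # Reconstruct the k class probabilities from the thresholds.
            p = np.empty((len(X), k))
            p[:, 0] = 1.0 - gt[:, 0]
            for j in range(1, k - 1):
                p[:, j] = gt[:, j - 1] - gt[:, j]
            p[:, -1] = gt[:, -1]
            votes[np.arange(len(X)), p.argmax(axis=1)] += 1  # each M_i votes
        return self.classes_[votes.argmax(axis=1)]           # majority vote M*(x)

# Usage: model = BaggedOrdinalClassifier(n_estimators=100).fit(X_train, y_train)
```

The boosting (AdaBoost) variant of Algorithm 1 differs only in that each D_i is drawn according to instance weights updated after every round, and the final vote is weighted.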


Car Evaluation. This dataset is useful for evaluating the quality of cars according to various characteristics, such as buying price, maintenance price, number of doors, person capacity, size of luggage boot, and estimated safety of the car. The target attribute in the dataset indicates the overall scores of the cars as unacceptable, acceptable, good, and very good.


Car Sale Advertisements. The data was collected from private car sale advertisements in Ukraine in 2016. It was proposed for the prediction of the seller's price (denominated in USD) in the advertisement according to well-known car features, such as model, body type, mileage, engine volume, engine type, and drive type.

NYS Air Passenger Traffic. The data was collected monthly by the Port Authority of New York State between the years 1977 and 2015. The aim is to predict the total number of domestic and international passengers for five airports (ACY, EWR, JFK, LGA, and SWF).

Road Traffic Accidents (2017). The dataset contains the records of traffic accidents across the City of Leeds, UK, reported in 2017, with information on location, number of people and vehicles involved, type of vehicle, road surface, weather situations, lighting conditions, and the age and sex of the casualty. The aim is to predict the severity of the casualty as slight, serious, or fatal.

SF Air Traffic Landings Statistics and SF Air Traffic Passenger Statistics. These two separate datasets include aircraft landings and passenger statistics of San Francisco International Airport recorded during the period from July 2005 to March 2018. These datasets are used to predict monthly landing and passenger counts, respectively.

Smart City Traffic Patterns. This dataset includes the traffic patterns of four junctions of a city collected between 2015 and 2017. The aim is to manage the traffic of the city better and to provide input on infrastructure planning for the future, in order to improve the efficiency of services for the citizens. To serve this purpose, the number of vehicles in the traffic is predicted so that a robust traffic system can be implemented for the city by being prepared for traffic peaks.

Statlog (Vehicle Silhouettes). The Statlog dataset contains the features of vehicle silhouettes extracted by the Hierarchical Image Processing System (HIPS). A vehicle may be viewed from one of many different angles. The aim is to classify a given silhouette using a set of features extracted from the image.

Traffic Volume Counts (2012-2013). The dataset includes the hourly traffic volume counts collected by the New York Metropolitan Transportation Council between the years 2012 and 2013. The traffic counts were measured on various roads from one intersection to another with a specified direction (NB, SB, WB, and EB). The attribute to be predicted in this dataset is the traffic volume at 11:00-12:00 AM.

Before the application of the nominal and ordinal classification algorithms, the datasets generally passed through data preprocessing steps. In this study, the ID attributes in the transportation datasets were eliminated in the data reduction step. The date attributes were split into three parts (day, month, and year) because these provide more useful information for the prediction task. In addition, continuous class values were discretized into 3 bins using the equal-frequency technique, since ordinal classification algorithms require categorical ordinal data.

Figure 4: The 10-fold cross-validation process.
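As an illustration of these preprocessing steps, the following pandas sketch removes an ID attribute, splits a date attribute into day, month, and year, and discretizes a continuous class attribute into 3 equal-frequency bins. The column names used here ("ID", "Date", "Price") are hypothetical placeholders, not the actual field names of the twelve datasets.

```python
# A sketch of the preprocessing pipeline described above, assuming
# hypothetical column names; pd.qcut performs the equal-frequency binning.
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.drop(columns=["ID"], errors="ignore")   # data reduction: drop ID attributes
    dates = pd.to_datetime(df.pop("Date"))          # split the date attribute
    df["day"], df["month"], df["year"] = dates.dt.day, dates.dt.month, dates.dt.year
    # Equal-frequency discretization of the continuous class into 3 ordered bins.
    df["class"] = pd.qcut(df.pop("Price"), q=3, labels=["low", "medium", "high"])
    return df
```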

The basic characteristics of the investigated transportation datasets are given in Table 1: the number of instances, attributes, and classes, the target attribute, the data preprocessing operations, and the class distribution in each bin.

5.2. Experimental Work. In each experiment, four methods were compared on the transportation datasets: (i) individual tree-based classification algorithms, (ii) the ordinal class classifier, (iii) boosting-based ordinal classification, and (iv) bagging-based ordinal classification. They were compared using the n-fold cross-validation technique, selecting n as 10. In this validation technique, the entire dataset is divided into n equal-size parts; n−1 of them are chosen for the training phase of the classification model, and the last part is used in the testing phase. This process is repeated n times, changing the parts used for training and testing. When the cross-validation process terminates, the accuracy values obtained in each step are averaged, and a single accuracy value is produced as a measure of the success of the algorithm. Figure 4 represents the 10-fold cross-validation process.
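A minimal sketch of this protocol is shown below with scikit-learn for brevity (the experiments themselves were run in Weka); the synthetic data is a stand-in for one of the transportation datasets.

```python
# 10-fold cross-validation with a decision tree; scores.mean() is the single
# averaged accuracy value described above.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_classes=3, n_informative=5,
                           random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=10, scoring="accuracy")
print(f"mean accuracy over 10 folds: {scores.mean():.4f}")
```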

In this study, the alternative tree-based ordinal classification algorithms were compared according to accuracy, precision, recall, F-measure, and ROC area measures. The metrics are explained with their abbreviations, formulas, and definitions in Table 2.

5.3. Experimental Results. In this study, three different experiments with three different tree-based classification algorithms (C4.5, RandomTree, and REPTree) were performed to compare the classification success of the proposed EBOC approach with the existing individual algorithms and the ordinal class classifier algorithm. To evaluate the classification performances of the algorithms on a specific transportation dataset, accuracy rate values were calculated using the n-fold cross-validation technique, selecting n as 10.

In each experiment, one of the tree-based classification algorithms was used as the base learner. For example, in the first experiment, four methods were applied and compared on 12 different transportation datasets: (i) C4.5, the individual decision tree algorithm; (ii) OrdC4.5, the ordinal class classifier with C4.5; (iii) AdaOrdC4.5, AdaBoost-based ordinal classification with C4.5 as the base learner; and (iv) BagOrdC4.5, bagging-based ordinal classification with C4.5 as the base learner.


Table 1: The basic characteristics of the transportation datasets (I: number of instances, A: number of attributes, C: number of classes).

| Dataset | I | A | C | Class Distribution | Target Attribute | Data Preprocessing |
| --- | --- | --- | --- | --- | --- | --- |
| Auto MPG | 398 | 8 | 3 | 131-134-133 | mpg | Removing the columns with unique values (i.e., car name) |
| Automobile | 205 | 26 | 7 | 0-3-22-67-54-32-27 | symboling | - |
| Bike Sharing | 17379 | 13 | 3 | 5797-5783-5799 | cnt | Removing the columns "instant", "dteday", "casual", and "registered" |
| Car Evaluation | 1728 | 7 | 4 | 1210-384-69-65 | class | - |
| Car Sale Advertisements | 9309 | 10 | 3 | 3112-3080-3117 | price | Removing rows with zero price value |
| NYS Air Passenger Traffic | 1584 | 4 | 3 | 528-528-528 | total passengers | Removing the columns "Domestic" and "International Passengers" |
| Road Traffic Accidents (2017) | 2203 | 13 | 3 | 1879-309-15 | casualty severity | (i) Removing columns that hold reference numbers; (ii) splitting date column into day and month |
| SF Air Traffic Landings Statistics | 21105 | 14 | 3 | 6941-7103-7061 | landing count | - |
| SF Air Traffic Passenger Statistics | 18398 | 12 | 3 | 6132-6133-6133 | passenger count | - |
| Smart City Traffic Patterns | 48120 | 6 | 3 | 15687-16436-15997 | vehicles | (i) Removing ID column; (ii) splitting date column into day, month, and year |
| Statlog (Vehicle Silhouettes) | 846 | 19 | 4 | 212-217-218-199 | class | - |
| Traffic Volume Counts (2012-2013) | 5945 | 31 | 3 | 1983-1989-1973 | traffic volume at 11:00-12:00 AM | (i) Removing the columns with unique values (i.e., ID, Segment ID); (ii) splitting date column into day, month, and year |

Table 2: The detailed information about the classifier performance measures (TP: true positives, FP: false positives, TN: true negatives, and FN: false negatives).

| Abbreviation | Measure | Formula | Description |
| --- | --- | --- | --- |
| Acc | Accuracy | Acc = (TP + TN) / (TP + TN + FP + FN) | The ratio of the number of correctly classified instances to the total number of records |
| Prec | Precision | Prec = TP / (TP + FP) | The ratio of correctly classified positive instances against all positive results |
| Rec | Recall | Rec = TP / (TP + FN) | The ratio of correct answers for a class over all the answers actually belonging to this class |
| F | F-measure | F = 2 · precision · recall / (precision + recall) | The harmonic mean of precision and recall |
| ROC Area | Receiver Operating Characteristic Area | (area under the ROC curve) | The area under the curve which is generated to compare correctly and incorrectly classified instances |
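In practice, the Table 2 measures can be computed as in the sketch below, which uses scikit-learn; the weighted averaging across classes is an assumption made here for the multi-class case, since the article reports single summary values per dataset.

```python
# Computing accuracy, precision, recall, F-measure, and ROC area on toy
# predictions; a sketch with assumed weighted multi-class averaging.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = np.array([0, 1, 2, 2, 1, 0])    # toy ordinal labels
y_pred = np.array([0, 1, 2, 1, 1, 0])    # toy predictions
y_proba = np.array([[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.2, 0.2, 0.6],
                    [0.1, 0.6, 0.3], [0.2, 0.7, 0.1], [0.7, 0.2, 0.1]])

print("Acc :", accuracy_score(y_true, y_pred))
print("Prec:", precision_score(y_true, y_pred, average="weighted"))
print("Rec :", recall_score(y_true, y_pred, average="weighted"))
print("F   :", f1_score(y_true, y_pred, average="weighted"))
print("ROC :", roc_auc_score(y_true, y_proba, multi_class="ovr",
                             average="weighted"))
```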

In the second and third experiments, the RandomTree and REPTree algorithms were utilized as base learners, respectively. The default parameters of the algorithms in Weka were used in all experiments, except the ensemble size (the number of ensemble members), which was set to 100. As a result of the experiments, the accuracy rates of each algorithm were evaluated. Tables 3, 4, and 5 show the comparative results of the implemented techniques on the twelve benchmark datasets in terms of accuracy rates.


Table 3: The comparison of C4.5 based ordinal and ensemble learning methods in terms of classification accuracy.

| Dataset | C4.5 (%) | OrdC4.5 (%) | AdaOrdC4.5 (%) | BagOrdC4.5 (%) |
| --- | --- | --- | --- | --- |
| Auto MPG | 80.40 | 80.90* | 83.67* | 81.91* |
| Automobile | 81.95 | 66.34 | 84.39* | 73.66 |
| Bike Sharing | 87.75 | 87.72 | 89.29* | 89.79* |
| Car Evaluation | 92.36 | 92.19 | 98.73* | 94.27* |
| Car Sale Advertisements | 83.65 | 81.09 | 83.82* | 85.00* |
| NYS Air Passenger Traffic | 84.91 | 85.42* | 86.05* | 86.81* |
| Road Traffic Accidents (2017) | 85.29 | 85.29* | 81.03 | 84.38 |
| SF Air Traffic Landings Statistics | 98.74 | 98.31 | 99.37* | 98.69 |
| SF Air Traffic Passenger Statistics | 90.68 | 90.82* | 90.41 | 91.43* |
| Smart City Traffic Patterns | 85.66 | 85.98* | 85.07 | 86.94* |
| Statlog (Vehicle Silhouettes) | 72.46 | 68.91 | 78.13* | 74.82* |
| Traffic Volume Counts (2012-2013) | 88.36 | 88.12 | 89.77* | 89.10* |
| Average | 86.02 | 84.26 | 87.48 | 86.40 |

Table 4: The comparison of RandomTree based ordinal and ensemble learning methods in terms of classification accuracy.

| Dataset | RandomTree (%) | OrdRandomTree (%) | AdaOrdRandomTree (%) | BagOrdRandomTree (%) |
| --- | --- | --- | --- | --- |
| Auto MPG | 77.64 | 81.66* | 82.66* | 81.41* |
| Automobile | 76.59 | 74.63 | 83.41* | 84.88* |
| Bike Sharing | 80.07 | 80.56* | 86.98* | 87.73* |
| Car Evaluation | 83.16 | 85.94* | 97.22* | 95.83* |
| Car Sale Advertisements | 82.38 | 82.12 | 83.90* | 86.34* |
| NYS Air Passenger Traffic | 86.11 | 86.17* | 85.61 | 86.49* |
| Road Traffic Accidents (2017) | 78.76 | 78.44 | 82.21* | 84.48* |
| SF Air Traffic Landings Statistics | 97.38 | 97.43* | 98.93* | 98.83* |
| SF Air Traffic Passenger Statistics | 89.22 | 89.27* | 89.05 | 89.17 |
| Smart City Traffic Patterns | 82.26 | 82.79* | 84.72* | 85.91* |
| Statlog (Vehicle Silhouettes) | 70.92 | 65.60 | 76.71* | 76.00* |
| Traffic Volume Counts (2012-2013) | 82.96 | 83.45* | 89.77* | 89.69* |
| Average | 82.29 | 82.34 | 86.76 | 87.23 |

The accuracy rates that are higher than the individual classification algorithm's accuracy rate in the same row are marked with an asterisk (*), and the maximum accuracy in each row indicates the winning method for that dataset. The experimental results show that the EBOC approaches (AdaOrdC4.5, BagOrdC4.5, AdaOrdRandomTree, BagOrdRandomTree, AdaOrdREPTree, and BagOrdREPTree) generally provide higher accuracy values than the individual tree-based classification algorithms. For example, over the 12 datasets, BagOrdRandomTree (87.23%) is significantly more accurate than RandomTree (82.29%). Similarly, both AdaOrdREPTree and BagOrdREPTree win against plain REPTree on 11 datasets and lose on only one. The results also show that the performance gap generally increases with the number of classes. When the average accuracy values are considered, the proposed EBOC approach has the best accuracy score in all three experiments: AdaOrdC4.5 reaches 87.48%, BagOrdRandomTree 87.23%, and BagOrdREPTree 85.67%.

The matrices presented in Tables 6, 7, and 8 give all pairwise combinations of the tree-based algorithms with their ordinal and ensemble-learning versions. Each cell in the matrix represents the number of wins, losses, and ties between the approach in that row and the approach in that column. For example, in the pairwise comparison of the C4.5 and BagOrdC4.5 algorithms, 3-9-0 indicates that the standard C4.5 algorithm is better than BagOrdC4.5 on only 3 datasets, while BagOrdC4.5 is better on the other 9; their accuracies are not equal on any dataset. When all matrices are examined, it is clearly seen that our proposed approach, EBOC, outperforms all other methods.
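Counting these wins, losses, and ties is straightforward; the small sketch below reproduces the C4.5 vs. BagOrdC4.5 cell from the accuracy columns of Table 3.

```python
# Pairwise win-loss-tie counts as reported in Tables 6-8; a sketch.
def win_loss_tie(a, b):
    """Counts datasets where algorithm a beats, loses to, or ties algorithm b."""
    wins = sum(x > y for x, y in zip(a, b))
    losses = sum(x < y for x, y in zip(a, b))
    return wins, losses, len(a) - wins - losses

# The C4.5 and BagOrdC4.5 columns of Table 3:
c45 = [80.40, 81.95, 87.75, 92.36, 83.65, 84.91,
       85.29, 98.74, 90.68, 85.66, 72.46, 88.36]
bag = [81.91, 73.66, 89.79, 94.27, 85.00, 86.81,
       84.38, 98.69, 91.43, 86.94, 74.82, 89.10]
print(win_loss_tie(c45, bag))   # (3, 9, 0), matching Table 6
```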

The graphs given in Figures 5, 6, and 7 show the average ranks of the tree-based algorithms (C4.5, RandomTree, and REPTree) with their ordinal and ensemble-learning (AdaBoost and bagging) versions, respectively.


Table 5: The comparison of REPTree based ordinal and ensemble learning methods in terms of classification accuracy.

| Dataset | REPTree (%) | OrdREPTree (%) | AdaOrdREPTree (%) | BagOrdREPTree (%) |
| --- | --- | --- | --- | --- |
| Auto MPG | 75.38 | 76.63* | 81.91* | 81.16* |
| Automobile | 63.41 | 66.83* | 83.90* | 72.20* |
| Bike Sharing | 84.96 | 85.27* | 88.75* | 88.12* |
| Car Evaluation | 87.67 | 90.91* | 98.67* | 93.63* |
| Car Sale Advertisements | 80.70 | 80.42 | 82.52* | 82.30* |
| NYS Air Passenger Traffic | 84.34 | 84.91* | 86.74* | 87.31* |
| Road Traffic Accidents (2017) | 84.66 | 85.02* | 80.07 | 84.57 |
| SF Air Traffic Landings Statistics | 98.04 | 98.10* | 99.12* | 98.57* |
| SF Air Traffic Passenger Statistics | 90.39 | 90.49* | 90.97* | 91.25* |
| Smart City Traffic Patterns | 84.66 | 85.06* | 85.33* | 86.38* |
| Statlog (Vehicle Silhouettes) | 72.34 | 69.62 | 77.54* | 73.17* |
| Traffic Volume Counts (2012-2013) | 80.57 | 88.75* | 86.39* | 89.37* |
| Average | 82.26 | 83.50 | 86.83 | 85.67 |

Figure 5: The average ranks of the C4.5 algorithm with its ordinal and ensemble-learning versions (C4.5, OrdC4.5, AdaOrdC4.5, BagOrdC4.5).

In the ranking method, each approach used in this study is rated according to its accuracy score on each dataset. This process starts by giving rank 1 to the classifier with the highest classification accuracy and continues to increase the rank value until rank m is assigned to the worst of the m classifiers. In the case of a tie, the average of the tied classifiers' ranks is assigned to each of them. Then, the mean of the ranks per classifier over all datasets is computed as the average rank. According to the comparative results, the EBOC approach with the bagging method has the best performance among the others in Figures 5 and 6, because it gives the lowest rank value.
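A sketch of this average-rank computation is shown below; SciPy's rankdata assigns average ranks to ties, matching the procedure just described.

```python
# Average ranks over datasets, as plotted in Figures 5-7; a sketch.
import numpy as np
from scipy.stats import rankdata

def average_ranks(acc: np.ndarray) -> np.ndarray:
    """acc: d x m matrix of accuracies (d datasets, m classifiers).
    Returns the average rank per classifier; rank 1 = most accurate."""
    return np.apply_along_axis(rankdata, 1, -acc).mean(axis=0)
```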

Figure 6: The average ranks of the RandomTree algorithm with its ordinal and ensemble-learning versions (RandomTree, OrdRandomTree, AdaOrdRandomTree, BagOrdRandomTree).

The proposed algorithms (AdaOrd* and BagOrd*) were applied to the transportation datasets and were compared with the ordinal and nominal (standard) classification algorithms in terms of accuracy, precision, recall, F-measure, and ROC area metrics. Additional measures besides accuracy are required to evaluate all aspects of the classifiers, and they are useful for providing additional insight into their performance. Table 9 shows the average results obtained over the 12 transportation datasets in each experiment. Except for accuracy, the metrics listed in Table 9 give a value ranging from 0 to 1; a higher value means that the algorithm is more successful than the others.


Table 6: The pairwise combinations of the C4.5 algorithm with its ordinal and ensemble learning versions.

| | BagOrdC4.5 | AdaOrdC4.5 | OrdC4.5 |
| --- | --- | --- | --- |
| C4.5 | 3-9-0 | 3-9-0 | 7-4-1 |
| OrdC4.5 | 1-11-0 | 3-9-0 | |
| AdaOrdC4.5 | 6-6-0 | | |

Table 7: The pairwise combinations of the RandomTree algorithm with its ordinal and ensemble learning versions.

| | BagOrdRandomTree | AdaOrdRandomTree | OrdRandomTree |
| --- | --- | --- | --- |
| RandomTree | 1-11-0 | 2-10-0 | 4-8-0 |
| OrdRandomTree | 2-10-0 | 2-10-0 | |
| AdaOrdRandomTree | 5-7-0 | | |

Figure 7: The average ranks of the REPTree algorithm with its ordinal and ensemble-learning versions (REPTree, OrdREPTree, AdaOrdREPTree, BagOrdREPTree).

Among the C4.5 decision-tree-based algorithms, it is clearly seen that the AdaOrdC4.5 algorithm has the best accuracy (87.467%), precision (0.874), recall (0.875), and F-measure (0.874) results. While bagging has the highest values for the RandomTree algorithm, boosting is the best among the REPTree-based algorithms. When the ROC area is considered, the BagOrd* algorithms have the highest scores according to the experimental results. As a result, it is possible to say that the ensemble-based ordinal classification (EBOC) approach provides more accurate and robust classification results than the traditional methods.

5.4. Statistical Significance Tests. The field of inferential statistics offers special procedures for testing the significance of differences between multiple classifiers. Although the ensemble-based ordinal classification (EBOC) algorithms (AdaOrd* and BagOrd*) have lower (better) ranking values, we should statistically verify that these algorithms are significantly different from the others. For verification, we used three well-known statistical methods [35]: multiple comparisons (the Friedman test and the Quade test) and a pairwise comparison (the Wilcoxon signed rank test).

The null hypothesis (H0) for a statistical test is one in which the compared classification models are equivalent; the alternative hypothesis (H1) holds when not all classifiers are equivalent.

H0: There are no performance differences among the classifiers on the datasets.

H1: There are performance differences among the classifiers on the datasets.

The p-value is defined as a probability, a decimal number between 0 and 1, and it can be expressed as a percentage (i.e., 0.1 = 10%). With a small p-value, we reject the null hypothesis (H0), so the differences between the classification results are significant.

5.4.1. Statistical Tests for Comparing Multiple Classifiers. Multiple comparison tests (MCT) are used to establish a statistical comparison of the results reported among various classification algorithms. We performed two well-known MCTs to determine whether the classifiers have significant differences or not: the Friedman test and the Quade test.

Friedman Test. The Friedman test is a nonparametric statistical test that aims to detect significant differences between the behaviours of two or more algorithms. Initially, it ranks the algorithms for each dataset separately, between 1 (smallest) and 4 (largest); in case of ties, the average rank is assigned. After that, the Friedman test statistic ($\chi^2_r$) is calculated according to the following:

$$\chi^2_r = \frac{12}{d \cdot t \cdot (t+1)} \sum_{i=1}^{t} R_i^2 - 3 \cdot d \cdot (t+1) \qquad (5)$$


Table 8: The pairwise combinations of the REPTree algorithm with its ordinal and ensemble learning versions.

| | BagOrdREPTree | AdaOrdREPTree | OrdREPTree |
| --- | --- | --- | --- |
| REPTree | 1-11-0 | 1-11-0 | 2-10-0 |
| OrdREPTree | 1-11-0 | 2-10-0 | |
| AdaOrdREPTree | 7-5-0 | | |

Table 9: The comparison of the algorithms in terms of accuracy, precision, recall, F-measure, and ROC area.

| Compared Algorithms | Accuracy (%) | Precision | Recall | F-measure | ROC Area |
| --- | --- | --- | --- | --- | --- |
| C4.5 | 86.02 | 0.861 | 0.860 | 0.866 | 0.897 |
| OrdC4.5 | 84.26 | 0.843 | 0.842 | 0.848 | 0.884 |
| AdaOrdC4.5 | 87.48 | 0.874 | 0.875 | 0.874 | 0.933 |
| BagOrdC4.5 | 86.40 | 0.859 | 0.864 | 0.860 | 0.947 |
| RandomTree | 82.29 | 0.823 | 0.823 | 0.823 | 0.860 |
| OrdRandomTree | 82.34 | 0.825 | 0.823 | 0.824 | 0.861 |
| AdaOrdRandomTree | 86.76 | 0.866 | 0.868 | 0.866 | 0.928 |
| BagOrdRandomTree | 87.23 | 0.868 | 0.872 | 0.869 | 0.947 |
| REPTree | 82.26 | 0.817 | 0.823 | 0.817 | 0.903 |
| OrdREPTree | 83.50 | 0.831 | 0.835 | 0.829 | 0.908 |
| AdaOrdREPTree | 86.83 | 0.868 | 0.868 | 0.867 | 0.925 |
| BagOrdREPTree | 85.67 | 0.852 | 0.857 | 0.852 | 0.939 |

where $d$ is the number of datasets, $t$ is the number of classifiers, and $R_i$ is the total of the ranks of the $i$th classifier over all datasets. The Friedman statistic is approximately chi-square ($\chi^2$) distributed with $t-1$ degrees of freedom (df), and the null hypothesis is rejected if $\chi^2_r > \chi^2_{t-1,\alpha}$ at the corresponding significance level $\alpha$.
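Equation (5) transcribes directly into code; the sketch below assumes the rank matrix has been built from the accuracies in Tables 3-5 as described above.

```python
# The Friedman statistic of (5); a sketch.
import numpy as np

def friedman_statistic(ranks: np.ndarray) -> float:
    """ranks: d x t matrix where ranks[i, j] is the rank of classifier j
    on dataset i (d datasets, t classifiers)."""
    d, t = ranks.shape
    R = ranks.sum(axis=0)                 # R_i: rank total per classifier
    return 12.0 / (d * t * (t + 1)) * (R ** 2).sum() - 3.0 * d * (t + 1)
```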

The obtained Friedman test value for the four C4.5-based classifiers is 10.525. Since this value is greater than the critical value ($\chi^2_{df=3,\alpha=0.05} = 7.81$), the null hypothesis (H0) is rejected, so it can be concluded that the four classifiers are significantly different. The same holds for the RandomTree-based and REPTree-based algorithms, which have test values of 15.3 and 20.1, respectively. The p-values obtained for the C4.5-, RandomTree-, and REPTree-related EBOC results (Tables 3, 4, and 5) are 0.01459, 0.00158, and 0.00016, which are far below the 0.05 level of significance. Thus, it is possible to say that the differences between their performances are unlikely to have occurred by chance.

Quade Test. Like the Friedman test, the Quade test is a nonparametric test which is used to prove that the differences among classifiers are significant. First, the performance results of each classifier are ranked within each dataset to yield $R_{ij}$, in the same way as the Friedman test does, where $d$ is the number of datasets and $t$ is the number of classifiers, for $i = 1, 2, \ldots, d$ and $j = 1, 2, \ldots, t$. Then, the range for each dataset is calculated by finding the difference between the largest and the smallest observations within that dataset, and the rank of this range for dataset $i$ is denoted by $Q_i$. The weighted average adjusted rank for dataset $i$ with classifier $j$ is then computed as $S_{ij} = Q_i \cdot (R_{ij} - \frac{t+1}{2})$. The Quade test statistic is then given by

$$F = \frac{(d-1) \cdot \frac{1}{d} \sum_{j=1}^{t} S_j^2}{\sum_{i=1}^{d} \sum_{j=1}^{t} S_{ij}^2 - \frac{1}{d} \sum_{j=1}^{t} S_j^2} \qquad (6)$$

where $S_j = \sum_{i=1}^{d} S_{ij}$ is the sum of the weighted ranks for each classifier. The $F$ value is tested against the F-distribution for a given $\alpha$ with $df_1 = (t-1)$ and $df_2 = (d-1)(t-1)$ degrees of freedom. If $F > F_{(t-1),(d-1)(t-1),\alpha}$, then the null hypothesis is rejected. Moreover, the p-value can be computed through normal approximations.
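A sketch of this computation under the same conventions is given below; rankdata is used both for the within-dataset ranks $R_{ij}$ and for the ranks $Q_i$ of the dataset ranges (the statistic is invariant to the ranking direction).

```python
# The Quade statistic of (6); a sketch.
import numpy as np
from scipy.stats import rankdata

def quade_statistic(acc: np.ndarray) -> float:
    """acc: d x t matrix of accuracies (d datasets, t classifiers)."""
    d, t = acc.shape
    R = np.apply_along_axis(rankdata, 1, -acc)        # R_ij: ranks within each dataset
    Q = rankdata(acc.max(axis=1) - acc.min(axis=1))   # Q_i: rank of each dataset's range
    S = Q[:, None] * (R - (t + 1) / 2.0)              # S_ij: weighted adjusted ranks
    B = (S.sum(axis=0) ** 2).sum() / d                # (1/d) * sum_j S_j^2
    A = (S ** 2).sum()                                # sum_ij S_ij^2
    return (d - 1) * B / (A - B)
```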

The p-values computed through the Quade statistical tests for the C4.5-, RandomTree-, and REPTree-related EBOC algorithms are 0.013398, 0.000018, and 0.000522, respectively. The test results strongly suggest the existence of significant differences among the considered algorithms at the significance level $\alpha = 0.05$. Hence, we reject the null hypothesis, which states that all classification algorithms have the same performance; thus, it is possible to say that at least one of them behaves differently.

5.4.2. Statistical Tests for Comparing Paired Classifiers. In addition to the multiple comparison tests, we also conducted a pairwise comparison test to demonstrate that the proposed algorithms (AdaOrd* and BagOrd*) have significant differences from the others when compared individually. The Wilcoxon signed rank test was applied to assess pairwise performance differences.


Table 10: Wilcoxon signed rank test results.

| Compared Algorithms | W+ | W− | W | z-score | p-value | Significance Level |
| --- | --- | --- | --- | --- | --- | --- |
| AdaOrdC4.5 vs. C4.5 | 63 | 15 | 15 | -1.882715 | 0.059739 | moderate |
| AdaOrdC4.5 vs. OrdC4.5 | 65 | 13 | 13 | -2.039608 | 0.041389 | strong |
| BagOrdC4.5 vs. C4.5 | 61 | 17 | 17 | -1.725822 | 0.084379 | moderate |
| BagOrdC4.5 vs. OrdC4.5 | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong |
| AdaOrdRandomTree vs. RandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong |
| AdaOrdRandomTree vs. OrdRandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong |
| BagOrdRandomTree vs. RandomTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong |
| BagOrdRandomTree vs. OrdRandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong |
| AdaOrdREPTree vs. REPTree | 71 | 7 | 7 | -2.510287 | 0.012063 | strong |
| AdaOrdREPTree vs. OrdREPTree | 64 | 14 | 14 | -1.961161 | 0.049860 | strong |
| BagOrdREPTree vs. REPTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong |
| BagOrdREPTree vs. OrdREPTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong |
| Average | 71.25 | 6.75 | 6.75 | -2.602996 | 0.022918 | |

Wilcoxon Signed Rank Test. This statistical test consists of the following steps to check the validity of the null hypothesis: (i) calculate the performance differences between the two algorithms for each dataset; (ii) order the differences according to their absolute values, from smallest to largest; (iii) rank the absolute values of the differences, starting with the smallest as 1 and assigning average ranks in case of ties; (iv) calculate W+ and W− by summing the ranks of the positive and negative differences separately; (v) find W as the smaller of the two sums, $W = \min(W^+, W^-)$; (vi) calculate the z-score as $z = (W - \mu_W)/\sigma_W$, where $\mu_W = n(n+1)/4$ and $\sigma_W = \sqrt{n(n+1)(2n+1)/24}$ for $n$ datasets, and find the p-value from the distribution table; and (vii) determine the significance level according to the computed p-value [36], rejecting the null hypothesis if the p-value is less than a certain risk threshold (i.e., 0.1 = 10%):

$$\text{Significance level} = \begin{cases} \text{weak or none}, & p > 0.1 \\ \text{moderate}, & 0.05 < p \le 0.1 \\ \text{strong}, & 0.01 < p \le 0.05 \\ \text{very strong}, & p \le 0.01 \end{cases} \qquad (7)$$
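As a cross-check, one row of Table 10 can be reproduced with SciPy's implementation of the test; the accuracy vectors below are the AdaOrdC4.5 and C4.5 columns of Table 3, and the printed statistic matches the W reported in the table.

```python
# Reproducing the first row of Table 10 with SciPy; a sketch.
from scipy.stats import wilcoxon

ada_ord_c45 = [83.67, 84.39, 89.29, 98.73, 83.82, 86.05,
               81.03, 99.37, 90.41, 85.07, 78.13, 89.77]
c45 = [80.40, 81.95, 87.75, 92.36, 83.65, 84.91,
       85.29, 98.74, 90.68, 85.66, 72.46, 88.36]

stat, p = wilcoxon(ada_ord_c45, c45)
# stat = 15.0 (= W in Table 10); p ≈ 0.06, a "moderate" difference per (7).
# The article reports p = 0.059739 via the normal approximation of the z-score.
print(stat, p)
```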

Table 10 presents the W, z-score, and p-values computed through the Wilcoxon signed rank tests. Based on the reported p-values, most of the tests strongly or very strongly prove the existence of significant differences among the considered algorithms. As the table states, in every test the ensemble-based ordinal classification algorithms (BagOrd* and AdaOrd*) show a significant improvement over the traditional versions at a significance level of $\alpha = 0.1$. In general, it can be observed that the average p-value of the proposed BagOrd* variants (0.0171) is slightly lower than that of the AdaOrd* variants (0.0288).

Based on all the statistical tests (the Friedman, Quade, and Wilcoxon signed rank tests), we can safely reject the null hypothesis (i.e., that there are no performance differences among the classifiers on the datasets), since the p-values computed through the statistics are less than the critical value (0.05). Thus, the proposed approach (EBOC) has a statistically significant effect on the response at the 95.0% confidence level.

6. Conclusions and Future Works

As a result of developments in the field of transportation, enormous amounts of raw data are generated every day. This situation creates the potential to discover knowledge patterns or rules from these data and to model the behaviour of a transportation system using machine learning techniques. To serve this purpose, this study focuses on the application of ordinal classification algorithms to real-world transportation datasets. It proposes a novel ensemble-based ordinal classification (EBOC) approach, which converts the original ordinal class problem into a series of binary class problems and uses the ensemble-learning paradigm (boosting and bagging) at the same time. To the best of our knowledge, this is the first study in which ordinal classification methods and our proposed approach have been applied to the transportation sector. In the experimental studies, the proposed model with tree-based learners (C4.5, RandomTree, and REPTree) was implemented on twelve benchmark transportation datasets that are available for public use. The proposed EBOC method was compared with the ordinal class classifier and traditional tree-based classification algorithms in terms of accuracy, precision, recall, F-measure, and ROC area. The results indicate that the EBOC approach provides more accurate classification results. Moreover, statistical test methods (the Friedman, Quade, and Wilcoxon signed rank tests) were used to prove that the classification accuracies obtained by the proposed algorithms (AdaOrd* and BagOrd*) are significantly different from those of the traditional methods. Therefore, the proposed EBOC method can assist in making right decisions in transportation. Our findings demonstrate that the improvement in performance is a result of exploiting ordering information and applying an ensemble strategy at the same time.

As future work, different types of ensemble-based ordinal classifiers can be developed using different ensemble methods, such as stacking and voting, and using different classification algorithms, such as Naive Bayes, k-nearest neighbour, neural networks, and support vector machines. In addition, ensemble clustering models can be developed to cluster transportation data. Furthermore, a comparative study which implements ensemble-learning and deep learning paradigms can be performed in transportation fields.

Data Availability

The "Auto MPG", "Automobile", "Bike Sharing", "Car Evaluation", and "Statlog (Vehicle Silhouettes)" datasets used to support the findings of this study are freely available at https://archive.ics.uci.edu/ml/datasets. The "Car Sale Advertisements", "NYS Air Passenger Traffic", "Smart City Traffic Patterns", "SF Air Traffic Landings Statistics", and "SF Air Traffic Passenger Statistics" datasets used to support the findings of this study are freely available at https://www.kaggle.com/datasets. The "Road Traffic Accidents (2017)" dataset used to support the findings of this study is freely available at https://datamillnorth.org/dataset. The "Traffic Volume Counts (2012-2013)" dataset used to support the findings of this study is freely available at https://opendata.cityofnewyork.us/data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

[1] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, "Traffic flow prediction with big data: a deep learning approach," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865-873, 2015.

[2] X. Wen, L. Shao, Y. Xue, and W. Fang, "A rapid learning algorithm for vehicle classification," Information Sciences, vol. 295, pp. 395-406, 2015.

[3] S. Ballı and E. A. Sagbas, "Diagnosis of transportation modes on mobile phone using logistic regression classification," IET Software, vol. 12, no. 2, pp. 142-151, 2018.

[4] H. Nguyen, C. Cai, and F. Chen, "Automatic classification of traffic incident's severity using machine learning approaches," IET Intelligent Transport Systems, vol. 11, no. 10, pp. 615-623, 2017.

[5] G. Kedar-Dongarkar and M. Das, "Driver classification for optimization of energy usage in a vehicle," Procedia Computer Science, vol. 8, pp. 388-393, 2012.

[6] N. Deepika and V. V. Sajith Variyar, "Obstacle classification and detection for vision based navigation for autonomous driving," in Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI 2017), pp. 2092-2097, Udupi, India, September 2017.

[7] E. Frank and M. Hall, "A simple approach to ordinal classification," in Proceedings of the European Conference on Machine Learning (ECML), vol. 2167 of Lecture Notes in Computer Science, pp. 145-156, Springer, 2001.

[8] E. Alpaydin, Introduction to Machine Learning, The MIT Press, 3rd edition, 2014.

[9] S. Destercke and G. Yang, "Cautious ordinal classification by binary decomposition," in Proceedings of Machine Learning and Knowledge Discovery in Databases - European Conference (ECML/PKDD), pp. 323-337, Nancy, France, 2014.

[10] J. S. Cardoso and J. F. Pinto da Costa, "Learning to classify ordinal data: the data replication method," Journal of Machine Learning Research (JMLR), vol. 8, pp. 1393-1429, 2007.

[11] L. Li and H. T. Lin, "Ordinal regression by extended binary classification," in Proceedings of the 19th International Conference on Neural Information Processing Systems, pp. 865-872, Canada, 2006.

[12] F. E. Harrell, "Ordinal logistic regression," in Regression Modeling Strategies, Springer Series in Statistics, pp. 311-325, Springer International Publishing, Cham, Switzerland, 2015.

[13] J. D. Rennie and N. Srebro, "Loss functions for preference levels: regression with discrete ordered labels," in Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference Handling, pp. 180-186, 2005.

[14] B. Keith, "Barycentric coordinates for ordinal sentiment classification," in Proceedings of the 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Halifax, Canada, 2017.

[15] E. Fernandez, J. R. Figueira, J. Navarro, and B. Roy, "ELECTRE TRI-nB: a new multiple criteria ordinal classification method," European Journal of Operational Research, vol. 263, no. 1, pp. 214-224, 2017.

[16] M. M. Stenina, M. P. Kuznetsov, and V. V. Strijov, "Ordinal classification using Pareto fronts," Expert Systems with Applications, vol. 42, no. 14, pp. 5947-5953, 2015.

[17] W. Verbeke, D. Martens, and B. Baesens, "RULEM: a novel heuristic rule learning approach for ordinal classification with monotonicity constraints," Applied Soft Computing, vol. 60, pp. 858-873, 2017.

[18] W. Kotlowski and R. Slowinski, "On nonparametric ordinal classification with monotonicity constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 11, pp. 2576-2589, 2013.

[19] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ordinal classification with decision rules," in Proceedings of the Third International Conference on Mining Complex Data (MCD'07), pp. 169-181, Warsaw, Poland, 2007.

[20] K. Hechenbichler and K. Schliep, "Weighted k-nearest-neighbor techniques and ordinal classification," Collaborative Research Center, vol. 399, pp. 1-16, 2004.

[21] W. Waegeman and L. Boullart, "An ensemble of weighted support vector machines for ordinal regression," International Journal of Computer and Information Engineering, vol. 1, no. 12, pp. 599-603, 2007.

[22] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ensemble of decision rules for ordinal classification with monotonicity constraints," in Proceedings of the International Conference on Rough Sets and Knowledge Technology, vol. 5009 of Lecture Notes in Computer Science, pp. 260-267, 2008.

[23] H. T. Lin, From Ordinal Ranking to Binary Classification [Ph.D. thesis], California Institute of Technology, 2008.

[24] C. K. A. Cuartas, A. J. P. Anzola, and B. G. M. Tarazona, "Classification methodology of research topics based in decision trees J48 and RandomTree," International Journal of Applied Engineering Research, vol. 10, no. 8, pp. 19413-19424, 2015.

[25] C. Parimala and R. Porkodi, "Classification algorithms in data mining: a survey," International Journal of Scientific Research in Computer Science, vol. 3, pp. 349-355, 2018.

[26] P. Yildirim, D. Birant, and T. Alpyildiz, "Improving prediction performance using ensemble neural networks in textile sector," in Proceedings of the 2017 International Conference on Computer Science and Engineering (UBMK), pp. 639-644, Antalya, Turkey, October 2017.

[27] H. Yu and J. Ni, "An improved ensemble learning method for classifying high-dimensional and imbalanced biomedicine data," IEEE Transactions on Computational Biology and Bioinformatics, vol. 11, no. 4, pp. 657-666, 2014.

[28] D. Che, Q. Liu, K. Rasheed, and X. Tao, "Decision tree and ensemble learning algorithms with their applications in bioinformatics," in Software Tools and Algorithms for Biological Systems, H. R. Arabnia and Q.-N. Tran, Eds., vol. 696, pp. 191-199, Springer, 2011.

[29] "UCI Machine Learning Repository: Car Evaluation Data Set," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/datasets/car+evaluation.

[30] "Weka 3 - Data Mining with Open Source Machine Learning Software in Java," cs.waikato.ac.nz, 2018, https://www.cs.waikato.ac.nz/ml/weka/.

[31] "UCI Machine Learning Repository," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/index.php.

[32] "Datasets | Kaggle," kaggle.com, 2018, https://www.kaggle.com/datasets.

[33] "Data Mill North," datamillnorth.org, 2018, https://datamillnorth.org/dataset.

[34] City of New York, "NYC Open Data," opendata.cityofnewyork.us, 2018, https://opendata.cityofnewyork.us.

[35] J. Derrac, S. García, D. Molina, and F. Herrera, "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms," Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 3-18, 2011.

[36] N. A. Weiss, Introductory Statistics, Addison-Wesley, 9th edition, 2012.



The null hypothesis (H0) for a statistical test is one inwhich provided classification models are equivalent other-wise the alternative hypothesis (H1) is present when not allclassifiers are equivalent

H0 There are no performance differences among theclassifiers on the datasets

H1 There are performance differences among the classi-fiers on the datasets

The p-value is defined as the probability which is adecimal number between 0 and 1 and it can be expressedas a percentage (ie 01 = 10) With a small p-value wereject the null hypothesis (H0) so the relationship betweenthe classification results is significantly different

541 Statistical Tests for Comparing Multiple ClassifiersMultiple comparison tests (MCT) are used to establish astatistical comparison of the results reported among variousclassification algorithms We performed two well-knownMCT to determine whether the classifiers have significantdifferences or not Friedman test and Quade test

Friedman Test The Friedman test is a nonparametric statis-tical test that aims to detect significant differences betweenthe behaviors of two or more algorithms Initially it ranks thealgorithms for each dataset separately between 1 (smallest)and 4 (largest) In case of ties the average rank is assigned tothem After that the Friedman test statistics (1199092119903) is calculatedaccording to the following

1205942119903 =12

119889 lowast 119905 lowast (119905 + 1) lowast119905

sum119894=1

1198772119894 minus 3 lowast 119889 lowast (119905 + 1) (5)

14 Journal of Advanced Transportation

Table 8 The pairwise combinations of the REPTree algorithm with ordinal and ensemble learning versions

BagOrdREPTree AdaOrdREPTree OrdREPTreeREPTree 1-11-0 1-11-0 2-10-0OrdREPTree 1-11-0 2-10-0AdaOrdREPTree 7-5-0BagOrdREPTree

Table 9 The comparison of algorithms in terms of accuracy precision recall F-measure and ROC area

Compared Algorithms Accuracy () Precision Recall F-measure ROC AreaC45 8602 0861 0860 0866 0897OrdC45 8426 0843 0842 0848 0884AdaOrdC45 8748 0874 0875 0874 0933BagOrdC45 8640 0859 0864 0860 0947RandomTree 8229 0823 0823 0823 0860OrdRandomTree 8234 0825 0823 0824 0861AdaOrdRandomTree 8676 0866 0868 0866 0928BagOrdRandomTree 8723 0868 0872 0869 0947REPTree 8226 0817 0823 0817 0903OrdREPTree 8350 0831 0835 0829 0908AdaOrdREPTree 8683 0868 0868 0867 0925BagOrdREPTree 8567 0852 0857 0852 0939

where 119889 is the number of datasets t is the number ofclassifiers and 119877119894 is the total of the ranks for the 119894th classifieramong all datasets The Friedman test is approximately chi-square (1199092) distributed with t-1 degrees of freedom (df )and the null hypothesis is rejected if 1199092119903 gt 1199092119905minus1120572 in thecorresponding significance 120572

The obtained Friedman test value for the four C45-basedclassifiers is 10525 Since the obtained value is greater thanthe critical value (1199092119889119891=3120572=005 = 781) null hypothesis (H0)is rejected so it has been concluded that the four classifiersare significantly different The same situation is also valid forRandomTree-based and REPTree-based algorithms whichhave 153 and 201 test values respectively The obtained p-values for the C45 RandomTree and REPTree related EBOCresults (Tables 3 4 and 5) are 001459 000158 and 000016which are extremely lower than 005 level of significanceThus it is possible to say that the differences between theirperformances are unlikely to occur by chance

Quade Test Like the Friedman test the Quade test is anonparametric test which is used to prove that the differ-ences among classifiers are significant First the performanceresults of each classifier are ranked within each dataset toyield R119894119895 in the same way as the Friedman test does where119889 is the number of datasets and t is the number of classifiersfor 119894 = 1 2 119889 and 119895 = 1 2 119905 Then the range foreach dataset is calculated by finding the difference betweenthe largest and the smallest observations within that datasetThe obtained rank for dataset 119894 is denoted by119876119894Theweightedaverage adjusted rank for dataset 119894 with classifier 119895 is then

computed as S119894119895 = Qi lowast (R119894119895 minus (119905 + 1)2) The Quade teststatistic is then given by

119865 =(119889 minus 1) (1119889)sum119905119895=1 1198782119895

sum119889119894=1 sum119905119895=1 1198782119894119895 minus (1119889)sum119905119895=1 1198782119895(6)

where 119878119895 =sum119889119894=1 119878119894119895 is the sum of the weighted ranks for eachclassifier The 119865 value is tested against the F-distribution for agiven 120572 with df 1 = (k ndash 1) and df 2 = (b minus 1)(k minus 1) degreesof freedom If 119865 gt 119865(119896ndash1)(119887minus1)(119896minus1)120572 then null hypothesis isrejected Moreover the p-value could be computed throughnormal approximations

The p-values computed through the Quade statisticaltests for the C45 RandomTree and REPTree related EBOCalgorithms are 0013398 0000018 and 0000522 respectivelyTest results strongly suggest the existence of significantdifferences among the considered algorithms at the level ofsignificance 120572 = 005 Hence we reject the null hypothesiswhich indicates that all classification algorithms have thesame performance Thus it is possible to say that at least oneof them behaves differently

542 Statistical Tests for Comparing Paired Classifiers Inaddition to multiple comparison tests we also conductedpairwise comparison test to demonstrate that the proposedalgorithms (AdaOrdlowast and BagOrdlowast) have significant dif-ferences from others when individually compared Wilcoxonsigned rank test was applied to see pairwise performancedifferences

Journal of Advanced Transportation 15

Table 10 Wilcoxon signed ranks test results

Compared Algorithms W+ Wminus W z-score p-value Significance LevelAdaOrdC45 vs C45 63 15 15 -1882715 0059739 moderateAdaOrdC45 vs OrdC45 65 13 13 -2039608 0041389 strongBagOrdC45 vs C45 61 17 17 -1725822 0084379 moderateBagOrdC45 vs OrdC45 75 3 3 -2824072 0004742 very strongAdaOrdRandomTree vs RandomTree 75 3 3 -2824072 0004742 very strongAdaOrdRandomTree vs OrdRandomTree 75 3 3 -2824072 0004742 very strongBagOrdRandomTree vs RandomTree 77 1 1 -2980965 0002873 very strongBagOrdRandomTree vs vs OrdRandomTree 75 3 3 -2824072 0004742 very strongAdaOrd REPTree vs REPTree 71 7 7 -2510287 0012063 strongAdaOrd REPTree vs OrdREPTree 64 14 14 -1961161 0049860 strongBagOrdREPTree vs REPTree 77 1 1 -2980965 0002873 very strongBagOrdREPTree vs OrdREPTree 77 1 1 -2980965 0002873 very strongAverage 7125 675 675 -2602996 0022918

Wilcoxon Signed Rank Test This statistical test consists of thefollowing steps to check the validity of the null hypotheses(i) calculate performance differences between two algorithmsfor each dataset (ii) order the differences according to theirabsolute values from smallest to largest (iii) rank the absolutevalue of differences starting with the smallest as 1 andassigning average ranks in case of ties (iv) calculate W+

and W- by summing the ranks of the positive and negativedifferences separately (v) find 119882 as the smaller of the sumsW = min(W+ W-) (vi) calculate z-score as 119911 = 119882120590w andfind p-value from the distribution table and (vii) determinethe significance level according to the computed p-value [36]and reject the null hypothesis if p-value is less than a certainrisk threshold (ie 01 = 10)

119878119894119892119899119894119891119894119888119886119899119888119890 119897119890V119890119897

=

119901 gt 01 weak or none

005 lt 119901 le 01 moderate

001 lt 119901 le 005 strong

119901 le 001 very strong

(7)

Table 10 demonstrates the W z-sore and p-values com-puted through Wilcoxon signed rank tests Based on thereported p-values most of the tests strongly or very stronglyprove the existence of significant differences among theconsidered algorithms As the table states for each test theensemble-based ordered classification algorithms (BagOrdlowastand AdaOrdlowast) always show a significant improvement overthe traditional versions with a level of significance 120572=01In general it can be observed that the average p-value ofproposed BagOrdlowast variant algorithms (00171) is a little lessthan AdaOrdlowast variants (00288)

Based on all statistical tests (Friedman Quade andWilcoxon signed rank test) we can safely reject the nullhypothesis (ie there are no performance differences amongthe classifiers on the datasets) since the p-values computedthrough the statistics are less than the critical value (005)Thus the proposed approach (EBOC) has a statistically

significant effect on the response at the 950 confidencelevel

6 Conclusions and Future Works

As a result of developments in the field of transportationenormous amounts of raw data are generated every dayThis situation creates the potential to discover knowledgepatterns or rules from it and to model the behaviour of atransportation system using machine learning techniques Toserve the purpose this study focuses on the application ofordinal classification algorithms on real-world transportationdatasets It proposes a novel ensemble-based ordinal classifi-cation (EBOC) approachThis approach converts the originalordinal class problem into a series of binary class problemsand use ensemble-learning paradigm (boosting and bagging)at the same time To the best of our knowledge this is thefirst study in which ordinal classification methods and ourproposed approach were applied on transportation sectorIn the experimental studies the proposed model with thetree-based learners (C45 RandomTree and REPTree) wasimplemented on twelve benchmark transportation datasetsthat are available for public useThe proposed EBOCmethodwas compared with ordinal class classifier and traditionaltree-based classification algorithms in terms of accuracyprecision recall f-measure and ROC area The resultsindicate that the EBOC approach provides more accurateclassification results than them Moreover statistical testmethods (Friedman Quade andWilcoxon signed rank tests)were used to prove that the classification accuracies obtainedfrom the proposed algorithms (AdaOrdlowast and BagOrdlowast)are significantly different from the traditional methodsTherefore the proposed EBOC method can assist in makingright decisions in transportation Our findings demonstratethat the improvement in performance is a result of exploitingordering information and applying ensemble strategy at thesame time

As future work different types of ensemble-based ordinalclassifiers can be developed using different ensemblemethods

16 Journal of Advanced Transportation

such as stacking and voting and using different classificationalgorithms such as Naive Bayes k-nearest neighbour neuralnetwork and support vector machine In addition ensembleclustering models can be improved to cluster transportationdata Furthermore a comparative study which implementsensemble-learning and deep learning paradigms can beperformed in transportation fields

Data Availability

The ldquoAuto MPGrdquo ldquoAutomobilerdquo ldquoBike Sharingrdquo ldquoCar Eval-uationrdquo and ldquoStatlog (Vehicle Silhouettes)rdquo datasets usedto support the findings of this study are freely avail-able at httpsarchiveicsuciedumldatasets The ldquoCar SaleAdvertisementsrdquo ldquoNYS Air Passenger Trafficrdquo ldquoSmart CityTraffic Patternsrdquo ldquoSF Air Traffic Landings Statisticsrdquo andldquoSF Air Traffic Passenger Statisticsrdquo datasets used to sup-port the findings of this study are freely available athttpswwwkagglecomdatasets The ldquoRoad Traffic Acci-dents (2017)rdquo dataset used to support the findings of thisstudy is freely available at httpsdatamillnorthorgdatasetThe ldquoTraffic Volume Counts (2012-2013)rdquo dataset used tosupport the findings of this study is freely available athttpsopendatacityofnewyorkusdata

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this article

References

[1] Y Lv Y Duan W Kang Z Li and F-Y Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16 no2 pp 865ndash873 2015

[2] X Wen L Shao Y Xue and W Fang ldquoA rapid learningalgorithm for vehicle classificationrdquo Information Sciences vol295 pp 395ndash406 2015

[3] S Ballı and E A Sagbas ldquoDiagnosis of transportation modeson mobile phone using logistic regression classificationrdquo IETSoftware vol 12 no 2 pp 142ndash151 2018

[4] H Nguyen C Cai and F Chen ldquoAutomatic classification oftraffic incidentrsquos severity using machine learning approachesrdquoIET Intelligent Transport Systems vol 11 no 10 pp 615ndash6232017

[5] G Kedar-Dongarkar and M Das ldquoDriver classification foroptimization of energy usage in a vehiclerdquo Procedia ComputerScience vol 8 pp 388ndash393 2012

[6] N Deepika and V V Sajith Variyar ldquoObstacle classification anddetection for vision based navigation for autonomous drivingrdquoin Proceedings of the 2017 International Conference on Advancesin Computing Communications and Informatics ICACCI 2017pp 2092ndash2097 Udupi India September 2017

[7] E Frank and M Hall ldquoA simple approach to ordinal classifi-cationrdquo in Proceedings of the European Conference on MachineLearning (ECML) vol 2167 of Lecture Notes in ComputerScience pp 145ndash156 Springer 2001

[8] E Alpaydin Introduction to Machine Learning The MIT Press3rd edition 2014

[9] S Destercke and G Yang ldquoCautious ordinal classification bybinary decompositionrdquo in Proceedings of the Machine Learningand Knowledge Discovery in Databases - European ConferenceECMLPKDD pp 323ndash337 Nancy France 2014

[10] J S Cardoso and J F Pinto da Costa ldquoLearning to classifyordinal data the data replication methodrdquo Journal of MachineLearning Research (JMLR) vol 8 pp 1393ndash1429 2007

[11] L Li andH T Lin ldquoOrdinal regression by extended binary clas-sificationrdquo in Proceedings of the 19th International Conferenceon Neural Information Processing Systems pp 865ndash872 Canada2006

[12] F E Harrell ldquoOrdinal logistic regressionrdquo in Regression Model-ing Strategies Springer Series in Statistics pp 311ndash325 SpringerInternational Publishing Cham Switzerland 2015

[13] J D Rennie andN Srebro ldquoLoss functions for preference levelsregression with discrete ordered labelsrdquo in Proceedings of theIJCAIMultidisciplinary Workshop Adv Preference Handling pp180ndash186 2005

[14] B Keith ldquoBarycentric coordinates for ordinal sentiment classifi-cationrdquo in Proceedings of the 23rd ACM SIGKDD Conference onKnowledgeDiscovery andDataMining (KDD) Halifax Canada2017

[15] E Fernandez J R Figueira J Navarro and B Roy ldquoELECTRETRI-nB A new multiple criteria ordinal classification methodrdquoEuropean Journal of Operational Research vol 263 no 1 pp214ndash224 2017

[16] M M Stenina M P Kuznetsov and V V Strijov ldquoOrdinal clas-sification using Pareto frontsrdquo Expert Systems with Applicationsvol 42 no 14 pp 5947ndash5953 2015

[17] W Verbeke D Martens and B Baesens ldquoRULEM A novelheuristic rule learning approach for ordinal classification withmonotonicity constraintsrdquo Applied Soft Computing vol 60 pp858ndash873 2017

[18] W Kotlowski and R Slowinski ldquoOn nonparametric ordinalclassificationwithmonotonicity constraintsrdquo IEEETransactionson Knowledge and Data Engineering vol 25 no 11 pp 2576ndash2589 2013

[19] K Dembczynski W Kotlowski and R Slowinski ldquoOrdinalclassification with decision rulesrdquo in Proceedings of the ThirdInternational Conference on Mining Complex Data (MCDrsquo07 )pp 169ndash181 Warsaw Poland 2007

[20] K Hechenbichler and K Schliep ldquoWeighted k-nearest-neighbor techniques and ordinal classificationrdquo CollaborativeResearch Center vol 399 pp 1ndash16 2004

[21] W Waegeman and L Boullart ldquoAn ensemble of weightedsupport vector machines for ordinal regressionrdquo InternationalJournal of Computer and Information Engineering vol 1 no 12pp 599ndash603 2007

[22] K Dembczynski W Kotlowski and R Slowinski ldquoEnsembleof decision rules for ordinal classification with monotonicityconstraintsrdquo in Proceedings of the International Conference onRough Sets andKnowledge Technology vol 5009 ofLectureNotesin Computer Science pp 260ndash267 2008

[23] H T Lin From Ordinal Ranking to Binary Classification [PhDthesis] California Institute of Technology 2008

[24] C K A Cuartas A J P Anzola and B G M TarazonaldquoClassification methodology of research topics based in deci-sion trees J48 andrandomtreerdquo International Journal of AppliedEngineering Research vol 10 no 8 pp 19413ndash19424 2015

[25] C Parimala and R Porkodi ldquoClassification algorithms in datamining a surveyrdquo in Proceedings of the International Journal ofScientific Research in Computer Science vol 3 pp 349ndash355 2018

Journal of Advanced Transportation 17

[26] P Yildirim D Birant and T Alpyildiz ldquoImproving predictionperformance using ensemble neural networks in textile sectorrdquoin Proceedings of the 2017 International Conference on ComputerScience and Engineering (UBMK) pp 639ndash644 Antalya TurkeyOctober 2017

[27] H Yu and J Ni ldquoAn improved ensemble learning methodfor classifying high-dimensional and imbalanced biomedicinedatardquo IEEE Transactions on Computational Biology and Bioin-formatics vol 11 no 4 pp 657ndash666 2014

[28] D Che Q Liu K Rasheed and X Tao ldquoDecision treeand ensemble learning algorithms with their applications inbioinformaticsrdquo in Software Tools and Algorithms for BiologicalSystems H R Arabnia and Q-N Tran Eds vol 696 pp 191ndash199 Springer 2011

[29] ldquoUCI Machine Learning Repository Car Evaluation DataSetrdquo Archiveicsuciedu 2018 httpsarchiveicsuciedumldatasetscar+evaluation

[30] ldquoWeka 3 - Data Mining with Open Source Machine Learn-ing Software in Javardquo Cswaikatoacnz 2018 httpswwwcswaikatoacnzmlweka

[31] ldquoUCI Machine Learning Repositoryrdquo Archiveicsuciedu 2018httpsarchiveicsuciedumlindexphp

[32] ldquoDatasets mdash Kagglerdquo Kagglecom 2018 httpswwwkagglecomdatasets

[33] ldquoData Mill Northrdquo Datamillnorthorg 2018 httpsdatamill-northorgdataset

[34] ldquoN City of New York ldquoNYC Open Datardquordquo Open-datacityofnewyorkus 2018 httpsopendatacityofnewyorkus

[35] J Derrac S Garcıa D Molina and F Herrera ldquoA practicaltutorial on the use of nonparametric statistical tests as amethodology for comparing evolutionary and swarm intelli-gence algorithmsrdquo Swarm and Evolutionary Computation vol1 no 1 pp 3ndash18 2011

[36] N A Weiss Introductory Statistics Addison-Wesley 9th edi-tion 2012


Algorithm EBOC: Ensemble-Based Ordinal Classification

Inputs:
   D: the ordinal dataset, D = {(x1, y1), (x2, y2), ..., (xm, ym)}
   m: the number of instances in the dataset D
   X: input feature space, with an input vector x ∈ X
   Y: class attribute, an ordinal class label y ∈ {c1, c2, ..., ck} ⊆ Y with an order ck > ck-1 > ... > c1
   k: the number of class labels
   n: ensemble size

Outputs:
   M*: an ensemble classification model
   M*(x): the class label of a new sample x

Begin
   for i = 1 to n do
      if (bagging)
         Di = bootstrap samples from D
      else if (boosting)
         Di = samples from D according to their weights
      // Construction of binary training sets Dij
      for j = 1 to k-1 do
         for s = 1 to m do
            if (ys <= cj)
               Dij.Add(xs, 0)
            else
               Dij.Add(xs, 1)
         end for
      end for
      // Construction of binary classifiers BCij
      for j = 1 to k-1 do
         BCij = ClassificationAlgorithm(Dij)
      end for
      if (boosting)
         update weight values
   end for
   // Classification of a new sample x
   for i = 1 to n do
      // Construction of ordinal classification model Mi
      P(c1) = 1 - P(y > c1)
      for j = 2 to k-1 do
         P(cj) = P(y > cj-1) - P(y > cj)
      end for
      P(ck) = P(y > ck-1)
      Mi(x) = the class c with the maximum P(c)
   end for
   // Majority voting
   if (bagging)
      M*(x) = argmax over y ∈ Y of the number of members with Mi(x) = y
   else if (boosting)
      M*(x) = argmax over y ∈ Y of the sum of weight_i over members with Mi(x) = y
End Algorithm

Algorithm 1: The pseudocode of the proposed EBOC approach.
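To make the control flow of Algorithm 1 concrete, the following is a minimal Python sketch of the bagging variant, assuming scikit-learn decision trees as a stand-in for the Weka base learners used in the experiments; the class name EBOCBagging is hypothetical, X and y are assumed to be NumPy arrays, and the class labels are assumed to be encoded as ordered integers.

import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

class EBOCBagging:
    def __init__(self, base=None, n_estimators=100, seed=0):
        self.base = base if base is not None else DecisionTreeClassifier()
        self.n_estimators = n_estimators
        self.rng = np.random.RandomState(seed)

    def fit(self, X, y):
        self.classes_ = np.sort(np.unique(y))
        m = len(X)
        self.members_ = []
        for _ in range(self.n_estimators):
            idx = self.rng.randint(0, m, m)              # bootstrap sample D_i
            Xi, yi = X[idx], y[idx]
            # one binary classifier per threshold c_j, modelling P(y > c_j);
            # assumes both binary labels occur in every bootstrap sample
            member = [clone(self.base).fit(Xi, (yi > c).astype(int))
                      for c in self.classes_[:-1]]
            self.members_.append(member)
        return self

    def predict(self, X):
        votes = np.zeros((len(X), len(self.classes_)))
        for member in self.members_:
            # P(y > c_j) for each threshold, one column per binary classifier
            pgt = np.column_stack([clf.predict_proba(X)[:, 1] for clf in member])
            # ordinal reconstruction: P(c_1) = 1 - P(y > c_1),
            # P(c_j) = P(y > c_{j-1}) - P(y > c_j), and P(c_k) = P(y > c_{k-1})
            probs = np.hstack([1.0 - pgt[:, [0]],
                               pgt[:, :-1] - pgt[:, 1:],
                               pgt[:, [-1]]])
            votes[np.arange(len(X)), probs.argmax(axis=1)] += 1.0  # equal-weight vote
        return self.classes_[votes.argmax(axis=1)]

The boosting variant differs only in drawing each D_i according to the instance weights and in weighting the member votes, as in the pseudocode above.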

snowy), temperature, humidity, wind speed, month, season, and the day of the week and year.

Car Evaluation. This dataset is useful to evaluate the quality of the cars according to various characteristics such as buying price, maintenance price, number of doors, person capacity, size of luggage boot, and estimated safety of the car. The target attribute in the dataset indicates the overall scores of the cars as unacceptable, acceptable, good, and very good.


Car Sale Advertisements. The data was collected from private car sale advertisements in Ukraine in 2016. It was proposed for the prediction of the seller's price (denominated in USD) in the advertisement according to well-known car features such as model, body type, mileage, engine volume, engine type, and drive type.

NYS Air Passenger Traffic. The data was collected monthly by the Port Authority of New York State between the years 1977 and 2015. The aim is to predict the total number of domestic and international passengers for five airports (ACY, EWR, JFK, LGA, and SWF).

Road Traffic Accidents (2017). The dataset contains the records of traffic accidents across the City of Leeds, UK, reported in 2017, with the information of location, number of people and vehicles involved, the type of vehicle, road surface, weather situations, lighting conditions, and the age and sex of the casualty. The aim is to predict the severity of the casualty as slight, serious, or fatal.

SF Air Traffic Landings Statistics and SF Air Traffic Passenger Statistics. These two separate datasets include aircraft landings and passenger statistics of San Francisco International Airport recorded during the period from July 2005 to March 2018. These datasets are used to predict monthly landing and passenger counts, respectively.

Smart City Traffic Patterns. This dataset includes the traffic patterns of four junctions of a city, collected between 2015 and 2017. The aim is to manage the traffic of the city better and to provide input on infrastructure planning for the future, to improve the efficiency of services for the citizens. To serve this purpose, the number of vehicles in the traffic is predicted, so that a robust traffic system can be implemented for the city by being prepared for traffic peaks.

Statlog (Vehicle Silhouettes). The Statlog dataset contains the features of vehicle silhouettes extracted by the Hierarchical Image Processing System (HIPS). The vehicle may be viewed from one of many different angles. The aim is to classify a given silhouette using a set of features extracted from the image.

Traffic Volume Counts (2012-2013). The dataset includes the hourly traffic volume counts collected by the New York Metropolitan Transportation Council between the years 2012 and 2013. The traffic counts were measured on various roads from one intersection to another with a specified direction (NB, SB, WB, and EB). The attribute that will be predicted in this dataset is the traffic volume at 11:00 - 12:00 AM.

Before the application of the nominal and ordinal classification algorithms, the datasets generally passed through data preprocessing steps. In this study, the ID attributes in the transportation datasets were eliminated in the data reduction step. The date attributes were split into three parts, day, month, and year, because these provide more useful information for the prediction task. In addition, continuous class values were discretized into 3 bins using the equal-frequency technique, since ordinal classification algorithms require categorical ordinal data.

[Figure 4: The 10-fold cross-validation process.]
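As a small illustration of the equal-frequency discretization step described above, the following sketch assumes pandas; the sample values and the bin labels are only illustrative.

import pandas as pd

# nine illustrative continuous target values to be turned into an ordinal class
prices = pd.Series([4500, 7200, 9900, 12500, 15800, 21000, 26500, 31000, 42000])
bins = pd.qcut(prices, q=3, labels=["low", "medium", "high"])  # 3 equal-frequency bins
print(bins.value_counts())  # each ordinal class receives roughly one third of the records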

Basic characteristics of the investigated transportation datasets are given in Table 1: the number of instances, attributes, and classes, the target attribute, the data preprocessing operations, and the class distribution in each bin.

5.2. Experimental Work. In each experiment, four methods were compared on the transportation datasets: (i) individual tree-based classification algorithms, (ii) the ordinal class classifier, (iii) boosting-based ordinal classification, and (iv) bagging-based ordinal classification. They were compared by using the n-fold cross-validation technique, selecting n as 10. In this validation technique, the entire dataset is divided into n equal-size parts; n-1 of them are used for the training phase of the classification model, and the remaining part is used in the testing phase. This process is repeated n times, changing the parts used for training and testing. When the cross-validation process terminates, the accuracy values obtained in each step are averaged, and a single accuracy value is produced as the success indicator of the algorithm. Figure 4 represents the 10-fold cross-validation process.
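The protocol can be illustrated in a few lines of Python, assuming scikit-learn; the Iris data merely stands in for a transportation dataset here.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# 10 folds: train on 9 parts, test on the remaining part, rotate, then average
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10, scoring="accuracy")
print(scores.mean())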

In this study, the alternative tree-based ordinal classification algorithms were compared according to the accuracy, precision, recall, F-measure, and ROC area measures. The metrics are explained with their abbreviations, formulas, and definitions in Table 2.

5.3. Experimental Results. In this study, three different experiments with three different tree-based classification algorithms (C4.5, RandomTree, and REPTree) were performed to compare the classification success of the proposed EBOC approach with the existing individual algorithms and the ordinal class classifier algorithm. To evaluate the classification performance of the algorithms on a specific transportation dataset, accuracy rate values were calculated using the n-fold cross-validation technique, selecting n as 10.

In each experiment, one of the tree-based classification algorithms was used as the base learner. For example, in the first experiment, four methods were applied and compared on 12 different transportation datasets: (i) C4.5, the individual decision tree algorithm; (ii) OrdC4.5, the ordinal class classifier with C4.5; (iii) AdaOrdC4.5, AdaBoost-based ordinal classification with C4.5 as the base learner; and finally (iv) BagOrdC4.5, bagging-based ordinal classification with C4.5 as the base learner.


Table 1: The basic characteristics of the transportation datasets (I: number of instances, A: number of attributes, C: number of classes).

Dataset | I | A | C | Class Distribution | Target Attribute | Data Preprocessing
Auto MPG | 398 | 8 | 3 | 131-134-133 | mpg | Removing the columns with unique values (i.e., car name)
Automobile | 205 | 26 | 7 | 0-3-22-67-54-32-27 | symboling | -
Bike Sharing | 17379 | 13 | 3 | 5797-5783-5799 | cnt | Removing the columns "instant", "dteday", "casual", and "registered"
Car Evaluation | 1728 | 7 | 4 | 1210-384-69-65 | class | -
Car Sale Advertisements | 9309 | 10 | 3 | 3112-3080-3117 | price | Removing rows with zero price value
NYS Air Passenger Traffic | 1584 | 4 | 3 | 528-528-528 | total passengers | Removing the columns "Domestic" and "International Passengers"
Road Traffic Accidents (2017) | 2203 | 13 | 3 | 1879-309-15 | casualty severity | (i) Removing columns that hold reference numbers; (ii) splitting the date column into day and month
SF Air Traffic Landings Statistics | 21105 | 14 | 3 | 6941-7103-7061 | landing count | -
SF Air Traffic Passenger Statistics | 18398 | 12 | 3 | 6132-6133-6133 | passenger count | -
Smart City Traffic Patterns | 48120 | 6 | 3 | 15687-16436-15997 | vehicles | (i) Removing the ID column; (ii) splitting the date column into day, month, and year
Statlog (Vehicle Silhouettes) | 846 | 19 | 4 | 212-217-218-199 | class | -
Traffic Volume Counts (2012-2013) | 5945 | 31 | 3 | 1983-1989-1973 | traffic volume at 11:00 - 12:00 AM | (i) Removing the columns with unique values (i.e., ID, Segment ID); (ii) splitting the date column into day, month, and year

Table 2: The detailed information about classifier performance measures (TP: true positives, FP: false positives, TN: true negatives, FN: false negatives).

Abbreviation | Measure | Formula | Description
Acc | Accuracy | Acc = (TP + TN) / (TP + TN + FP + FN) | The ratio of the number of correctly classified instances to the total number of records
Prec | Precision | Prec = TP / (TP + FP) | The ratio of correctly classified positive instances against all positive results
Rec | Recall | Rec = TP / (TP + FN) | The ratio of correct answers for a class over all the answers correctly belonging to this class
F | F-measure | F = 2 * precision * recall / (precision + recall) | The harmonic mean of precision and recall
ROC Area | Receiver Operating Characteristic Area | - | The area under the curve which is generated to compare correctly and incorrectly classified instances
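The measures in Table 2 can be computed as in the following sketch, assuming scikit-learn; the two label vectors are illustrative, and the class-wise values are macro-averaged over the ordinal classes.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 2, 2, 1, 0, 2, 1]   # illustrative gold labels
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]   # illustrative predictions
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred, average="macro"))
print(recall_score(y_true, y_pred, average="macro"))
print(f1_score(y_true, y_pred, average="macro"))
# the ROC area needs class probabilities rather than hard labels, e.g.
# roc_auc_score(y_true, clf.predict_proba(X_test), multi_class="ovr")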

In the second and third experiments, the RandomTree and REPTree algorithms were utilized as base learners, respectively. The default parameters of the algorithms in Weka were used in all experiments, except the ensemble size (the number of members) parameter, which was set to 100. As a result of the experiments, the accuracy rates of each algorithm were evaluated. Tables 3, 4, and 5 show the comparative results of the implemented techniques on the twelve benchmark datasets in terms of the accuracy rates.


Table 3: The comparison of C4.5 based ordinal and ensemble learning methods in terms of classification accuracy.

Dataset | C4.5 (%) | OrdC4.5 (%) | AdaOrdC4.5 (%) | BagOrdC4.5 (%)
Auto MPG | 80.40 | 80.90* | 83.67* | 81.91*
Automobile | 81.95 | 66.34 | 84.39* | 73.66
Bike Sharing | 87.75 | 87.72 | 89.29* | 89.79*
Car Evaluation | 92.36 | 92.19 | 98.73* | 94.27*
Car Sale Advertisements | 83.65 | 81.09 | 83.82* | 85.00*
NYS Air Passenger Traffic | 84.91 | 85.42* | 86.05* | 86.81*
Road Traffic Accidents (2017) | 85.29 | 85.29* | 81.03 | 84.38
SF Air Traffic Landings Statistics | 98.74 | 98.31 | 99.37* | 98.69
SF Air Traffic Passenger Statistics | 90.68 | 90.82* | 90.41 | 91.43*
Smart City Traffic Patterns | 85.66 | 85.98* | 85.07 | 86.94*
Statlog (Vehicle Silhouettes) | 72.46 | 68.91 | 78.13* | 74.82*
Traffic Volume Counts (2012-2013) | 88.36 | 88.12 | 89.77* | 89.10*
Average | 86.02 | 84.26 | 87.48 | 86.40

Table 4: The comparison of RandomTree based ordinal and ensemble learning methods in terms of classification accuracy.

Dataset | RandomTree (%) | OrdRandomTree (%) | AdaOrdRandomTree (%) | BagOrdRandomTree (%)
Auto MPG | 77.64 | 81.66* | 82.66* | 81.41*
Automobile | 76.59 | 74.63 | 83.41* | 84.88*
Bike Sharing | 80.07 | 80.56* | 86.98* | 87.73*
Car Evaluation | 83.16 | 85.94* | 97.22* | 95.83*
Car Sale Advertisements | 82.38 | 82.12 | 83.90* | 86.34*
NYS Air Passenger Traffic | 86.11 | 86.17* | 85.61 | 86.49*
Road Traffic Accidents (2017) | 78.76 | 78.44 | 82.21* | 84.48*
SF Air Traffic Landings Statistics | 97.38 | 97.43* | 98.93* | 98.83*
SF Air Traffic Passenger Statistics | 89.22 | 89.27* | 89.05 | 89.17
Smart City Traffic Patterns | 82.26 | 82.79* | 84.72* | 85.91*
Statlog (Vehicle Silhouettes) | 70.92 | 65.60 | 76.71* | 76.00*
Traffic Volume Counts (2012-2013) | 82.96 | 83.45* | 89.77* | 89.69*
Average | 82.29 | 82.34 | 86.76 | 87.23

The accuracy rates which are higher than the individual classification algorithm's accuracy rate in the same row are marked with *. In addition, the accuracy rates which have the maximum value in each row are shown in bold. The experimental results show that the EBOC approaches (AdaOrdC4.5, BagOrdC4.5, AdaOrdRandomTree, BagOrdRandomTree, AdaOrdREPTree, and BagOrdREPTree) generally provide higher accuracy values than the individual tree-based classification algorithms. For example, over the 12 datasets, BagOrdRandomTree (87.23%) is significantly more accurate than RandomTree (82.29%). Similarly, both AdaOrdREPTree and BagOrdREPTree win against plain REPTree on 11 datasets and lose on only one. The results also show that the performance gap generally increases with the number of classes. When the average accuracy values are considered, it is possible to say that the proposed EBOC approach has the best accuracy score in all three experiments: AdaOrdC4.5 with 87.48%, BagOrdRandomTree with 87.23%, and BagOrdREPTree with 85.67%.

The matrices presented in Tables 6, 7, and 8 give all pairwise combinations of the tree-based algorithms with their ordinal and ensemble-learning versions. Each cell in the matrix represents the number of wins, losses, and ties between the approach in that row and the approach in that column. For example, in the pairwise comparison of the C4.5 and BagOrdC4.5 algorithms, 3-9-0 indicates that the standard C4.5 algorithm is better than BagOrdC4.5 on only 3 datasets, while BagOrdC4.5 is better on the other 9 datasets; their accuracies are not equal on any dataset. When all matrices are examined, it is clearly seen that our proposed approach, EBOC, outperforms all the other methods.

The graphs given in Figures 5, 6, and 7 show the average ranks of the tree-based algorithms (C4.5, RandomTree, and REPTree) with ordinal and ensemble-learning (AdaBoost and bagging) versions, respectively.


Table 5: The comparison of REPTree based ordinal and ensemble learning methods in terms of classification accuracy.

Dataset | REPTree (%) | OrdREPTree (%) | AdaOrdREPTree (%) | BagOrdREPTree (%)
Auto MPG | 75.38 | 76.63* | 81.91* | 81.16*
Automobile | 63.41 | 66.83* | 83.90* | 72.20*
Bike Sharing | 84.96 | 85.27* | 88.75* | 88.12*
Car Evaluation | 87.67 | 90.91* | 98.67* | 93.63*
Car Sale Advertisements | 80.70 | 80.42 | 82.52* | 82.30*
NYS Air Passenger Traffic | 84.34 | 84.91* | 86.74* | 87.31*
Road Traffic Accidents (2017) | 84.66 | 85.02* | 80.07 | 84.57
SF Air Traffic Landings Statistics | 98.04 | 98.10* | 99.12* | 98.57*
SF Air Traffic Passenger Statistics | 90.39 | 90.49* | 90.97* | 91.25*
Smart City Traffic Patterns | 84.66 | 85.06* | 85.33* | 86.38*
Statlog (Vehicle Silhouettes) | 72.34 | 69.62 | 77.54* | 73.17*
Traffic Volume Counts (2012-2013) | 80.57 | 88.75* | 86.39* | 89.37*
Average | 82.26 | 83.50 | 86.83 | 85.67

[Figure 5: The average ranks of the C4.5 algorithm with ordinal and ensemble-learning versions. Average ranks recovered from the chart: C4.5 = 2.88, OrdC4.5 = 3.29, AdaOrdC4.5 = 2.00, BagOrdC4.5 = 1.83.]

In the ranking method, each approach used in this study is rated according to its accuracy score on each dataset. This process starts by giving rank 1 to the classifier with the highest classification accuracy and continues to increase the rank value until rank m is assigned to the worst of the m classifiers. In the case of a tie, the average of the classifiers' rankings is assigned to each classifier. Then the mean values of the ranks per classifier over all datasets are computed as the average ranks. According to the comparative results, the EBOC approach with the bagging method has the best performance among the others in Figures 5 and 6, because it gives the lowest rank value.
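The average ranks plotted in Figures 5-7 can be reproduced with a short sketch, assuming scipy; the two rows shown here are the Auto MPG and Automobile rows of Table 3, purely as an illustration.

import numpy as np
from scipy.stats import rankdata

acc = np.array([[80.40, 80.90, 83.67, 81.91],   # accuracies per dataset (rows)
                [81.95, 66.34, 84.39, 73.66]])  # and per classifier (columns)
ranks = rankdata(-acc, axis=1)  # negate so the highest accuracy receives rank 1
print(ranks.mean(axis=0))       # average rank per classifier; ties get average ranks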

[Figure 6: The average ranks of the RandomTree algorithm with ordinal and ensemble-learning versions. Average ranks recovered from the chart: RandomTree = 3.42, OrdRandomTree = 3.00, AdaOrdRandomTree = 1.92, BagOrdRandomTree = 1.67.]

The proposed algorithms (AdaOrd* and BagOrd*) were applied on the transportation datasets and were compared with the ordinal and nominal (standard) classification algorithms in terms of the accuracy, precision, recall, F-measure, and ROC area metrics. Additional measures besides accuracy are required to evaluate all aspects of the classifiers, and they are useful for providing additional insight into their performance evaluations. Table 9 shows the average results obtained from the 12 transportation datasets in each experiment. Except for accuracy, the metrics listed in Table 9 give a degree ranging from 0 to 1; an algorithm with a higher rate is more successful than the others.


Table 6: The pairwise combinations of the C4.5 algorithm with ordinal and ensemble learning versions.

 | BagOrdC4.5 | AdaOrdC4.5 | OrdC4.5
C4.5 | 3-9-0 | 3-9-0 | 7-4-1
OrdC4.5 | 1-11-0 | 3-9-0 | -
AdaOrdC4.5 | 6-6-0 | - | -

Table 7: The pairwise combinations of the RandomTree algorithm with ordinal and ensemble learning versions.

 | BagOrdRandomTree | AdaOrdRandomTree | OrdRandomTree
RandomTree | 1-11-0 | 2-10-0 | 4-8-0
OrdRandomTree | 2-10-0 | 2-10-0 | -
AdaOrdRandomTree | 5-7-0 | - | -

[Figure 7: The average ranks of the REPTree algorithm with ordinal and ensemble-learning versions. Average ranks recovered from the chart: REPTree = 3.67, OrdREPTree = 2.92, AdaOrdREPTree = 1.67, BagOrdREPTree = 1.75.]

Among the C4.5 decision-tree-based algorithms, it is clearly seen that the AdaOrdC4.5 algorithm has the best accuracy (87.467%), precision (0.874), recall (0.875), and F-measure (0.874) results. While bagging has the highest values for the RandomTree algorithm, boosting is the best among the REPTree-based algorithms. When the ROC area is considered, the BagOrd* algorithms have the highest scores according to the experimental results. As a result, it is possible to say that the ensemble-based ordinal classification (EBOC) approach provides more accurate and robust classification results than the traditional methods.

5.4. Statistical Significance Tests. The field of inferential statistics offers special procedures for testing the significance of differences between multiple classifiers. Although the ensemble-based ordinal classification (EBOC) algorithms (AdaOrd* and BagOrd*) have the lower ranking values, we should statistically verify that these algorithms are significantly different from the others. For verification, we used three well-known statistical methods [35]: multiple comparisons (the Friedman test and the Quade test) and pairwise comparisons (the Wilcoxon signed rank test).

The null hypothesis (H0) for a statistical test is one in which the provided classification models are equivalent; the alternative hypothesis (H1) holds when not all classifiers are equivalent.

H0: There are no performance differences among the classifiers on the datasets.

H1: There are performance differences among the classifiers on the datasets.

The p-value is defined as a probability, a decimal number between 0 and 1, and it can be expressed as a percentage (i.e., 0.1 = 10%). With a small p-value, we reject the null hypothesis (H0), meaning that the difference between the classification results is significant.

5.4.1. Statistical Tests for Comparing Multiple Classifiers. Multiple comparison tests (MCT) are used to establish a statistical comparison of the results reported among various classification algorithms. We performed two well-known MCTs to determine whether the classifiers have significant differences or not: the Friedman test and the Quade test.

Friedman Test. The Friedman test is a nonparametric statistical test that aims to detect significant differences between the behaviors of two or more algorithms. Initially, it ranks the algorithms for each dataset separately, between 1 (smallest) and 4 (largest). In case of ties, the average rank is assigned to them. After that, the Friedman test statistic (χ_r²) is calculated according to the following:

\chi_r^2 = \frac{12}{d \cdot t \cdot (t+1)} \sum_{i=1}^{t} R_i^2 - 3 \cdot d \cdot (t+1) \qquad (5)


Table 8: The pairwise combinations of the REPTree algorithm with ordinal and ensemble learning versions.

 | BagOrdREPTree | AdaOrdREPTree | OrdREPTree
REPTree | 1-11-0 | 1-11-0 | 2-10-0
OrdREPTree | 1-11-0 | 2-10-0 | -
AdaOrdREPTree | 7-5-0 | - | -

Table 9: The comparison of algorithms in terms of accuracy, precision, recall, F-measure, and ROC area.

Compared Algorithms | Accuracy (%) | Precision | Recall | F-measure | ROC Area
C4.5 | 86.02 | 0.861 | 0.860 | 0.866 | 0.897
OrdC4.5 | 84.26 | 0.843 | 0.842 | 0.848 | 0.884
AdaOrdC4.5 | 87.48 | 0.874 | 0.875 | 0.874 | 0.933
BagOrdC4.5 | 86.40 | 0.859 | 0.864 | 0.860 | 0.947
RandomTree | 82.29 | 0.823 | 0.823 | 0.823 | 0.860
OrdRandomTree | 82.34 | 0.825 | 0.823 | 0.824 | 0.861
AdaOrdRandomTree | 86.76 | 0.866 | 0.868 | 0.866 | 0.928
BagOrdRandomTree | 87.23 | 0.868 | 0.872 | 0.869 | 0.947
REPTree | 82.26 | 0.817 | 0.823 | 0.817 | 0.903
OrdREPTree | 83.50 | 0.831 | 0.835 | 0.829 | 0.908
AdaOrdREPTree | 86.83 | 0.868 | 0.868 | 0.867 | 0.925
BagOrdREPTree | 85.67 | 0.852 | 0.857 | 0.852 | 0.939

where d is the number of datasets, t is the number of classifiers, and R_i is the total of the ranks for the i-th classifier over all datasets. The Friedman statistic is approximately chi-square (χ²) distributed with t-1 degrees of freedom (df), and the null hypothesis is rejected if χ_r² > χ²_{t-1, α} at the corresponding significance level α.

The obtained Friedman test value for the four C4.5-based classifiers is 10.525. Since the obtained value is greater than the critical value (χ²_{df=3, α=0.05} = 7.81), the null hypothesis (H0) is rejected, so it has been concluded that the four classifiers are significantly different. The same conclusion also holds for the RandomTree-based and REPTree-based algorithms, which have test values of 15.3 and 20.1, respectively. The p-values obtained for the C4.5, RandomTree, and REPTree related EBOC results (Tables 3, 4, and 5) are 0.01459, 0.00158, and 0.00016, all well below the 0.05 level of significance. Thus, it is possible to say that the differences between their performances are unlikely to occur by chance.
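For reproducibility, the following sketch (assuming numpy and scipy) applies equation (5) to the four accuracy columns of Table 3; with the tie in the Road Traffic Accidents row resolved by average ranks, it returns the reported statistic of 10.525. Note that scipy.stats.friedmanchisquare yields a slightly larger value because it additionally applies a tie correction.

import numpy as np
from scipy.stats import rankdata

acc = np.array([   # Table 3 columns: C4.5, OrdC4.5, AdaOrdC4.5, BagOrdC4.5
    [80.40, 80.90, 83.67, 81.91], [81.95, 66.34, 84.39, 73.66],
    [87.75, 87.72, 89.29, 89.79], [92.36, 92.19, 98.73, 94.27],
    [83.65, 81.09, 83.82, 85.00], [84.91, 85.42, 86.05, 86.81],
    [85.29, 85.29, 81.03, 84.38], [98.74, 98.31, 99.37, 98.69],
    [90.68, 90.82, 90.41, 91.43], [85.66, 85.98, 85.07, 86.94],
    [72.46, 68.91, 78.13, 74.82], [88.36, 88.12, 89.77, 89.10]])
d, t = acc.shape                       # d = 12 datasets, t = 4 classifiers
R = rankdata(acc, axis=1).sum(axis=0)  # rank total R_i per classifier
chi2_r = 12.0 / (d * t * (t + 1)) * (R ** 2).sum() - 3 * d * (t + 1)
print(chi2_r)  # 10.525 > 7.81, so H0 is rejected at alpha = 0.05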

Quade Test. Like the Friedman test, the Quade test is a nonparametric test which is used to prove that the differences among classifiers are significant. First, the performance results of each classifier are ranked within each dataset to yield R_{ij}, in the same way as the Friedman test does, where d is the number of datasets and t is the number of classifiers, for i = 1, 2, ..., d and j = 1, 2, ..., t. Then the range for each dataset is calculated by finding the difference between the largest and the smallest observations within that dataset. The rank obtained for dataset i in this way is denoted by Q_i. The weighted average adjusted rank for dataset i with classifier j is then computed as S_{ij} = Q_i (R_{ij} - (t+1)/2). The Quade test statistic is then given by

F = \frac{(d-1) \cdot \frac{1}{d} \sum_{j=1}^{t} S_j^2}{\sum_{i=1}^{d} \sum_{j=1}^{t} S_{ij}^2 - \frac{1}{d} \sum_{j=1}^{t} S_j^2} \qquad (6)

where S_j = Σ_{i=1}^{d} S_{ij} is the sum of the weighted ranks for each classifier. The F value is tested against the F-distribution for a given α with df1 = (t - 1) and df2 = (d - 1)(t - 1) degrees of freedom. If F > F_{(t-1), (d-1)(t-1), α}, then the null hypothesis is rejected. Moreover, the p-value can be computed through normal approximations.

The p-values computed through the Quade statistical tests for the C4.5, RandomTree, and REPTree related EBOC algorithms are 0.013398, 0.000018, and 0.000522, respectively. The test results strongly suggest the existence of significant differences among the considered algorithms at the significance level α = 0.05. Hence, we reject the null hypothesis, which states that all classification algorithms have the same performance. Thus, it is possible to say that at least one of them behaves differently.
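Since the Quade test is not available as a ready-made scipy function, the statistic of equation (6) can be sketched directly, reusing the acc matrix, d, and t from the Friedman snippet above; the resulting p-value can be compared against the reported 0.013398.

from scipy.stats import rankdata
from scipy.stats import f as f_dist

Rij = rankdata(acc, axis=1)                       # within-dataset ranks R_ij
Qi = rankdata(acc.max(axis=1) - acc.min(axis=1))  # datasets ranked by their range
Sij = Qi[:, None] * (Rij - (t + 1) / 2.0)         # weighted adjusted ranks S_ij
Sj = Sij.sum(axis=0)                              # S_j per classifier
A, B = (Sij ** 2).sum(), (Sj ** 2).sum() / d
F = (d - 1) * B / (A - B)
print(F, f_dist.sf(F, t - 1, (d - 1) * (t - 1)))  # statistic and its p-value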

5.4.2. Statistical Tests for Comparing Paired Classifiers. In addition to the multiple comparison tests, we also conducted pairwise comparison tests to demonstrate that the proposed algorithms (AdaOrd* and BagOrd*) have significant differences from the others when individually compared. The Wilcoxon signed rank test was applied to see the pairwise performance differences.


Table 10: Wilcoxon signed ranks test results.

Compared Algorithms | W+ | W- | W | z-score | p-value | Significance Level
AdaOrdC4.5 vs. C4.5 | 63 | 15 | 15 | -1.882715 | 0.059739 | moderate
AdaOrdC4.5 vs. OrdC4.5 | 65 | 13 | 13 | -2.039608 | 0.041389 | strong
BagOrdC4.5 vs. C4.5 | 61 | 17 | 17 | -1.725822 | 0.084379 | moderate
BagOrdC4.5 vs. OrdC4.5 | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong
AdaOrdRandomTree vs. RandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong
AdaOrdRandomTree vs. OrdRandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong
BagOrdRandomTree vs. RandomTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong
BagOrdRandomTree vs. OrdRandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong
AdaOrdREPTree vs. REPTree | 71 | 7 | 7 | -2.510287 | 0.012063 | strong
AdaOrdREPTree vs. OrdREPTree | 64 | 14 | 14 | -1.961161 | 0.049860 | strong
BagOrdREPTree vs. REPTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong
BagOrdREPTree vs. OrdREPTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong
Average | 71.25 | 6.75 | 6.75 | -2.602996 | 0.022918 |

Wilcoxon Signed Rank Test. This statistical test consists of the following steps to check the validity of the null hypothesis: (i) calculate the performance differences between the two algorithms for each dataset; (ii) order the differences according to their absolute values, from smallest to largest; (iii) rank the absolute values of the differences, starting with the smallest as 1 and assigning average ranks in case of ties; (iv) calculate W+ and W- by summing the ranks of the positive and negative differences separately; (v) find W as the smaller of the sums, W = min(W+, W-); (vi) calculate the z-score as z = (W - μ_W)/σ_W, where μ_W and σ_W are the mean and standard deviation of W under the null hypothesis, and find the p-value from the distribution table; and (vii) determine the significance level according to the computed p-value [36], rejecting the null hypothesis if the p-value is less than a certain risk threshold (e.g., 0.1 = 10%).

Significance level =
   weak or none,  if p > 0.1,
   moderate,      if 0.05 < p ≤ 0.1,
   strong,        if 0.01 < p ≤ 0.05,
   very strong,   if p ≤ 0.01.   (7)
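In Python, this pairwise test is available directly, as in the following sketch assuming scipy; the two columns compared are the AdaOrdC4.5 and C4.5 columns of the acc matrix from the Friedman snippet above, and small deviations from Table 10 are expected because scipy may use the exact distribution rather than the normal approximation.

from scipy.stats import wilcoxon

stat, p = wilcoxon(acc[:, 2], acc[:, 0])  # AdaOrdC4.5 vs. C4.5 over the 12 datasets
print(stat, p)  # interpret the p-value with the scale in equation (7)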

Table 10 demonstrates the W, z-score, and p-values computed through the Wilcoxon signed rank tests. Based on the reported p-values, most of the tests strongly or very strongly confirm the existence of significant differences among the considered algorithms. As the table states, for each test, the ensemble-based ordinal classification algorithms (BagOrd* and AdaOrd*) always show a significant improvement over the traditional versions at a significance level of α = 0.1. In general, it can be observed that the average p-value of the proposed BagOrd* variant algorithms (0.0171) is a little lower than that of the AdaOrd* variants (0.0288).

Based on all statistical tests (Friedman, Quade, and Wilcoxon signed rank tests), we can safely reject the null hypothesis (i.e., that there are no performance differences among the classifiers on the datasets), since the p-values computed through the statistics are less than the critical value (0.05). Thus, the proposed approach (EBOC) has a statistically significant effect on the response at the 95.0% confidence level.

6. Conclusions and Future Works

As a result of developments in the field of transportation, enormous amounts of raw data are generated every day. This situation creates the potential to discover knowledge patterns or rules from it and to model the behaviour of a transportation system using machine learning techniques. To serve this purpose, this study focuses on the application of ordinal classification algorithms on real-world transportation datasets. It proposes a novel ensemble-based ordinal classification (EBOC) approach. This approach converts the original ordinal class problem into a series of binary class problems and uses the ensemble-learning paradigm (boosting and bagging) at the same time. To the best of our knowledge, this is the first study in which ordinal classification methods and our proposed approach were applied to the transportation sector. In the experimental studies, the proposed model with the tree-based learners (C4.5, RandomTree, and REPTree) was implemented on twelve benchmark transportation datasets that are available for public use. The proposed EBOC method was compared with the ordinal class classifier and traditional tree-based classification algorithms in terms of accuracy, precision, recall, F-measure, and ROC area. The results indicate that the EBOC approach provides more accurate classification results than they do. Moreover, statistical test methods (Friedman, Quade, and Wilcoxon signed rank tests) were used to prove that the classification accuracies obtained from the proposed algorithms (AdaOrd* and BagOrd*) are significantly different from those of the traditional methods. Therefore, the proposed EBOC method can assist in making right decisions in transportation. Our findings demonstrate that the improvement in performance is a result of exploiting ordering information and applying an ensemble strategy at the same time.

As future work, different types of ensemble-based ordinal classifiers can be developed using different ensemble methods, such as stacking and voting, and using different classification algorithms, such as Naive Bayes, k-nearest neighbour, neural network, and support vector machine. In addition, ensemble clustering models can be developed to cluster transportation data. Furthermore, a comparative study which implements ensemble-learning and deep learning paradigms can be performed in transportation fields.

Data Availability

The "Auto MPG", "Automobile", "Bike Sharing", "Car Evaluation", and "Statlog (Vehicle Silhouettes)" datasets used to support the findings of this study are freely available at https://archive.ics.uci.edu/ml/datasets. The "Car Sale Advertisements", "NYS Air Passenger Traffic", "Smart City Traffic Patterns", "SF Air Traffic Landings Statistics", and "SF Air Traffic Passenger Statistics" datasets used to support the findings of this study are freely available at https://www.kaggle.com/datasets. The "Road Traffic Accidents (2017)" dataset used to support the findings of this study is freely available at https://datamillnorth.org/dataset. The "Traffic Volume Counts (2012-2013)" dataset used to support the findings of this study is freely available at https://opendata.cityofnewyork.us/data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

[1] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, "Traffic flow prediction with big data: a deep learning approach," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865-873, 2015.
[2] X. Wen, L. Shao, Y. Xue, and W. Fang, "A rapid learning algorithm for vehicle classification," Information Sciences, vol. 295, pp. 395-406, 2015.
[3] S. Ballı and E. A. Sagbas, "Diagnosis of transportation modes on mobile phone using logistic regression classification," IET Software, vol. 12, no. 2, pp. 142-151, 2018.
[4] H. Nguyen, C. Cai, and F. Chen, "Automatic classification of traffic incident's severity using machine learning approaches," IET Intelligent Transport Systems, vol. 11, no. 10, pp. 615-623, 2017.
[5] G. Kedar-Dongarkar and M. Das, "Driver classification for optimization of energy usage in a vehicle," Procedia Computer Science, vol. 8, pp. 388-393, 2012.
[6] N. Deepika and V. V. Sajith Variyar, "Obstacle classification and detection for vision based navigation for autonomous driving," in Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI 2017), pp. 2092-2097, Udupi, India, September 2017.
[7] E. Frank and M. Hall, "A simple approach to ordinal classification," in Proceedings of the European Conference on Machine Learning (ECML), vol. 2167 of Lecture Notes in Computer Science, pp. 145-156, Springer, 2001.
[8] E. Alpaydin, Introduction to Machine Learning, The MIT Press, 3rd edition, 2014.
[9] S. Destercke and G. Yang, "Cautious ordinal classification by binary decomposition," in Proceedings of Machine Learning and Knowledge Discovery in Databases - European Conference (ECML/PKDD), pp. 323-337, Nancy, France, 2014.
[10] J. S. Cardoso and J. F. Pinto da Costa, "Learning to classify ordinal data: the data replication method," Journal of Machine Learning Research (JMLR), vol. 8, pp. 1393-1429, 2007.
[11] L. Li and H. T. Lin, "Ordinal regression by extended binary classification," in Proceedings of the 19th International Conference on Neural Information Processing Systems, pp. 865-872, Canada, 2006.
[12] F. E. Harrell, "Ordinal logistic regression," in Regression Modeling Strategies, Springer Series in Statistics, pp. 311-325, Springer International Publishing, Cham, Switzerland, 2015.
[13] J. D. Rennie and N. Srebro, "Loss functions for preference levels: regression with discrete ordered labels," in Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference Handling, pp. 180-186, 2005.
[14] B. Keith, "Barycentric coordinates for ordinal sentiment classification," in Proceedings of the 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Halifax, Canada, 2017.
[15] E. Fernandez, J. R. Figueira, J. Navarro, and B. Roy, "ELECTRE TRI-nB: A new multiple criteria ordinal classification method," European Journal of Operational Research, vol. 263, no. 1, pp. 214-224, 2017.
[16] M. M. Stenina, M. P. Kuznetsov, and V. V. Strijov, "Ordinal classification using Pareto fronts," Expert Systems with Applications, vol. 42, no. 14, pp. 5947-5953, 2015.
[17] W. Verbeke, D. Martens, and B. Baesens, "RULEM: A novel heuristic rule learning approach for ordinal classification with monotonicity constraints," Applied Soft Computing, vol. 60, pp. 858-873, 2017.
[18] W. Kotlowski and R. Slowinski, "On nonparametric ordinal classification with monotonicity constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 11, pp. 2576-2589, 2013.
[19] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ordinal classification with decision rules," in Proceedings of the Third International Conference on Mining Complex Data (MCD'07), pp. 169-181, Warsaw, Poland, 2007.
[20] K. Hechenbichler and K. Schliep, "Weighted k-nearest-neighbor techniques and ordinal classification," Collaborative Research Center, vol. 399, pp. 1-16, 2004.
[21] W. Waegeman and L. Boullart, "An ensemble of weighted support vector machines for ordinal regression," International Journal of Computer and Information Engineering, vol. 1, no. 12, pp. 599-603, 2007.
[22] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ensemble of decision rules for ordinal classification with monotonicity constraints," in Proceedings of the International Conference on Rough Sets and Knowledge Technology, vol. 5009 of Lecture Notes in Computer Science, pp. 260-267, 2008.
[23] H. T. Lin, From Ordinal Ranking to Binary Classification [Ph.D. thesis], California Institute of Technology, 2008.
[24] C. K. A. Cuartas, A. J. P. Anzola, and B. G. M. Tarazona, "Classification methodology of research topics based in decision trees J48 and RandomTree," International Journal of Applied Engineering Research, vol. 10, no. 8, pp. 19413-19424, 2015.
[25] C. Parimala and R. Porkodi, "Classification algorithms in data mining: a survey," International Journal of Scientific Research in Computer Science, vol. 3, pp. 349-355, 2018.
[26] P. Yildirim, D. Birant, and T. Alpyildiz, "Improving prediction performance using ensemble neural networks in textile sector," in Proceedings of the 2017 International Conference on Computer Science and Engineering (UBMK), pp. 639-644, Antalya, Turkey, October 2017.
[27] H. Yu and J. Ni, "An improved ensemble learning method for classifying high-dimensional and imbalanced biomedicine data," IEEE Transactions on Computational Biology and Bioinformatics, vol. 11, no. 4, pp. 657-666, 2014.
[28] D. Che, Q. Liu, K. Rasheed, and X. Tao, "Decision tree and ensemble learning algorithms with their applications in bioinformatics," in Software Tools and Algorithms for Biological Systems, H. R. Arabnia and Q.-N. Tran, Eds., vol. 696, pp. 191-199, Springer, 2011.
[29] "UCI Machine Learning Repository: Car Evaluation Data Set," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/datasets/car+evaluation.
[30] "Weka 3 - Data Mining with Open Source Machine Learning Software in Java," cs.waikato.ac.nz, 2018, https://www.cs.waikato.ac.nz/ml/weka.
[31] "UCI Machine Learning Repository," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/index.php.
[32] "Datasets | Kaggle," kaggle.com, 2018, https://www.kaggle.com/datasets.
[33] "Data Mill North," datamillnorth.org, 2018, https://datamillnorth.org/dataset.
[34] City of New York, "NYC Open Data," opendata.cityofnewyork.us, 2018, https://opendata.cityofnewyork.us.
[35] J. Derrac, S. García, D. Molina, and F. Herrera, "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms," Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 3-18, 2011.
[36] N. A. Weiss, Introductory Statistics, Addison-Wesley, 9th edition, 2012.

[29] ldquoUCI Machine Learning Repository Car Evaluation DataSetrdquo Archiveicsuciedu 2018 httpsarchiveicsuciedumldatasetscar+evaluation

[30] ldquoWeka 3 - Data Mining with Open Source Machine Learn-ing Software in Javardquo Cswaikatoacnz 2018 httpswwwcswaikatoacnzmlweka

[31] ldquoUCI Machine Learning Repositoryrdquo Archiveicsuciedu 2018httpsarchiveicsuciedumlindexphp

[32] ldquoDatasets mdash Kagglerdquo Kagglecom 2018 httpswwwkagglecomdatasets

[33] ldquoData Mill Northrdquo Datamillnorthorg 2018 httpsdatamill-northorgdataset

[34] ldquoN City of New York ldquoNYC Open Datardquordquo Open-datacityofnewyorkus 2018 httpsopendatacityofnewyorkus

[35] J Derrac S Garcıa D Molina and F Herrera ldquoA practicaltutorial on the use of nonparametric statistical tests as amethodology for comparing evolutionary and swarm intelli-gence algorithmsrdquo Swarm and Evolutionary Computation vol1 no 1 pp 3ndash18 2011

[36] N A Weiss Introductory Statistics Addison-Wesley 9th edi-tion 2012

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 9: EBOC: Ensemble-Based Ordinal Classification in …downloads.hindawi.com/journals/jat/2019/7482138.pdfJournalofAdvancedTransportation the size of decision trees byremoving sections

Journal of Advanced Transportation 9

Car Sale Advertisements. The data was collected from private car sale advertisements in Ukraine in 2016. It was proposed for the prediction of the seller's price (denominated in USD) in the advertisement according to well-known car features such as model, body type, mileage, engine volume, engine type, and drive type.

NYS Air Passenger Traffic. The data was collected monthly by the Port Authority of New York State between the years 1977 and 2015. The aim is to predict the total number of domestic and international passengers for five airports (ACY, EWR, JFK, LGA, and SWF).

Road Traffic Accidents (2017). The dataset contains the records of traffic accidents across the City of Leeds, UK, reported in 2017, with information on the location, the number of people and vehicles involved, the type of vehicle, the road surface, weather situations, lighting conditions, and the age and sex of the casualty. The aim is to predict the severity of the casualty as slight, serious, or fatal.

SF Air Traffic Landings Statistics and SF Air Traffic Passenger Statistics. These two separate datasets include the aircraft landings and passenger statistics of San Francisco International Airport recorded during the period from July 2005 to March 2018. These datasets are used to predict monthly landing and passenger counts, respectively.

Smart City Traffic Patterns. This dataset includes the traffic patterns of four junctions of a city collected between 2015 and 2017. The aim is to manage the traffic of the city better and to provide input on infrastructure planning for the future to improve the efficiency of services for the citizens. To serve this purpose, the number of vehicles in the traffic is predicted to implement a robust traffic system for the city by being prepared for traffic peaks.

Statlog (Vehicle Silhouettes). The Statlog dataset contains the features of vehicle silhouettes extracted by the Hierarchical Image Processing System (HIPS). A vehicle may be viewed from one of many different angles. The aim is to classify a given silhouette using a set of features extracted from the image.

Traffic Volume Counts (2012-2013). The dataset includes the hourly traffic volume counts collected by the New York Metropolitan Transportation Council between the years 2012 and 2013. The traffic counts were measured on various roads from one intersection to another with a specified direction (NB, SB, WB, and EB). The attribute to be predicted in this dataset is the traffic volume at 11:00-12:00 AM.

Before the application of the nominal and ordinal classification algorithms, the datasets generally passed through data preprocessing steps. In this study, the ID attributes in the transportation datasets were eliminated in the data reduction step. The date attributes were split into triplets (day, month, and year) because these components provide more useful information for the prediction task. In addition, continuous class values were discretized into 3 bins using the equal-frequency technique, since ordinal classification algorithms require categorical ordinal data.
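As an illustration of this preprocessing step, the snippet below sketches the equal-frequency discretization of a continuous target into three ordered bins using pandas; the library choice and the column name price are ours, not part of the original Weka-based workflow.

```python
import pandas as pd

# Illustrative data: a continuous target to be turned into an ordinal class.
df = pd.DataFrame({"price": [1200, 3500, 800, 15000, 7200, 4100, 950, 22000, 5600]})

# Equal-frequency discretization into 3 ordered bins, as in the preprocessing step.
df["price_bin"] = pd.qcut(df["price"], q=3, labels=["low", "medium", "high"])
print(df["price_bin"].value_counts())  # roughly equal-sized bins
```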

The basic characteristics of the investigated transportation datasets are given in Table 1: the number of instances, attributes, and classes; the target attribute; the data preprocessing operations; and the class distribution in each bin.

5.2. Experimental Work. In each experiment, four methods were compared on the transportation datasets: (i) individual tree-based classification algorithms, (ii) the ordinal class classifier, (iii) boosting-based ordinal classification, and (iv) bagging-based ordinal classification. They were compared using the n-fold cross-validation technique, selecting n as 10. In this validation technique, the entire dataset is divided into n equal-size parts; n-1 of them are chosen for the training phase of the classification model, and the last part is used in the testing phase. This process is repeated n times, changing the parts used for training and testing. When the cross-validation process terminates, the accuracy values obtained in each step are averaged, and a single accuracy value is produced as the success measure of the algorithm. Figure 4 represents the 10-fold cross-validation process.

[Figure 4. The 10-fold cross-validation process: in each of the 10 steps, the entire dataset is split into a train set (nine parts) and a test set (one part).]
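As a minimal sketch of this validation protocol (using scikit-learn and a toy dataset purely for illustration; the study itself was conducted in Weka):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 10-fold cross-validation: train on 9 folds, test on the held-out fold,
# repeat 10 times, then average the per-fold accuracies into a single score.
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10)
print(scores.mean())
```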

In this study, the alternative tree-based ordinal classification algorithms were compared according to the accuracy, precision, recall, F-measure, and ROC area measures. These metrics are explained with their abbreviations, formulas, and definitions in Table 2.

5.3. Experimental Results. In this study, three different experiments with three different tree-based classification algorithms (C4.5, RandomTree, and REPTree) were performed to compare the classification success of the proposed EBOC approach with the existing individual algorithms and the ordinal class classifier algorithm. To evaluate the classification performance of the algorithms on a specific transportation dataset, accuracy rate values were calculated using the n-fold cross-validation technique, selecting n as 10.

In each experiment, one of the tree-based classification algorithms was used as the base learner. For example, in the first experiment, four methods were applied and compared on 12 different transportation datasets: (i) C4.5, the individual decision tree algorithm; (ii) OrdC4.5, the ordinal class classifier with C4.5; (iii) AdaOrdC4.5, AdaBoost-based ordinal classification with C4.5 as the base learner; and finally (iv) BagOrdC4.5, bagging-based ordinal classification with C4.5 as the base learner. In the second and third experiments, the RandomTree and REPTree algorithms were utilized as base learners, respectively.
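To make the construction concrete, the following is a minimal sketch of how a BagOrd*-style classifier can be assembled: the ordinal problem with K ordered classes is decomposed into K-1 binary "is the class beyond threshold k?" problems in the manner of Frank and Hall [7], and each binary problem is handled by a bagged tree ensemble. The sketch assumes scikit-learn as a stand-in for Weka, and all names and parameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

class BagOrdClassifier:
    """Sketch of bagging-based ordinal classification: one bagged binary
    ensemble per class threshold. `classes` must list the labels in their
    ordinal order, e.g. ['slight', 'serious', 'fatal']."""

    def __init__(self, classes, n_estimators=100):
        self.classes = list(classes)
        self.models = [
            BaggingClassifier(DecisionTreeClassifier(), n_estimators=n_estimators)
            for _ in range(len(self.classes) - 1)
        ]

    def fit(self, X, y):
        rank = {c: i for i, c in enumerate(self.classes)}
        r = np.array([rank[v] for v in y])
        for k, model in enumerate(self.models):
            # Binary subproblem k: does the instance lie beyond threshold k?
            model.fit(X, (r > k).astype(int))
        return self

    def predict(self, X):
        # P(y > k) for each threshold; assumes both binary labels occur in training.
        p_gt = np.column_stack([m.predict_proba(X)[:, 1] for m in self.models])
        ones = np.ones((len(X), 1))
        zeros = np.zeros((len(X), 1))
        bounds = np.hstack([ones, p_gt, zeros])  # P(y > -1)=1 ... P(y > K-1)=0
        probs = bounds[:, :-1] - bounds[:, 1:]   # P(y = k) = P(y > k-1) - P(y > k)
        return np.array(self.classes)[probs.argmax(axis=1)]
```

An AdaOrd*-style variant would swap BaggingClassifier for an AdaBoost ensemble; everything else stays the same.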


Table 1. The basic characteristics of the transportation datasets (I: number of instances, A: number of attributes, C: number of classes).

Dataset | I | A | C | Class Distribution | Target Attribute | Data Preprocessing
Auto MPG | 398 | 8 | 3 | 131-134-133 | mpg | Removing the columns with unique values (i.e., car name)
Automobile | 205 | 26 | 7 | 0-3-22-67-54-32-27 | symboling | -
Bike Sharing | 17379 | 13 | 3 | 5797-5783-5799 | cnt | Removing the columns "instant", "dteday", "casual", and "registered"
Car Evaluation | 1728 | 7 | 4 | 1210-384-69-65 | class | -
Car Sale Advertisements | 9309 | 10 | 3 | 3112-3080-3117 | price | Removing rows with zero price value
NYS Air Passenger Traffic | 1584 | 4 | 3 | 528-528-528 | total passengers | Removing the columns "Domestic" and "International Passengers"
Road Traffic Accidents (2017) | 2203 | 13 | 3 | 1879-309-15 | casualty severity | (i) Removing columns that hold reference numbers; (ii) splitting the date column into day and month
SF Air Traffic Landings Statistics | 21105 | 14 | 3 | 6941-7103-7061 | landing count | -
SF Air Traffic Passenger Statistics | 18398 | 12 | 3 | 6132-6133-6133 | passenger count | -
Smart City Traffic Patterns | 48120 | 6 | 3 | 15687-16436-15997 | vehicles | (i) Removing the ID column; (ii) splitting the date column into day, month, and year
Statlog (Vehicle Silhouettes) | 846 | 19 | 4 | 212-217-218-199 | class | -
Traffic Volume Counts (2012-2013) | 5945 | 31 | 3 | 1983-1989-1973 | traffic volume at 11:00-12:00 AM | (i) Removing the columns with unique values (i.e., ID, Segment ID); (ii) splitting the date column into day, month, and year

Table 2. The detailed information about the classifier performance measures (TP: true positives, FP: false positives, TN: true negatives, FN: false negatives).

Abbreviation | Measure | Formula | Description
Acc | Accuracy | Acc = (TP + TN) / (TP + TN + FP + FN) | The ratio of the number of correctly classified instances to the total number of records
Prec | Precision | Prec = TP / (TP + FP) | The ratio of correctly classified positive instances against all positive results
Rec | Recall | Rec = TP / (TP + FN) | The ratio of correct answers for a class over all the answers correctly belonging to this class
F | F-measure | F = (2 * precision * recall) / (precision + recall) | The harmonic mean of precision and recall
ROC Area | Receiver Operating Characteristic Area | - | The area under the curve generated to compare correctly and incorrectly classified instances
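The formulas in Table 2 translate directly into code; the sketch below evaluates them for made-up confusion-matrix counts:

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F-measure from confusion-matrix counts (Table 2)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f

print(classification_metrics(tp=80, fp=10, tn=95, fn=15))  # (0.875, 0.888..., 0.842..., 0.864...)
```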

The default parameters of the algorithms in Weka were used in all experiments, except for the ensemble size (the number of members) parameter, which was set to 100. As a result of the experiments, the accuracy rates of each algorithm were evaluated. Tables 3, 4, and 5 show the comparative results of the implemented techniques on the twelve benchmark datasets in terms of accuracy rates. The accuracy rates that are higher than the individual classification algorithm's accuracy rate in the same row are marked with *; the maximum value in each row indicates the best method for that dataset.


Table 3. The comparison of C4.5-based ordinal and ensemble learning methods in terms of classification accuracy.

Dataset | C4.5 (%) | OrdC4.5 (%) | AdaOrdC4.5 (%) | BagOrdC4.5 (%)
Auto MPG | 80.40 | 80.90* | 83.67* | 81.91*
Automobile | 81.95 | 66.34 | 84.39* | 73.66
Bike Sharing | 87.75 | 87.72 | 89.29* | 89.79*
Car Evaluation | 92.36 | 92.19 | 98.73* | 94.27*
Car Sale Advertisements | 83.65 | 81.09 | 83.82* | 85.00*
NYS Air Passenger Traffic | 84.91 | 85.42* | 86.05* | 86.81*
Road Traffic Accidents (2017) | 85.29 | 85.29* | 81.03 | 84.38
SF Air Traffic Landings Statistics | 98.74 | 98.31 | 99.37* | 98.69
SF Air Traffic Passenger Statistics | 90.68 | 90.82* | 90.41 | 91.43*
Smart City Traffic Patterns | 85.66 | 85.98* | 85.07 | 86.94*
Statlog (Vehicle Silhouettes) | 72.46 | 68.91 | 78.13* | 74.82*
Traffic Volume Counts (2012-2013) | 88.36 | 88.12 | 89.77* | 89.10*
Average | 86.02 | 84.26 | 87.48 | 86.40

Table 4. The comparison of RandomTree-based ordinal and ensemble learning methods in terms of classification accuracy.

Dataset | RandomTree (%) | OrdRandomTree (%) | AdaOrdRandomTree (%) | BagOrdRandomTree (%)
Auto MPG | 77.64 | 81.66* | 82.66* | 81.41*
Automobile | 76.59 | 74.63 | 83.41* | 84.88*
Bike Sharing | 80.07 | 80.56* | 86.98* | 87.73*
Car Evaluation | 83.16 | 85.94* | 97.22* | 95.83*
Car Sale Advertisements | 82.38 | 82.12 | 83.90* | 86.34*
NYS Air Passenger Traffic | 86.11 | 86.17* | 85.61 | 86.49*
Road Traffic Accidents (2017) | 78.76 | 78.44 | 82.21* | 84.48*
SF Air Traffic Landings Statistics | 97.38 | 97.43* | 98.93* | 98.83*
SF Air Traffic Passenger Statistics | 89.22 | 89.27* | 89.05 | 89.17
Smart City Traffic Patterns | 82.26 | 82.79* | 84.72* | 85.91*
Statlog (Vehicle Silhouettes) | 70.92 | 65.60 | 76.71* | 76.00*
Traffic Volume Counts (2012-2013) | 82.96 | 83.45* | 89.77* | 89.69*
Average | 82.29 | 82.34 | 86.76 | 87.23

The experimental results show that the EBOC approaches (AdaOrdC4.5, BagOrdC4.5, AdaOrdRandomTree, BagOrdRandomTree, AdaOrdREPTree, and BagOrdREPTree) generally provide higher accuracy values than the individual tree-based classification algorithms. For example, over the 12 datasets, BagOrdRandomTree (87.23%) is significantly more accurate than RandomTree (82.29%). Similarly, both AdaOrdREPTree and BagOrdREPTree win against plain REPTree on 11 datasets and lose on only one. The results also show that the performance gap generally increases with the number of classes. When the average accuracy values are considered, the proposed EBOC approach has the best accuracy score in all three experiments: AdaOrdC4.5 reaches 87.48%, BagOrdRandomTree 87.23%, and BagOrdREPTree 85.67%.

The matrices presented in Tables 6, 7, and 8 give all pairwise combinations of the tree-based algorithms with their ordinal and ensemble-learning versions. Each cell in the matrix reports the number of wins, losses, and ties between the approach in that row and the approach in that column. For example, for the pair of C4.5 and BagOrdC4.5, the entry 3-9-0 indicates that the standard C4.5 algorithm is better than BagOrdC4.5 on only 3 datasets, while BagOrdC4.5 is better on 9 datasets; their accuracies are not equal on any dataset. When all matrices are examined, it is clearly seen that our proposed EBOC approach outperforms all other methods.

The graphs given in Figures 5, 6, and 7 show the average ranks of the tree-based algorithms (C4.5, RandomTree, and REPTree) with their ordinal and ensemble-learning (AdaBoost and bagging) versions, respectively.


Table 5. The comparison of REPTree-based ordinal and ensemble learning methods in terms of classification accuracy.

Dataset | REPTree (%) | OrdREPTree (%) | AdaOrdREPTree (%) | BagOrdREPTree (%)
Auto MPG | 75.38 | 76.63* | 81.91* | 81.16*
Automobile | 63.41 | 66.83* | 83.90* | 72.20*
Bike Sharing | 84.96 | 85.27* | 88.75* | 88.12*
Car Evaluation | 87.67 | 90.91* | 98.67* | 93.63*
Car Sale Advertisements | 80.70 | 80.42 | 82.52* | 82.30*
NYS Air Passenger Traffic | 84.34 | 84.91* | 86.74* | 87.31*
Road Traffic Accidents (2017) | 84.66 | 85.02* | 80.07 | 84.57
SF Air Traffic Landings Statistics | 98.04 | 98.10* | 99.12* | 98.57*
SF Air Traffic Passenger Statistics | 90.39 | 90.49* | 90.97* | 91.25*
Smart City Traffic Patterns | 84.66 | 85.06* | 85.33* | 86.38*
Statlog (Vehicle Silhouettes) | 72.34 | 69.62 | 77.54* | 73.17*
Traffic Volume Counts (2012-2013) | 80.57 | 88.75* | 86.39* | 89.37*
Average | 82.26 | 83.50 | 86.83 | 85.67

[Figure 5. The average ranks of the C4.5 algorithm with its ordinal and ensemble-learning versions (C4.5, OrdC4.5, AdaOrdC4.5, BagOrdC4.5).]

In the ranking method, each approach used in this study is rated according to its accuracy score on each dataset. This process starts by giving rank 1 to the classifier with the highest classification accuracy and continues to increase the rank value until rank m is assigned to the worst of the m classifiers. In the case of a tie, the average of the tied classifiers' rankings is assigned to each of them. Then, the mean of the ranks per classifier over all datasets is computed as the average rank. According to the comparative results, the EBOC approach with the bagging method has the best performance among the others in Figures 5 and 6, because it gives the lowest rank value.
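A compact sketch of this ranking computation (using scipy and, for brevity, only three of the twelve rows of Table 3):

```python
import numpy as np
from scipy.stats import rankdata

# Accuracies of C4.5, OrdC4.5, AdaOrdC4.5, BagOrdC4.5 on three of the datasets
# (rows taken from Table 3; the full computation uses all twelve rows).
acc = np.array([
    [80.40, 80.90, 83.67, 81.91],   # Auto MPG
    [81.95, 66.34, 84.39, 73.66],   # Automobile
    [85.29, 85.29, 81.03, 84.38],   # Road Traffic Accidents (2017): contains a tie
])

# Rank 1 = most accurate; ties receive the average of the tied ranks.
ranks = np.apply_along_axis(lambda row: rankdata(-row), 1, acc)
print(ranks.mean(axis=0))   # average rank of each classifier over these datasets
```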

[Figure 6. The average ranks of the RandomTree algorithm with its ordinal and ensemble-learning versions (RandomTree, OrdRandomTree, AdaOrdRandomTree, BagOrdRandomTree).]

The proposed algorithms (AdaOrd* and BagOrd*) were applied on the transportation datasets and were compared with the ordinal and nominal (standard) classification algorithms in terms of the accuracy, precision, recall, F-measure, and ROC area metrics. Additional measures besides accuracy are required to evaluate all aspects of the classifiers, and they are useful for providing additional insight into their performance. Table 9 shows the average results obtained from the 12 transportation datasets in each experiment. Except for accuracy, the metrics listed in Table 9 give a degree ranging from 0 to 1; a higher rate means that the algorithm is more successful than the others.


Table 6. The pairwise combinations of the C4.5 algorithm with its ordinal and ensemble learning versions (wins-losses-ties of the row approach against the column approach).

 | BagOrdC4.5 | AdaOrdC4.5 | OrdC4.5
C4.5 | 3-9-0 | 3-9-0 | 7-4-1
OrdC4.5 | 1-11-0 | 3-9-0 |
AdaOrdC4.5 | 6-6-0 |  |

Table 7. The pairwise combinations of the RandomTree algorithm with its ordinal and ensemble learning versions.

 | BagOrdRandomTree | AdaOrdRandomTree | OrdRandomTree
RandomTree | 1-11-0 | 2-10-0 | 4-8-0
OrdRandomTree | 2-10-0 | 2-10-0 |
AdaOrdRandomTree | 5-7-0 |  |

[Figure 7. The average ranks of the REPTree algorithm with its ordinal and ensemble-learning versions (REPTree, OrdREPTree, AdaOrdREPTree, BagOrdREPTree).]

Among the C4.5-based decision tree algorithms, it is clearly seen that the AdaOrdC4.5 algorithm has the best accuracy (87.48%), precision (0.874), recall (0.875), and F-measure (0.874) results. While bagging has the highest values for the RandomTree algorithm, boosting is the best among the REPTree-based algorithms. When the ROC area is considered, the BagOrd* algorithms have the highest scores according to the experimental results. As a result, it is possible to say that the ensemble-based ordinal classification (EBOC) approach provides more accurate and robust classification results than the traditional methods.

5.4. Statistical Significance Tests. The field of inferential statistics offers special procedures for testing the significance of the differences between multiple classifiers. Although the ensemble-based ordinal classification (EBOC) algorithms (AdaOrd* and BagOrd*) have lower ranking values, we should statistically verify that these algorithms are significantly different from the others. For verification, we used three well-known statistical methods [35]: multiple comparisons (the Friedman test and the Quade test) and pairwise comparisons (the Wilcoxon signed rank test).

The null hypothesis (H0) for a statistical test is the one in which the compared classification models are equivalent; the alternative hypothesis (H1) holds when not all classifiers are equivalent.

H0: There are no performance differences among the classifiers on the datasets.

H1: There are performance differences among the classifiers on the datasets.

The p-value is defined as a probability, a decimal number between 0 and 1, and it can be expressed as a percentage (i.e., 0.1 = 10%). With a small p-value, we reject the null hypothesis (H0), so the difference between the classification results is statistically significant.

5.4.1. Statistical Tests for Comparing Multiple Classifiers. Multiple comparison tests (MCT) are used to establish a statistical comparison of the results reported among various classification algorithms. We performed two well-known MCTs to determine whether the classifiers have significant differences or not: the Friedman test and the Quade test.

Friedman Test. The Friedman test is a nonparametric statistical test that aims to detect significant differences between the behaviours of two or more algorithms. Initially, it ranks the algorithms for each dataset separately, between 1 (smallest) and 4 (largest) in our case. In case of ties, the average rank is assigned. After that, the Friedman test statistic ($\chi^2_r$) is calculated as follows:

$$\chi^2_r = \frac{12}{d \, t \, (t+1)} \sum_{i=1}^{t} R_i^2 - 3 \, d \, (t+1) \quad (5)$$

where $d$ is the number of datasets, $t$ is the number of classifiers, and $R_i$ is the total of the ranks for the $i$th classifier over all datasets. The Friedman statistic is approximately chi-square ($\chi^2$) distributed with $t-1$ degrees of freedom (df), and the null hypothesis is rejected if $\chi^2_r > \chi^2_{t-1,\alpha}$ at the corresponding significance level $\alpha$.

Table 8. The pairwise combinations of the REPTree algorithm with its ordinal and ensemble learning versions.

 | BagOrdREPTree | AdaOrdREPTree | OrdREPTree
REPTree | 1-11-0 | 1-11-0 | 2-10-0
OrdREPTree | 1-11-0 | 2-10-0 |
AdaOrdREPTree | 7-5-0 |  |

Table 9. The comparison of algorithms in terms of accuracy, precision, recall, F-measure, and ROC area.

Compared Algorithms | Accuracy (%) | Precision | Recall | F-measure | ROC Area
C4.5 | 86.02 | 0.861 | 0.860 | 0.866 | 0.897
OrdC4.5 | 84.26 | 0.843 | 0.842 | 0.848 | 0.884
AdaOrdC4.5 | 87.48 | 0.874 | 0.875 | 0.874 | 0.933
BagOrdC4.5 | 86.40 | 0.859 | 0.864 | 0.860 | 0.947
RandomTree | 82.29 | 0.823 | 0.823 | 0.823 | 0.860
OrdRandomTree | 82.34 | 0.825 | 0.823 | 0.824 | 0.861
AdaOrdRandomTree | 86.76 | 0.866 | 0.868 | 0.866 | 0.928
BagOrdRandomTree | 87.23 | 0.868 | 0.872 | 0.869 | 0.947
REPTree | 82.26 | 0.817 | 0.823 | 0.817 | 0.903
OrdREPTree | 83.50 | 0.831 | 0.835 | 0.829 | 0.908
AdaOrdREPTree | 86.83 | 0.868 | 0.868 | 0.867 | 0.925
BagOrdREPTree | 85.67 | 0.852 | 0.857 | 0.852 | 0.939

The Friedman test value obtained for the four C4.5-based classifiers is 10.525. Since this value is greater than the critical value ($\chi^2_{df=3,\alpha=0.05} = 7.81$), the null hypothesis (H0) is rejected, so it can be concluded that the four classifiers are significantly different. The same holds for the RandomTree-based and REPTree-based algorithms, which have test values of 15.3 and 20.1, respectively. The p-values obtained for the C4.5-, RandomTree-, and REPTree-related EBOC results (Tables 3, 4, and 5) are 0.01459, 0.00158, and 0.00016, all well below the 0.05 level of significance. Thus, the differences between the classifiers' performances are unlikely to have occurred by chance.
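For reference, the same kind of test can be reproduced with scipy, feeding it the per-dataset accuracies of the compared classifiers (here, the columns of Table 3). Note that scipy applies a tie correction, so its statistic can deviate slightly from the uncorrected equation (5):

```python
from scipy.stats import friedmanchisquare

# Columns of Table 3: accuracies of each classifier over the same 12 datasets.
c45    = [80.40, 81.95, 87.75, 92.36, 83.65, 84.91, 85.29, 98.74, 90.68, 85.66, 72.46, 88.36]
ordc45 = [80.90, 66.34, 87.72, 92.19, 81.09, 85.42, 85.29, 98.31, 90.82, 85.98, 68.91, 88.12]
ada    = [83.67, 84.39, 89.29, 98.73, 83.82, 86.05, 81.03, 99.37, 90.41, 85.07, 78.13, 89.77]
bag    = [81.91, 73.66, 89.79, 94.27, 85.00, 86.81, 84.38, 98.69, 91.43, 86.94, 74.82, 89.10]

stat, p = friedmanchisquare(c45, ordc45, ada, bag)  # reject H0 when p < 0.05
print(stat, p)
```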

Quade Test. Like the Friedman test, the Quade test is a nonparametric test used to establish whether the differences among classifiers are significant. First, the performance results of each classifier are ranked within each dataset to yield $R_{ij}$, in the same way as the Friedman test does, where $d$ is the number of datasets and $t$ is the number of classifiers, for $i = 1, 2, \dots, d$ and $j = 1, 2, \dots, t$. Then, the range for each dataset is calculated as the difference between the largest and the smallest observations within that dataset, and these ranges are ranked across datasets; the rank obtained for dataset $i$ is denoted by $Q_i$. The weighted, centred rank for dataset $i$ with classifier $j$ is then computed as $S_{ij} = Q_i \left( R_{ij} - \frac{t+1}{2} \right)$. The Quade test statistic is given by

$$F = \frac{(d-1) \, \frac{1}{d} \sum_{j=1}^{t} S_j^2}{\sum_{i=1}^{d} \sum_{j=1}^{t} S_{ij}^2 - \frac{1}{d} \sum_{j=1}^{t} S_j^2} \quad (6)$$

where $S_j = \sum_{i=1}^{d} S_{ij}$ is the sum of the weighted ranks for each classifier. The $F$ value is tested against the F-distribution for a given $\alpha$ with $df_1 = t - 1$ and $df_2 = (d-1)(t-1)$ degrees of freedom. If $F > F_{(t-1),(d-1)(t-1),\alpha}$, the null hypothesis is rejected, and the corresponding p-value can be computed from the F-distribution.

The p-values computed through the Quade statistical tests for the C4.5-, RandomTree-, and REPTree-related EBOC algorithms are 0.013398, 0.000018, and 0.000522, respectively. The test results strongly suggest the existence of significant differences among the considered algorithms at the significance level $\alpha = 0.05$. Hence, we reject the null hypothesis, which states that all classification algorithms have the same performance; thus, at least one of them behaves differently.
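scipy does not provide a built-in Quade test, but equation (6) can be implemented directly; the sketch below uses our own function name and an accuracy matrix with datasets as rows and classifiers as columns:

```python
import numpy as np
from scipy import stats

def quade_test(acc):
    """Quade test for an accuracy matrix `acc` of shape (d datasets, t classifiers),
    following equation (6). Returns the F statistic and its p-value."""
    d, t = acc.shape
    # R_ij: rank classifiers within each dataset (ties get average ranks)
    R = np.apply_along_axis(stats.rankdata, 1, acc)
    # Q_i: rank of each dataset's range (max - min accuracy) across datasets
    Q = stats.rankdata(acc.max(axis=1) - acc.min(axis=1))
    # S_ij = Q_i * (R_ij - (t + 1) / 2), the weighted centred ranks
    S = Q[:, None] * (R - (t + 1) / 2)
    Sj = S.sum(axis=0)
    A = (S ** 2).sum()         # sum over all S_ij^2
    B = (Sj ** 2).sum() / d    # (1/d) * sum over S_j^2
    F = (d - 1) * B / (A - B)
    p = stats.f.sf(F, t - 1, (d - 1) * (t - 1))
    return F, p

# Demo on random data; in the paper's setting, `acc` would hold Table 3's accuracies.
F, p = quade_test(np.random.default_rng(0).random((12, 4)))
print(F, p)
```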

5.4.2. Statistical Tests for Comparing Paired Classifiers. In addition to the multiple comparison tests, we also conducted a pairwise comparison test to demonstrate that the proposed algorithms (AdaOrd* and BagOrd*) show significant differences from the others when compared individually. The Wilcoxon signed rank test was applied to assess the pairwise performance differences.


Table 10. Wilcoxon signed rank test results.

Compared Algorithms | W+ | W- | W | z-score | p-value | Significance Level
AdaOrdC4.5 vs. C4.5 | 63 | 15 | 15 | -1.882715 | 0.059739 | moderate
AdaOrdC4.5 vs. OrdC4.5 | 65 | 13 | 13 | -2.039608 | 0.041389 | strong
BagOrdC4.5 vs. C4.5 | 61 | 17 | 17 | -1.725822 | 0.084379 | moderate
BagOrdC4.5 vs. OrdC4.5 | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong
AdaOrdRandomTree vs. RandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong
AdaOrdRandomTree vs. OrdRandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong
BagOrdRandomTree vs. RandomTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong
BagOrdRandomTree vs. OrdRandomTree | 75 | 3 | 3 | -2.824072 | 0.004742 | very strong
AdaOrdREPTree vs. REPTree | 71 | 7 | 7 | -2.510287 | 0.012063 | strong
AdaOrdREPTree vs. OrdREPTree | 64 | 14 | 14 | -1.961161 | 0.049860 | strong
BagOrdREPTree vs. REPTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong
BagOrdREPTree vs. OrdREPTree | 77 | 1 | 1 | -2.980965 | 0.002873 | very strong
Average | 71.25 | 6.75 | 6.75 | -2.602996 | 0.022918 |

Wilcoxon Signed Rank Test. This statistical test consists of the following steps to check the validity of the null hypothesis: (i) calculate the performance differences between the two algorithms for each dataset; (ii) order the differences according to their absolute values from smallest to largest; (iii) rank the absolute values of the differences, starting with the smallest as 1 and assigning average ranks in case of ties; (iv) calculate W+ and W- by summing the ranks of the positive and negative differences separately; (v) take W as the smaller of the two sums, W = min(W+, W-); (vi) compute the z-score as $z = (W - \mu_W)/\sigma_W$, with $\mu_W = n(n+1)/4$ and $\sigma_W = \sqrt{n(n+1)(2n+1)/24}$ for $n$ datasets, and obtain the p-value from the normal distribution table; and (vii) determine the significance level according to the computed p-value [36], rejecting the null hypothesis if the p-value is less than a certain risk threshold (e.g., 0.1 = 10%):

$$\text{Significance level} = \begin{cases} \text{weak or none}, & p > 0.1 \\ \text{moderate}, & 0.05 < p \le 0.1 \\ \text{strong}, & 0.01 < p \le 0.05 \\ \text{very strong}, & p \le 0.01 \end{cases} \quad (7)$$
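A sketch of one such pairwise test with scipy, using the AdaOrdC4.5 and C4.5 columns of Table 3; for small samples without ties, scipy may use the exact distribution rather than the normal approximation behind Table 10, so the reported p-value can differ slightly:

```python
from scipy.stats import wilcoxon

# Paired accuracies of two classifiers over the same 12 datasets (Table 3).
ada = [83.67, 84.39, 89.29, 98.73, 83.82, 86.05, 81.03, 99.37, 90.41, 85.07, 78.13, 89.77]
c45 = [80.40, 81.95, 87.75, 92.36, 83.65, 84.91, 85.29, 98.74, 90.68, 85.66, 72.46, 88.36]

# statistic = min(W+, W-); a small p-value indicates a significant paired difference.
stat, p = wilcoxon(ada, c45)
print(stat, p)
```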

Table 10 presents the W, z-score, and p-values computed through the Wilcoxon signed rank tests. Based on the reported p-values, most of the tests strongly or very strongly confirm the existence of significant differences among the considered algorithms. As the table shows, in every test the ensemble-based ordinal classification algorithms (BagOrd* and AdaOrd*) achieve a significant improvement over the traditional versions at a significance level of $\alpha = 0.1$. In general, the average p-value of the proposed BagOrd* variants (0.0171) is slightly lower than that of the AdaOrd* variants (0.0288).

Based on all the statistical tests (the Friedman, Quade, and Wilcoxon signed rank tests), we can safely reject the null hypothesis (i.e., that there are no performance differences among the classifiers on the datasets), since the p-values computed through the statistics are less than the critical value (0.05). Thus, the proposed approach (EBOC) has a statistically significant effect on the response at the 95.0% confidence level.

6. Conclusions and Future Works

As a result of developments in the field of transportation, enormous amounts of raw data are generated every day. This situation creates the potential to discover knowledge, patterns, or rules from these data and to model the behaviour of a transportation system using machine learning techniques. To serve this purpose, this study focuses on the application of ordinal classification algorithms on real-world transportation datasets. It proposes a novel ensemble-based ordinal classification (EBOC) approach, which converts the original ordinal class problem into a series of binary class problems and uses the ensemble-learning paradigm (boosting and bagging) at the same time. To the best of our knowledge, this is the first study in which ordinal classification methods and our proposed approach have been applied to the transportation sector. In the experimental studies, the proposed model with tree-based learners (C4.5, RandomTree, and REPTree) was implemented on twelve benchmark transportation datasets that are available for public use. The proposed EBOC method was compared with the ordinal class classifier and traditional tree-based classification algorithms in terms of accuracy, precision, recall, F-measure, and ROC area. The results indicate that the EBOC approach provides more accurate classification results than these alternatives. Moreover, statistical test methods (the Friedman, Quade, and Wilcoxon signed rank tests) were used to show that the classification accuracies obtained from the proposed algorithms (AdaOrd* and BagOrd*) are significantly different from those of the traditional methods. Therefore, the proposed EBOC method can assist in making the right decisions in transportation. Our findings demonstrate that the improvement in performance is a result of exploiting the ordering information and applying the ensemble strategy at the same time.

As future work, different types of ensemble-based ordinal classifiers can be developed using different ensemble methods, such as stacking and voting, and using different classification algorithms, such as Naive Bayes, k-nearest neighbour, neural network, and support vector machine. In addition, ensemble clustering models can be developed to cluster transportation data. Furthermore, a comparative study which implements ensemble-learning and deep learning paradigms can be performed in transportation fields.

Data Availability

The "Auto MPG", "Automobile", "Bike Sharing", "Car Evaluation", and "Statlog (Vehicle Silhouettes)" datasets used to support the findings of this study are freely available at https://archive.ics.uci.edu/ml/datasets. The "Car Sale Advertisements", "NYS Air Passenger Traffic", "Smart City Traffic Patterns", "SF Air Traffic Landings Statistics", and "SF Air Traffic Passenger Statistics" datasets used to support the findings of this study are freely available at https://www.kaggle.com/datasets. The "Road Traffic Accidents (2017)" dataset used to support the findings of this study is freely available at https://datamillnorth.org/dataset. The "Traffic Volume Counts (2012-2013)" dataset used to support the findings of this study is freely available at https://opendata.cityofnewyork.us/data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

[1] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, "Traffic flow prediction with big data: a deep learning approach," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865–873, 2015.

[2] X. Wen, L. Shao, Y. Xue, and W. Fang, "A rapid learning algorithm for vehicle classification," Information Sciences, vol. 295, pp. 395–406, 2015.

[3] S. Ballı and E. A. Sagbas, "Diagnosis of transportation modes on mobile phone using logistic regression classification," IET Software, vol. 12, no. 2, pp. 142–151, 2018.

[4] H. Nguyen, C. Cai, and F. Chen, "Automatic classification of traffic incident's severity using machine learning approaches," IET Intelligent Transport Systems, vol. 11, no. 10, pp. 615–623, 2017.

[5] G. Kedar-Dongarkar and M. Das, "Driver classification for optimization of energy usage in a vehicle," Procedia Computer Science, vol. 8, pp. 388–393, 2012.

[6] N. Deepika and V. V. Sajith Variyar, "Obstacle classification and detection for vision based navigation for autonomous driving," in Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI 2017), pp. 2092–2097, Udupi, India, September 2017.

[7] E. Frank and M. Hall, "A simple approach to ordinal classification," in Proceedings of the European Conference on Machine Learning (ECML), vol. 2167 of Lecture Notes in Computer Science, pp. 145–156, Springer, 2001.

[8] E. Alpaydin, Introduction to Machine Learning, The MIT Press, 3rd edition, 2014.

[9] S. Destercke and G. Yang, "Cautious ordinal classification by binary decomposition," in Proceedings of Machine Learning and Knowledge Discovery in Databases - European Conference (ECML/PKDD), pp. 323–337, Nancy, France, 2014.

[10] J. S. Cardoso and J. F. Pinto da Costa, "Learning to classify ordinal data: the data replication method," Journal of Machine Learning Research (JMLR), vol. 8, pp. 1393–1429, 2007.

[11] L. Li and H. T. Lin, "Ordinal regression by extended binary classification," in Proceedings of the 19th International Conference on Neural Information Processing Systems, pp. 865–872, Canada, 2006.

[12] F. E. Harrell, "Ordinal logistic regression," in Regression Modeling Strategies, Springer Series in Statistics, pp. 311–325, Springer International Publishing, Cham, Switzerland, 2015.

[13] J. D. Rennie and N. Srebro, "Loss functions for preference levels: regression with discrete ordered labels," in Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference Handling, pp. 180–186, 2005.

[14] B. Keith, "Barycentric coordinates for ordinal sentiment classification," in Proceedings of the 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Halifax, Canada, 2017.

[15] E. Fernandez, J. R. Figueira, J. Navarro, and B. Roy, "ELECTRE TRI-nB: A new multiple criteria ordinal classification method," European Journal of Operational Research, vol. 263, no. 1, pp. 214–224, 2017.

[16] M. M. Stenina, M. P. Kuznetsov, and V. V. Strijov, "Ordinal classification using Pareto fronts," Expert Systems with Applications, vol. 42, no. 14, pp. 5947–5953, 2015.

[17] W. Verbeke, D. Martens, and B. Baesens, "RULEM: A novel heuristic rule learning approach for ordinal classification with monotonicity constraints," Applied Soft Computing, vol. 60, pp. 858–873, 2017.

[18] W. Kotlowski and R. Slowinski, "On nonparametric ordinal classification with monotonicity constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 11, pp. 2576–2589, 2013.

[19] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ordinal classification with decision rules," in Proceedings of the Third International Conference on Mining Complex Data (MCD'07), pp. 169–181, Warsaw, Poland, 2007.

[20] K. Hechenbichler and K. Schliep, "Weighted k-nearest-neighbor techniques and ordinal classification," Collaborative Research Center, vol. 399, pp. 1–16, 2004.

[21] W. Waegeman and L. Boullart, "An ensemble of weighted support vector machines for ordinal regression," International Journal of Computer and Information Engineering, vol. 1, no. 12, pp. 599–603, 2007.

[22] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ensemble of decision rules for ordinal classification with monotonicity constraints," in Proceedings of the International Conference on Rough Sets and Knowledge Technology, vol. 5009 of Lecture Notes in Computer Science, pp. 260–267, 2008.

[23] H. T. Lin, From Ordinal Ranking to Binary Classification, PhD thesis, California Institute of Technology, 2008.

[24] C. K. A. Cuartas, A. J. P. Anzola, and B. G. M. Tarazona, "Classification methodology of research topics based in decision trees: J48 and RandomTree," International Journal of Applied Engineering Research, vol. 10, no. 8, pp. 19413–19424, 2015.

[25] C. Parimala and R. Porkodi, "Classification algorithms in data mining: a survey," International Journal of Scientific Research in Computer Science, vol. 3, pp. 349–355, 2018.

[26] P. Yildirim, D. Birant, and T. Alpyildiz, "Improving prediction performance using ensemble neural networks in textile sector," in Proceedings of the 2017 International Conference on Computer Science and Engineering (UBMK), pp. 639–644, Antalya, Turkey, October 2017.

[27] H. Yu and J. Ni, "An improved ensemble learning method for classifying high-dimensional and imbalanced biomedicine data," IEEE Transactions on Computational Biology and Bioinformatics, vol. 11, no. 4, pp. 657–666, 2014.

[28] D. Che, Q. Liu, K. Rasheed, and X. Tao, "Decision tree and ensemble learning algorithms with their applications in bioinformatics," in Software Tools and Algorithms for Biological Systems, H. R. Arabnia and Q.-N. Tran, Eds., vol. 696, pp. 191–199, Springer, 2011.

[29] "UCI Machine Learning Repository: Car Evaluation Data Set," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/datasets/car+evaluation.

[30] "Weka 3 - Data Mining with Open Source Machine Learning Software in Java," cs.waikato.ac.nz, 2018, https://www.cs.waikato.ac.nz/ml/weka/.

[31] "UCI Machine Learning Repository," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/index.php.

[32] "Datasets | Kaggle," kaggle.com, 2018, https://www.kaggle.com/datasets.

[33] "Data Mill North," datamillnorth.org, 2018, https://datamillnorth.org/dataset.

[34] City of New York, "NYC Open Data," opendata.cityofnewyork.us, 2018, https://opendata.cityofnewyork.us.

[35] J. Derrac, S. García, D. Molina, and F. Herrera, "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms," Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 3–18, 2011.

[36] N. A. Weiss, Introductory Statistics, Addison-Wesley, 9th edition, 2012.

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 10: EBOC: Ensemble-Based Ordinal Classification in …downloads.hindawi.com/journals/jat/2019/7482138.pdfJournalofAdvancedTransportation the size of decision trees byremoving sections

10 Journal of Advanced Transportation

Table 1 The basic characteristics of the transportation datasets (I number of instances A number of attributes C number of classes)

Dataset I A C Class Distribution Target Attribute Data Preprocessing

Auto MPG 398 8 3 131-134-133 mpg Removing the columns withunique values (ie car name)

Automobile 205 26 7 0-3-22-67-54-32-27 symboling -

Bike Sharing 17379 13 3 5797-5783-5799 cntRemoving the columns ldquoinstantrdquo

ldquodtedayrdquo ldquocasualrdquo andldquoregisteredrdquo

Car Evaluation 1728 7 4 1210-384-69-65 class -

Car Sale Advertisements 9309 10 3 3112-3080-3117 price Removing rows with zero pricevalue

NYS Air Passenger Traffic 1584 4 3 528-528-528 total passengersRemoving the columns

ldquoDomesticrdquo and ldquoInternationalPassengersrdquo

Road Traffic Accidents(2017)

2203 13 3 1879-309-15 casualty severity(i) Removing columns that hold

reference numbers(ii) Splitting date column into

day and monthSF Air Traffic LandingsStatistics 21105 14 3 6941-7103-7061 landing count -

SF Air Traffic PassengerStatistics 18398 12 3 6132-6133-6133 passenger count -

Smart City Traffic Patterns 48120 6 3 15687-16436-15997 vehicles(i) Removing ID column

(ii) Splitting date column intoday month and year

Statlog (VehicleSilhouettes) 846 19 4 212-217-218-199 class -

Traffic Volume Counts(2012-2013) 5945 31 3 1983-1989-1973 traffic volume at 1100 -1200AM

(i) Removing the columns withunique values (ie ID Segment

ID)(ii) Splitting date column into

day month and year

Table 2The detailed information about classifier performance measures (TP true positives FP false positives TN true negatives and FNfalse negatives)

Abbreviation Measure Formula Description

Acc Accuracy119860119888119888 =119879119875 + 119879119873

119879119875 + 119879119873 + 119865119875 + 119865119873

The ratio of the number of correctlyclassified instances to the total number of

records

Prec Precision 119875119903119890119888 = 119879119875119879119875 + 119865119875

The ratio of correctly classified positiveinstances against all positive results

Rec Recall 119877119890119888 = 119879119875119879119875 + 119865119873

The ratio of correct answers for a classover all the answers correctly belonging

to this class

F F-measure 119865 = 2 lowast 119901119903119890119888119894119904119894119900119899 lowast 119903119890119888119886119897119897119901119903119890119888119894119904119894119900119899 + 119903119890119888119886119897119897

The harmonic mean of precision andrecall

ROC Area Receiver OperatingCharacteristic Area

The area under the curve which is generated to compare correctly andincorrectly classified instances

decision tree algorithm (ii) OrdC45 ordinal class classifierwith C45 (iii) AdaOrdC45 AdaBoost based ordinalclassification with C45 as base learners and finally (iv)BagOrdC45 bagging based ordinal classification withC45 as base learners In the second and third experimentsRandomTree and REPTree algorithms were utilized asbase learners respectively The default parameters of the

algorithms in Weka were used in all experiments expectensemble size (the number of members) parameter whichwas set to 100 As a result of the experiments the accuracyrates of each algorithm were evaluated Tables 3 4 and 5show the comparative results of implemented techniques ontwelve benchmark datasets in terms of the accuracy ratesTheaccuracy rates which are higher than individual classification

Journal of Advanced Transportation 11

Table 3 The comparison of C45 based ordinal and ensemble learning methods in terms of classification accuracy

Dataset C45 OrdC45 AdaOrdC45 BagOrdC45() () () ()

Auto MPG 8040 8090 lowast 8367lowast 8191 lowastAutomobile 8195 6634 8439 lowast 7366Bike Sharing 8775 8772 8929 lowast 8979 lowastCar Evaluation 9236 9219 9873lowast 9427 lowastCar Sale Advertisements 8365 8109 8382 lowast 8500 lowastNYS Air Passenger Traffic 8491 8542 lowast 8605 lowast 8681 lowastRoad Traffic Accidents (2017) 8529 8529 lowast 8103 8438SF Air Traffic Landings Statistics 9874 9831 9937lowast 9869SF Air Traffic Passenger Statistics 9068 9082 lowast 9041 9143 lowastSmart City Traffic Patterns 8566 8598 lowast 8507 8694 lowastStatlog (Vehicle Silhouettes) 7246 6891 7813lowast 7482 lowastTraffic Volume Counts (2012-2013) 8836 8812 8977lowast 8910 lowastAverage 8602 8426 8748 8640

Table 4 The comparison of RandomTree based ordinal and ensemble learning methods in terms of classification accuracy

DatasetRandom OrdRandom AdaOrd BagOrdTree Tree RandomTree RandomTree() () () ()

Auto MPG 7764 8166 lowast 8266 lowast 8141 lowastAutomobile 7659 7463 8341 lowast 8488 lowastBike Sharing 8007 8056 lowast 8698 lowast 8773 lowastCar Evaluation 8316 8594 lowast 9722 lowast 9583 lowastCar Sale Advertisements 8238 8212 8390 lowast 8634 lowastNYS Air Passenger Traffic 8611 8617 lowast 8561 8649 lowastRoad Traffic Accidents (2017) 7876 7844 8221 lowast 8448 lowastSF Air Traffic Landings Statistics 9738 9743 lowast 9893 lowast 9883 lowastSF Air Traffic Passenger Statistics 8922 8927 lowast 8905 8917Smart City Traffic Patterns 8226 8279 lowast 8472 lowast 8591 lowastStatlog (Vehicle Silhouettes) 7092 6560 7671 lowast 7600 lowastTraffic Volume Counts (2012-2013) 8296 8345 lowast 8977 lowast 8969 lowastAverage 8229 8234 8676 8723

algorithmrsquos accuracy rate in that row are marked with lowastIn addition the accuracy rates which have the maximumvalue in each row are made bold The experimentalresults show that the EBOC approaches (AdaOrdC45BagOrdC45 AdaOrdRandomTree BagOrdRandomTreeAdaOrdREPTree and BagOrdREPTree) generally pro-vide higher accuracy values than individual tree-basedclassification algorithms For example on 12 datasetsBagOrdRandomTree (8723) is significantly moreaccurate than RandomTree (8229) Similarly bothAdaOrdREPTree and BagOrdREPTree win against plainREPTree on 11 datasets and lose on only one The results alsoshow that the performance gap generally increases with thenumber of classes When the average accuracy values areconsidered in general it is possible to say that the proposedEBOC approach has the best accuracy score in all three

experiments AdaOrdC45 is 8748 BagOrdRandomTreeis 8723 and BagOrdREPTree is 8567

The matrices presented in Tables 6 7 and 8 give allpairwise combinations of the tree-based algorithms withordinal and ensemble-learning versions Each cell in thematrix represents the number of wins losses and tiesbetween the approach in that row and the approach inthat column For example in the pairwise of C45 andBagOrdC45 algorithms 2-10-0 indicates that standard C45algorithm is better than BagOrdC45 only on 2 datasetswhile BagOrdC45 is better than the other on 10 datasetsTheir accuracies are not equal in any dataset When allmatrices are examined it is clearly seen that our proposedapproach EBOC outperforms all other methods

The graphs given in Figures 5 6 and 7 show the averageranks of the tree-based algorithms (C45 RandomTree and

12 Journal of Advanced Transportation

Table 5 The comparison of REPTree based ordinal and ensemble learning methods in terms of classification accuracy

Dataset REPTree () OrdREPTree ()AdaOrd BagOrdREPTree REPTree

() ()Auto MPG 7538 7663 lowast 8191 lowast 8116 lowastAutomobile 6341 6683 lowast 8390 lowast 7220 lowastBike Sharing 8496 8527 lowast 8875 lowast 8812 lowastCar Evaluation 8767 9091 lowast 9867 lowast 9363 lowastCar Sale Advertisements 8070 8042 8252 lowast 8230 lowastNYS Air Passenger Traffic 8434 8491 lowast 8674 lowast 8731 lowastRoad Traffic Accidents (2017) 8466 8502 lowast 8007 8457SF Air Traffic Landings Statistics 9804 9810 lowast 9912 lowast 9857 lowastSF Air Traffic Passenger Statistics 9039 9049 lowast 9097 lowast 9125 lowastSmart City Traffic Patterns 8466 8506 lowast 8533 lowast 8638 lowastStatlog (Vehicle Silhouettes) 7234 6962 7754 lowast 7317 lowastTraffic Volume Counts (2012-2013) 8057 8875 lowast 8639 lowast 8937 lowastAverage 8226 8350 8683 8567

288329

2183

0

05

1

15

2

25

3

35

4Decision Tree

C45OrdC45AdaOrdC45BagOrdC45

Figure 5 The average ranks of C45 algorithm with ordinal andensemble-learning versions

REPTree) with ordinal and ensemble-learning (AdaBoostand bagging) versions respectively In the ranking methodeach approach used in this study is rated according to itsaccuracy score on the dataset This process is performed bystarting with giving rank 1 to the classifier with the highestclassification accuracy and continues to increase the rankvalue of the classifiers until assigning rank119898 to the worst oneof the119898 classifiers In the case of tie average of the classifiersrsquorankings is assigned to each classifier Then mean values ofthe ranks per classifiers on each datasets are computed asaverage ranks According to the comparative results EBOCapproach with bagging method has the best performanceamong the others in Figures 5 and 6 because it gives thelowest rank value

342

3

192167

0

05

1

15

2

25

3

35

4RandomTree

RandomTreeOrdRandomTreeAdaOrdRandomTreeBagOrdRandomTree

Figure 6The average ranks of RandomTree algorithm with ordinaland ensemble-learning versions

The proposed algorithms (AdaOrdlowast and BagOrdlowast)were applied on the transportation datasets and were com-pared with ordinal and nominal (standard) classificationalgorithms in terms of accuracy precision recall F-measureand ROC area metrics Additional measures besides accu-racy are required to evaluate all aspects of the classifiers andthey are useful for providing additional insight into theirperformance evaluations Table 9 shows the average resultsobtained from 12 transportation datasets at each experimentExcept accuracy the metrics listed in Table 9 give a degreeranging from 0 to 1 The algorithm with higher rate meansthat it is more successful than others Among C45 decision

Journal of Advanced Transportation 13

Table 6 The pairwise combinations of the C45 algorithm with ordinal and ensemble learning versions

BagOrdC45 AdaOrdC45 OrdC45C45 3-9-0 3-9-0 7-4-1OrdC45 1-11-0 3-9-0AdaOrdC45 6-6-0BagOrdC45

Table 7 The pairwise combinations of the RandomTree algorithm with ordinal and ensemble learning versions

BagOrd AdaOrd OrdRandomTreeRandomTree RandomTree

RandomTree 1-11-0 2-10-0 4-8-0OrdRandomTree 2-10-0 2-10-0AdaOrdRandomTree 5-7-0BagOrdRandomTree

367

292

167 175

0

05

1

15

2

25

3

35

4REPTree

REPTreeOrdREPTreeAdaOrdREPTreeBagOrdREPTree

Figure 7The average ranks of REPTree algorithm with ordinal andensemble-learning versions

tree-based algorithms it is clearly seen that AdaOrdC45algorithm has the best accuracy (87467) precision (0874)recall (0875) and F-measure (0874) results While bag-ging has the highest values for the RandomTree algorithmboosting is the best among REPTree-based algorithms Whenthe ROC area is considered BagOrdlowast algorithms have thehighest scores according to the experimental results Asa result it is possible to say that ensemble-based ordinalclassification (EBOC) approach provides more accurate androbust classification results than traditional methods

54 Statistical Significance Tests The field of inferentialstatistics offers special procedures for testing the significanceof differences between multiple classifiers Although the

ensemble-based ordinal classification (EBOC) algorithms(AdaOrdlowast and BagOrdlowast) have lower ranking value weshould statistically verify that these algorithms are sig-nificantly different from the others For verification weused three well-known statistical methods [35] multiplecomparisons (Friedman test and Quade test) and pairwisecomparisons (Wilcoxon signed rank test)

The null hypothesis (H0) for a statistical test is one inwhich provided classification models are equivalent other-wise the alternative hypothesis (H1) is present when not allclassifiers are equivalent

H0 There are no performance differences among theclassifiers on the datasets

H1 There are performance differences among the classi-fiers on the datasets

The p-value is defined as the probability which is adecimal number between 0 and 1 and it can be expressedas a percentage (ie 01 = 10) With a small p-value wereject the null hypothesis (H0) so the relationship betweenthe classification results is significantly different

541 Statistical Tests for Comparing Multiple ClassifiersMultiple comparison tests (MCT) are used to establish astatistical comparison of the results reported among variousclassification algorithms We performed two well-knownMCT to determine whether the classifiers have significantdifferences or not Friedman test and Quade test

Friedman Test The Friedman test is a nonparametric statis-tical test that aims to detect significant differences betweenthe behaviors of two or more algorithms Initially it ranks thealgorithms for each dataset separately between 1 (smallest)and 4 (largest) In case of ties the average rank is assigned tothem After that the Friedman test statistics (1199092119903) is calculatedaccording to the following

1205942119903 =12

119889 lowast 119905 lowast (119905 + 1) lowast119905

sum119894=1

1198772119894 minus 3 lowast 119889 lowast (119905 + 1) (5)

14 Journal of Advanced Transportation

Table 8 The pairwise combinations of the REPTree algorithm with ordinal and ensemble learning versions

BagOrdREPTree AdaOrdREPTree OrdREPTreeREPTree 1-11-0 1-11-0 2-10-0OrdREPTree 1-11-0 2-10-0AdaOrdREPTree 7-5-0BagOrdREPTree

Table 9: The comparison of the algorithms in terms of accuracy, precision, recall, F-measure, and ROC area.

Compared Algorithms    Accuracy (%)   Precision   Recall   F-measure   ROC Area
C4.5                   86.02          0.861       0.860    0.866       0.897
OrdC4.5                84.26          0.843       0.842    0.848       0.884
AdaOrdC4.5             87.48          0.874       0.875    0.874       0.933
BagOrdC4.5             86.40          0.859       0.864    0.860       0.947
RandomTree             82.29          0.823       0.823    0.823       0.860
OrdRandomTree          82.34          0.825       0.823    0.824       0.861
AdaOrdRandomTree       86.76          0.866       0.868    0.866       0.928
BagOrdRandomTree       87.23          0.868       0.872    0.869       0.947
REPTree                82.26          0.817       0.823    0.817       0.903
OrdREPTree             83.50          0.831       0.835    0.829       0.908
AdaOrdREPTree          86.83          0.868       0.868    0.867       0.925
BagOrdREPTree          85.67          0.852       0.857    0.852       0.939

where $d$ is the number of datasets, $t$ is the number of classifiers, and $R_i$ is the total of the ranks for the $i$th classifier over all datasets. The Friedman test statistic is approximately chi-square ($\chi^2$) distributed with $t-1$ degrees of freedom (df), and the null hypothesis is rejected if $\chi^2_r > \chi^2_{t-1,\alpha}$ at the corresponding significance level $\alpha$.

The obtained Friedman test value for the four C4.5-based classifiers is 10.525. Since the obtained value is greater than the critical value ($\chi^2_{df=3,\alpha=0.05} = 7.81$), the null hypothesis (H0) is rejected, so it can be concluded that the four classifiers are significantly different. The same situation also holds for the RandomTree-based and REPTree-based algorithms, which have test values of 15.3 and 20.1, respectively. The p-values obtained for the C4.5-, RandomTree-, and REPTree-related EBOC results (Tables 3, 4, and 5) are 0.01459, 0.00158, and 0.00016, all well below the 0.05 level of significance. Thus, it is possible to say that the differences between their performances are unlikely to have occurred by chance.
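For illustration, the statistic in (5) and its p-value can be reproduced from a $d \times t$ matrix of per-dataset accuracies such as those in Tables 3-5. The following Python sketch is our illustration, not the authors' code; it assumes NumPy and SciPy are available:

```python
# Illustrative sketch of the Friedman test of equation (5).
import numpy as np
from scipy.stats import chi2, rankdata

def friedman_test(accs):
    """accs: d x t matrix of accuracies (d datasets, t classifiers)."""
    d, t = accs.shape
    # Rank classifiers within each dataset; ties get the average rank.
    # (The statistic is invariant to whether rank 1 means best or worst.)
    ranks = np.apply_along_axis(rankdata, 1, accs)
    R = ranks.sum(axis=0)                 # R_i: rank total of classifier i
    chi2_r = 12.0 / (d * t * (t + 1)) * np.sum(R ** 2) - 3.0 * d * (t + 1)
    return chi2_r, chi2.sf(chi2_r, df=t - 1)

# Sanity check against the reported C4.5 experiment:
print(chi2.sf(10.525, df=3))   # ~0.0146, matching the reported 0.01459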

Quade Test. Like the Friedman test, the Quade test is a nonparametric test which is used to prove that the differences among classifiers are significant. First, the performance results of each classifier are ranked within each dataset to yield $R_{ij}$, in the same way as the Friedman test does, where $d$ is the number of datasets and $t$ is the number of classifiers, for $i = 1, 2, \ldots, d$ and $j = 1, 2, \ldots, t$. Then the range for each dataset is calculated by finding the difference between the largest and the smallest observations within that dataset, and these ranges are themselves ranked across datasets; the rank obtained for dataset $i$ is denoted by $Q_i$. The weighted average adjusted rank for dataset $i$ with classifier $j$ is then computed as $S_{ij} = Q_i \cdot (R_{ij} - (t+1)/2)$. The Quade test statistic is then given by

$$F = \frac{(d-1) \cdot \frac{1}{d} \sum_{j=1}^{t} S_j^2}{\sum_{i=1}^{d} \sum_{j=1}^{t} S_{ij}^2 - \frac{1}{d} \sum_{j=1}^{t} S_j^2} \quad (6)$$

where $S_j = \sum_{i=1}^{d} S_{ij}$ is the sum of the weighted ranks for each classifier. The $F$ value is tested against the F-distribution for a given $\alpha$ with $df_1 = t - 1$ and $df_2 = (d-1)(t-1)$ degrees of freedom. If $F > F_{(t-1),(d-1)(t-1),\alpha}$, then the null hypothesis is rejected. Moreover, the p-value can be computed through normal approximations.

The p-values computed through the Quade statistical tests for the C4.5-, RandomTree-, and REPTree-related EBOC algorithms are 0.013398, 0.000018, and 0.000522, respectively. The test results strongly suggest the existence of significant differences among the considered algorithms at the significance level $\alpha = 0.05$. Hence, we reject the null hypothesis, which states that all classification algorithms have the same performance; thus, it is possible to say that at least one of them behaves differently.
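The Quade statistic in (6) can be computed from the same $d \times t$ accuracy matrix. A minimal Python sketch follows (again our illustration under the definitions above, not the authors' implementation):

```python
# Illustrative sketch of the Quade test of equation (6).
import numpy as np
from scipy.stats import f, rankdata

def quade_test(accs):
    """accs: d x t matrix of accuracies (d datasets, t classifiers)."""
    d, t = accs.shape
    R = np.apply_along_axis(rankdata, 1, accs)     # within-dataset ranks R_ij
    sample_range = accs.max(axis=1) - accs.min(axis=1)
    Q = rankdata(sample_range)                     # Q_i: rank of dataset i's range
    S = Q[:, None] * (R - (t + 1) / 2.0)           # S_ij, weighted adjusted ranks
    Sj = S.sum(axis=0)                             # S_j per classifier
    A, B = np.sum(S ** 2), np.sum(Sj ** 2) / d
    F_stat = (d - 1) * B / (A - B)
    return F_stat, f.sf(F_stat, t - 1, (d - 1) * (t - 1))
```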

5.4.2. Statistical Tests for Comparing Paired Classifiers

In addition to the multiple comparison tests, we also conducted pairwise comparison tests to demonstrate that the proposed algorithms (AdaOrd* and BagOrd*) differ significantly from the others when compared individually. The Wilcoxon signed rank test was applied to examine the pairwise performance differences.


Table 10: Wilcoxon signed rank test results.

Compared Algorithms                     W+      W-     W      z-score     p-value    Significance Level
AdaOrdC4.5 vs. C4.5                     63      15     15     -1.882715   0.059739   moderate
AdaOrdC4.5 vs. OrdC4.5                  65      13     13     -2.039608   0.041389   strong
BagOrdC4.5 vs. C4.5                     61      17     17     -1.725822   0.084379   moderate
BagOrdC4.5 vs. OrdC4.5                  75      3      3      -2.824072   0.004742   very strong
AdaOrdRandomTree vs. RandomTree         75      3      3      -2.824072   0.004742   very strong
AdaOrdRandomTree vs. OrdRandomTree      75      3      3      -2.824072   0.004742   very strong
BagOrdRandomTree vs. RandomTree         77      1      1      -2.980965   0.002873   very strong
BagOrdRandomTree vs. OrdRandomTree      75      3      3      -2.824072   0.004742   very strong
AdaOrdREPTree vs. REPTree               71      7      7      -2.510287   0.012063   strong
AdaOrdREPTree vs. OrdREPTree            64      14     14     -1.961161   0.049860   strong
BagOrdREPTree vs. REPTree               77      1      1      -2.980965   0.002873   very strong
BagOrdREPTree vs. OrdREPTree            77      1      1      -2.980965   0.002873   very strong
Average                                 71.25   6.75   6.75   -2.602996   0.022918

Wilcoxon Signed Rank Test. This statistical test consists of the following steps to check the validity of the null hypothesis: (i) calculate the performance differences between the two algorithms for each dataset; (ii) order the differences according to their absolute values, from smallest to largest; (iii) rank the absolute values of the differences, starting with the smallest as 1 and assigning average ranks in case of ties; (iv) calculate $W^+$ and $W^-$ by summing the ranks of the positive and negative differences separately; (v) find $W$ as the smaller of the two sums, $W = \min(W^+, W^-)$; (vi) calculate the z-score as $z = (W - \mu_W)/\sigma_W$, where $\mu_W = n(n+1)/4$ and $\sigma_W = \sqrt{n(n+1)(2n+1)/24}$ for $n$ datasets, and find the p-value from the distribution table; and (vii) determine the significance level according to the computed p-value [36], rejecting the null hypothesis if the p-value is less than a certain risk threshold (i.e., 0.1 = 10%):

$$\text{Significance level} = \begin{cases} \text{weak or none}, & p > 0.1 \\ \text{moderate}, & 0.05 < p \le 0.1 \\ \text{strong}, & 0.01 < p \le 0.05 \\ \text{very strong}, & p \le 0.01 \end{cases} \quad (7)$$
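The steps above, together with the labelling rule in (7), are straightforward to reproduce. The following Python sketch is our illustration, not the authors' code; it assumes no zero differences between the paired accuracies (zero differences would conventionally be discarded before ranking):

```python
# Illustrative sketch of the Wilcoxon signed rank procedure, steps (i)-(vii),
# with the significance labels of equation (7).
import numpy as np
from scipy.stats import norm, rankdata

def wilcoxon_signed_rank(acc_a, acc_b):
    diffs = np.asarray(acc_a) - np.asarray(acc_b)     # step (i)
    ranks = rankdata(np.abs(diffs))                   # steps (ii)-(iii); assumes no zero diffs
    w_plus = ranks[diffs > 0].sum()                   # step (iv)
    w_minus = ranks[diffs < 0].sum()
    W = min(w_plus, w_minus)                          # step (v)
    n = len(diffs)
    mu_w = n * (n + 1) / 4.0                          # mean of W under H0
    sigma_w = np.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (W - mu_w) / sigma_w                          # step (vi)
    p = 2 * norm.sf(abs(z))                           # two-sided normal approximation
    return W, z, p

def significance_label(p):                            # equation (7)
    if p > 0.1:
        return "weak or none"
    if p > 0.05:
        return "moderate"
    if p > 0.01:
        return "strong"
    return "very strong"
```

With $n = 12$ datasets, this normal approximation reproduces Table 10 exactly; for example, $W = 15$ yields $z \approx -1.8827$ and $p \approx 0.0597$, the values reported for AdaOrdC4.5 vs. C4.5.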

Table 10 presents the W, z-score, and p-values computed through the Wilcoxon signed rank tests. Based on the reported p-values, most of the tests strongly or very strongly confirm the existence of significant differences among the considered algorithms. As the table shows, for each test the ensemble-based ordinal classification algorithms (BagOrd* and AdaOrd*) always show a significant improvement over the traditional versions at a significance level of $\alpha = 0.1$. In general, it can be observed that the average p-value of the proposed BagOrd* variant algorithms (0.0171) is slightly lower than that of the AdaOrd* variants (0.0288).

Based on all the statistical tests (Friedman, Quade, and Wilcoxon signed rank tests), we can safely reject the null hypothesis (i.e., that there are no performance differences among the classifiers on the datasets), since the p-values computed through the statistics are less than the critical value (0.05). Thus, the proposed approach (EBOC) has a statistically significant effect on the response at the 95.0% confidence level.

6. Conclusions and Future Works

As a result of developments in the field of transportation, enormous amounts of raw data are generated every day. This situation creates the potential to discover knowledge patterns or rules from the data and to model the behaviour of a transportation system using machine learning techniques. To serve this purpose, this study focuses on the application of ordinal classification algorithms to real-world transportation datasets. It proposes a novel ensemble-based ordinal classification (EBOC) approach, which converts the original ordinal class problem into a series of binary class problems and uses the ensemble-learning paradigm (boosting and bagging) at the same time. To the best of our knowledge, this is the first study in which ordinal classification methods and our proposed approach have been applied to the transportation sector. In the experimental studies, the proposed model with tree-based learners (C4.5, RandomTree, and REPTree) was implemented on twelve benchmark transportation datasets that are publicly available. The proposed EBOC method was compared with the ordinal class classifier and traditional tree-based classification algorithms in terms of accuracy, precision, recall, F-measure, and ROC area. The results indicate that the EBOC approach provides more accurate classification results than those methods. Moreover, statistical test methods (the Friedman, Quade, and Wilcoxon signed rank tests) were used to verify that the classification accuracies obtained from the proposed algorithms (AdaOrd* and BagOrd*) are significantly different from those of the traditional methods. Therefore, the proposed EBOC method can assist in making the right decisions in transportation. Our findings demonstrate that the improvement in performance results from exploiting the ordering information and applying an ensemble strategy at the same time.
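To make the scheme concrete, the sketch below outlines the EBOC idea in its bagging (BagOrd*) configuration. It is a conceptual illustration rather than the authors' implementation: the paper's experiments use Weka, whereas here scikit-learn's BaggingClassifier over a decision tree stands in for the bagged tree learner, and the class name, parameters, and label handling are illustrative placeholders. The ordinal decomposition follows Frank and Hall [7]:

```python
# Conceptual sketch of EBOC in its bagging configuration (BagOrd*).
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

class EnsembleOrdinalClassifier:
    def __init__(self, ordered_classes, n_estimators=10):
        self.classes = list(ordered_classes)   # e.g. ["small", "medium", "large"]
        self.n_estimators = n_estimators
        self.models = []

    def fit(self, X, y):
        y = np.asarray(y)
        # Frank & Hall decomposition [7]: one binary problem per ordinal
        # threshold, asking "is the label greater than class V_i?"
        for i in range(len(self.classes) - 1):
            target = np.isin(y, self.classes[i + 1:]).astype(int)
            model = BaggingClassifier(DecisionTreeClassifier(),
                                      n_estimators=self.n_estimators)
            self.models.append(model.fit(X, target))
        return self

    def predict(self, X):
        # Estimates P(y > V_i) per threshold, then recovers per-class
        # probabilities: P(V_1) = 1 - P(y > V_1),
        # P(V_i) = P(y > V_{i-1}) - P(y > V_i), P(V_k) = P(y > V_{k-1}).
        gt = np.column_stack([m.predict_proba(X)[:, 1] for m in self.models])
        probs = np.column_stack(
            [1 - gt[:, 0]]
            + [gt[:, i - 1] - gt[:, i] for i in range(1, gt.shape[1])]
            + [gt[:, -1]])
        return np.asarray(self.classes)[probs.argmax(axis=1)]
```

Swapping BaggingClassifier for AdaBoostClassifier sketches the boosting (AdaOrd*) variant in the same way.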

As future work, different types of ensemble-based ordinal classifiers can be developed using different ensemble methods, such as stacking and voting, and using different classification algorithms, such as Naive Bayes, k-nearest neighbour, neural networks, and support vector machines. In addition, ensemble clustering models can be developed to cluster transportation data. Furthermore, a comparative study which implements ensemble-learning and deep learning paradigms can be performed in transportation fields.

Data Availability

The "Auto MPG", "Automobile", "Bike Sharing", "Car Evaluation", and "Statlog (Vehicle Silhouettes)" datasets used to support the findings of this study are freely available at https://archive.ics.uci.edu/ml/datasets. The "Car Sale Advertisements", "NYS Air Passenger Traffic", "Smart City Traffic Patterns", "SF Air Traffic Landings Statistics", and "SF Air Traffic Passenger Statistics" datasets used to support the findings of this study are freely available at https://www.kaggle.com/datasets. The "Road Traffic Accidents (2017)" dataset used to support the findings of this study is freely available at https://datamillnorth.org/dataset. The "Traffic Volume Counts (2012-2013)" dataset used to support the findings of this study is freely available at https://opendata.cityofnewyork.us/data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

[1] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, "Traffic flow prediction with big data: a deep learning approach," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865–873, 2015.
[2] X. Wen, L. Shao, Y. Xue, and W. Fang, "A rapid learning algorithm for vehicle classification," Information Sciences, vol. 295, pp. 395–406, 2015.
[3] S. Ballı and E. A. Sagbas, "Diagnosis of transportation modes on mobile phone using logistic regression classification," IET Software, vol. 12, no. 2, pp. 142–151, 2018.
[4] H. Nguyen, C. Cai, and F. Chen, "Automatic classification of traffic incident's severity using machine learning approaches," IET Intelligent Transport Systems, vol. 11, no. 10, pp. 615–623, 2017.
[5] G. Kedar-Dongarkar and M. Das, "Driver classification for optimization of energy usage in a vehicle," Procedia Computer Science, vol. 8, pp. 388–393, 2012.
[6] N. Deepika and V. V. Sajith Variyar, "Obstacle classification and detection for vision based navigation for autonomous driving," in Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI 2017), pp. 2092–2097, Udupi, India, September 2017.
[7] E. Frank and M. Hall, "A simple approach to ordinal classification," in Proceedings of the European Conference on Machine Learning (ECML), vol. 2167 of Lecture Notes in Computer Science, pp. 145–156, Springer, 2001.
[8] E. Alpaydin, Introduction to Machine Learning, The MIT Press, 3rd edition, 2014.
[9] S. Destercke and G. Yang, "Cautious ordinal classification by binary decomposition," in Proceedings of Machine Learning and Knowledge Discovery in Databases – European Conference (ECML/PKDD), pp. 323–337, Nancy, France, 2014.
[10] J. S. Cardoso and J. F. Pinto da Costa, "Learning to classify ordinal data: the data replication method," Journal of Machine Learning Research (JMLR), vol. 8, pp. 1393–1429, 2007.
[11] L. Li and H. T. Lin, "Ordinal regression by extended binary classification," in Proceedings of the 19th International Conference on Neural Information Processing Systems, pp. 865–872, Canada, 2006.
[12] F. E. Harrell, "Ordinal logistic regression," in Regression Modeling Strategies, Springer Series in Statistics, pp. 311–325, Springer International Publishing, Cham, Switzerland, 2015.
[13] J. D. Rennie and N. Srebro, "Loss functions for preference levels: regression with discrete ordered labels," in Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference Handling, pp. 180–186, 2005.
[14] B. Keith, "Barycentric coordinates for ordinal sentiment classification," in Proceedings of the 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Halifax, Canada, 2017.
[15] E. Fernandez, J. R. Figueira, J. Navarro, and B. Roy, "ELECTRE TRI-nB: a new multiple criteria ordinal classification method," European Journal of Operational Research, vol. 263, no. 1, pp. 214–224, 2017.
[16] M. M. Stenina, M. P. Kuznetsov, and V. V. Strijov, "Ordinal classification using Pareto fronts," Expert Systems with Applications, vol. 42, no. 14, pp. 5947–5953, 2015.
[17] W. Verbeke, D. Martens, and B. Baesens, "RULEM: a novel heuristic rule learning approach for ordinal classification with monotonicity constraints," Applied Soft Computing, vol. 60, pp. 858–873, 2017.
[18] W. Kotlowski and R. Slowinski, "On nonparametric ordinal classification with monotonicity constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 11, pp. 2576–2589, 2013.
[19] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ordinal classification with decision rules," in Proceedings of the Third International Conference on Mining Complex Data (MCD'07), pp. 169–181, Warsaw, Poland, 2007.
[20] K. Hechenbichler and K. Schliep, "Weighted k-nearest-neighbor techniques and ordinal classification," Collaborative Research Center, vol. 399, pp. 1–16, 2004.
[21] W. Waegeman and L. Boullart, "An ensemble of weighted support vector machines for ordinal regression," International Journal of Computer and Information Engineering, vol. 1, no. 12, pp. 599–603, 2007.
[22] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ensemble of decision rules for ordinal classification with monotonicity constraints," in Proceedings of the International Conference on Rough Sets and Knowledge Technology, vol. 5009 of Lecture Notes in Computer Science, pp. 260–267, 2008.
[23] H. T. Lin, From Ordinal Ranking to Binary Classification, Ph.D. thesis, California Institute of Technology, 2008.
[24] C. K. A. Cuartas, A. J. P. Anzola, and B. G. M. Tarazona, "Classification methodology of research topics based in decision trees: J48 and RandomTree," International Journal of Applied Engineering Research, vol. 10, no. 8, pp. 19413–19424, 2015.
[25] C. Parimala and R. Porkodi, "Classification algorithms in data mining: a survey," International Journal of Scientific Research in Computer Science, vol. 3, pp. 349–355, 2018.
[26] P. Yildirim, D. Birant, and T. Alpyildiz, "Improving prediction performance using ensemble neural networks in textile sector," in Proceedings of the 2017 International Conference on Computer Science and Engineering (UBMK), pp. 639–644, Antalya, Turkey, October 2017.
[27] H. Yu and J. Ni, "An improved ensemble learning method for classifying high-dimensional and imbalanced biomedicine data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 11, no. 4, pp. 657–666, 2014.
[28] D. Che, Q. Liu, K. Rasheed, and X. Tao, "Decision tree and ensemble learning algorithms with their applications in bioinformatics," in Software Tools and Algorithms for Biological Systems, H. R. Arabnia and Q.-N. Tran, Eds., vol. 696, pp. 191–199, Springer, 2011.
[29] "UCI Machine Learning Repository: Car Evaluation Data Set," 2018, https://archive.ics.uci.edu/ml/datasets/car+evaluation.
[30] "Weka 3 – Data Mining with Open Source Machine Learning Software in Java," 2018, https://www.cs.waikato.ac.nz/ml/weka.
[31] "UCI Machine Learning Repository," 2018, https://archive.ics.uci.edu/ml/index.php.
[32] "Datasets | Kaggle," 2018, https://www.kaggle.com/datasets.
[33] "Data Mill North," 2018, https://datamillnorth.org/dataset.
[34] City of New York, "NYC Open Data," 2018, https://opendata.cityofnewyork.us.
[35] J. Derrac, S. García, D. Molina, and F. Herrera, "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms," Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 3–18, 2011.
[36] N. A. Weiss, Introductory Statistics, Addison-Wesley, 9th edition, 2012.




Page 12: EBOC: Ensemble-Based Ordinal Classification in …downloads.hindawi.com/journals/jat/2019/7482138.pdfJournalofAdvancedTransportation the size of decision trees byremoving sections

12 Journal of Advanced Transportation

Table 5 The comparison of REPTree based ordinal and ensemble learning methods in terms of classification accuracy

Dataset REPTree () OrdREPTree ()AdaOrd BagOrdREPTree REPTree

() ()Auto MPG 7538 7663 lowast 8191 lowast 8116 lowastAutomobile 6341 6683 lowast 8390 lowast 7220 lowastBike Sharing 8496 8527 lowast 8875 lowast 8812 lowastCar Evaluation 8767 9091 lowast 9867 lowast 9363 lowastCar Sale Advertisements 8070 8042 8252 lowast 8230 lowastNYS Air Passenger Traffic 8434 8491 lowast 8674 lowast 8731 lowastRoad Traffic Accidents (2017) 8466 8502 lowast 8007 8457SF Air Traffic Landings Statistics 9804 9810 lowast 9912 lowast 9857 lowastSF Air Traffic Passenger Statistics 9039 9049 lowast 9097 lowast 9125 lowastSmart City Traffic Patterns 8466 8506 lowast 8533 lowast 8638 lowastStatlog (Vehicle Silhouettes) 7234 6962 7754 lowast 7317 lowastTraffic Volume Counts (2012-2013) 8057 8875 lowast 8639 lowast 8937 lowastAverage 8226 8350 8683 8567

288329

2183

0

05

1

15

2

25

3

35

4Decision Tree

C45OrdC45AdaOrdC45BagOrdC45

Figure 5 The average ranks of C45 algorithm with ordinal andensemble-learning versions

REPTree) with ordinal and ensemble-learning (AdaBoostand bagging) versions respectively In the ranking methodeach approach used in this study is rated according to itsaccuracy score on the dataset This process is performed bystarting with giving rank 1 to the classifier with the highestclassification accuracy and continues to increase the rankvalue of the classifiers until assigning rank119898 to the worst oneof the119898 classifiers In the case of tie average of the classifiersrsquorankings is assigned to each classifier Then mean values ofthe ranks per classifiers on each datasets are computed asaverage ranks According to the comparative results EBOCapproach with bagging method has the best performanceamong the others in Figures 5 and 6 because it gives thelowest rank value

342

3

192167

0

05

1

15

2

25

3

35

4RandomTree

RandomTreeOrdRandomTreeAdaOrdRandomTreeBagOrdRandomTree

Figure 6The average ranks of RandomTree algorithm with ordinaland ensemble-learning versions

The proposed algorithms (AdaOrdlowast and BagOrdlowast)were applied on the transportation datasets and were com-pared with ordinal and nominal (standard) classificationalgorithms in terms of accuracy precision recall F-measureand ROC area metrics Additional measures besides accu-racy are required to evaluate all aspects of the classifiers andthey are useful for providing additional insight into theirperformance evaluations Table 9 shows the average resultsobtained from 12 transportation datasets at each experimentExcept accuracy the metrics listed in Table 9 give a degreeranging from 0 to 1 The algorithm with higher rate meansthat it is more successful than others Among C45 decision

Journal of Advanced Transportation 13

Table 6 The pairwise combinations of the C45 algorithm with ordinal and ensemble learning versions

BagOrdC45 AdaOrdC45 OrdC45C45 3-9-0 3-9-0 7-4-1OrdC45 1-11-0 3-9-0AdaOrdC45 6-6-0BagOrdC45

Table 7 The pairwise combinations of the RandomTree algorithm with ordinal and ensemble learning versions

BagOrd AdaOrd OrdRandomTreeRandomTree RandomTree

RandomTree 1-11-0 2-10-0 4-8-0OrdRandomTree 2-10-0 2-10-0AdaOrdRandomTree 5-7-0BagOrdRandomTree

367

292

167 175

0

05

1

15

2

25

3

35

4REPTree

REPTreeOrdREPTreeAdaOrdREPTreeBagOrdREPTree

Figure 7The average ranks of REPTree algorithm with ordinal andensemble-learning versions

tree-based algorithms it is clearly seen that AdaOrdC45algorithm has the best accuracy (87467) precision (0874)recall (0875) and F-measure (0874) results While bag-ging has the highest values for the RandomTree algorithmboosting is the best among REPTree-based algorithms Whenthe ROC area is considered BagOrdlowast algorithms have thehighest scores according to the experimental results Asa result it is possible to say that ensemble-based ordinalclassification (EBOC) approach provides more accurate androbust classification results than traditional methods

54 Statistical Significance Tests The field of inferentialstatistics offers special procedures for testing the significanceof differences between multiple classifiers Although the

ensemble-based ordinal classification (EBOC) algorithms(AdaOrdlowast and BagOrdlowast) have lower ranking value weshould statistically verify that these algorithms are sig-nificantly different from the others For verification weused three well-known statistical methods [35] multiplecomparisons (Friedman test and Quade test) and pairwisecomparisons (Wilcoxon signed rank test)

The null hypothesis (H0) for a statistical test is one inwhich provided classification models are equivalent other-wise the alternative hypothesis (H1) is present when not allclassifiers are equivalent

H0 There are no performance differences among theclassifiers on the datasets

H1 There are performance differences among the classi-fiers on the datasets

The p-value is defined as the probability which is adecimal number between 0 and 1 and it can be expressedas a percentage (ie 01 = 10) With a small p-value wereject the null hypothesis (H0) so the relationship betweenthe classification results is significantly different

541 Statistical Tests for Comparing Multiple ClassifiersMultiple comparison tests (MCT) are used to establish astatistical comparison of the results reported among variousclassification algorithms We performed two well-knownMCT to determine whether the classifiers have significantdifferences or not Friedman test and Quade test

Friedman Test The Friedman test is a nonparametric statis-tical test that aims to detect significant differences betweenthe behaviors of two or more algorithms Initially it ranks thealgorithms for each dataset separately between 1 (smallest)and 4 (largest) In case of ties the average rank is assigned tothem After that the Friedman test statistics (1199092119903) is calculatedaccording to the following

1205942119903 =12

119889 lowast 119905 lowast (119905 + 1) lowast119905

sum119894=1

1198772119894 minus 3 lowast 119889 lowast (119905 + 1) (5)

14 Journal of Advanced Transportation

Table 8 The pairwise combinations of the REPTree algorithm with ordinal and ensemble learning versions

BagOrdREPTree AdaOrdREPTree OrdREPTreeREPTree 1-11-0 1-11-0 2-10-0OrdREPTree 1-11-0 2-10-0AdaOrdREPTree 7-5-0BagOrdREPTree

Table 9 The comparison of algorithms in terms of accuracy precision recall F-measure and ROC area

Compared Algorithms Accuracy () Precision Recall F-measure ROC AreaC45 8602 0861 0860 0866 0897OrdC45 8426 0843 0842 0848 0884AdaOrdC45 8748 0874 0875 0874 0933BagOrdC45 8640 0859 0864 0860 0947RandomTree 8229 0823 0823 0823 0860OrdRandomTree 8234 0825 0823 0824 0861AdaOrdRandomTree 8676 0866 0868 0866 0928BagOrdRandomTree 8723 0868 0872 0869 0947REPTree 8226 0817 0823 0817 0903OrdREPTree 8350 0831 0835 0829 0908AdaOrdREPTree 8683 0868 0868 0867 0925BagOrdREPTree 8567 0852 0857 0852 0939

where 119889 is the number of datasets t is the number ofclassifiers and 119877119894 is the total of the ranks for the 119894th classifieramong all datasets The Friedman test is approximately chi-square (1199092) distributed with t-1 degrees of freedom (df )and the null hypothesis is rejected if 1199092119903 gt 1199092119905minus1120572 in thecorresponding significance 120572

The obtained Friedman test value for the four C45-basedclassifiers is 10525 Since the obtained value is greater thanthe critical value (1199092119889119891=3120572=005 = 781) null hypothesis (H0)is rejected so it has been concluded that the four classifiersare significantly different The same situation is also valid forRandomTree-based and REPTree-based algorithms whichhave 153 and 201 test values respectively The obtained p-values for the C45 RandomTree and REPTree related EBOCresults (Tables 3 4 and 5) are 001459 000158 and 000016which are extremely lower than 005 level of significanceThus it is possible to say that the differences between theirperformances are unlikely to occur by chance

Quade Test Like the Friedman test the Quade test is anonparametric test which is used to prove that the differ-ences among classifiers are significant First the performanceresults of each classifier are ranked within each dataset toyield R119894119895 in the same way as the Friedman test does where119889 is the number of datasets and t is the number of classifiersfor 119894 = 1 2 119889 and 119895 = 1 2 119905 Then the range foreach dataset is calculated by finding the difference betweenthe largest and the smallest observations within that datasetThe obtained rank for dataset 119894 is denoted by119876119894Theweightedaverage adjusted rank for dataset 119894 with classifier 119895 is then

computed as S119894119895 = Qi lowast (R119894119895 minus (119905 + 1)2) The Quade teststatistic is then given by

119865 =(119889 minus 1) (1119889)sum119905119895=1 1198782119895

sum119889119894=1 sum119905119895=1 1198782119894119895 minus (1119889)sum119905119895=1 1198782119895(6)

where 119878119895 =sum119889119894=1 119878119894119895 is the sum of the weighted ranks for eachclassifier The 119865 value is tested against the F-distribution for agiven 120572 with df 1 = (k ndash 1) and df 2 = (b minus 1)(k minus 1) degreesof freedom If 119865 gt 119865(119896ndash1)(119887minus1)(119896minus1)120572 then null hypothesis isrejected Moreover the p-value could be computed throughnormal approximations

The p-values computed through the Quade statisticaltests for the C45 RandomTree and REPTree related EBOCalgorithms are 0013398 0000018 and 0000522 respectivelyTest results strongly suggest the existence of significantdifferences among the considered algorithms at the level ofsignificance 120572 = 005 Hence we reject the null hypothesiswhich indicates that all classification algorithms have thesame performance Thus it is possible to say that at least oneof them behaves differently

5.4.2. Statistical Tests for Comparing Paired Classifiers. In addition to the multiple comparison tests, we also conducted pairwise comparison tests to demonstrate that the proposed algorithms (AdaOrd* and BagOrd*) differ significantly from the others when compared individually. The Wilcoxon signed rank test was applied to assess pairwise performance differences.


Table 10: Wilcoxon signed ranks test results.

Compared Algorithms                    W+     W-     W      z-score     p-value    Significance Level
AdaOrdC4.5 vs. C4.5                    63     15     15     -1.882715   0.059739   moderate
AdaOrdC4.5 vs. OrdC4.5                 65     13     13     -2.039608   0.041389   strong
BagOrdC4.5 vs. C4.5                    61     17     17     -1.725822   0.084379   moderate
BagOrdC4.5 vs. OrdC4.5                 75     3      3      -2.824072   0.004742   very strong
AdaOrdRandomTree vs. RandomTree        75     3      3      -2.824072   0.004742   very strong
AdaOrdRandomTree vs. OrdRandomTree     75     3      3      -2.824072   0.004742   very strong
BagOrdRandomTree vs. RandomTree        77     1      1      -2.980965   0.002873   very strong
BagOrdRandomTree vs. OrdRandomTree     75     3      3      -2.824072   0.004742   very strong
AdaOrdREPTree vs. REPTree              71     7      7      -2.510287   0.012063   strong
AdaOrdREPTree vs. OrdREPTree           64     14     14     -1.961161   0.049860   strong
BagOrdREPTree vs. REPTree              77     1      1      -2.980965   0.002873   very strong
BagOrdREPTree vs. OrdREPTree           77     1      1      -2.980965   0.002873   very strong
Average                                71.25  6.75   6.75   -2.602996   0.022918

Wilcoxon Signed Rank Test. This statistical test consists of the following steps to check the validity of the null hypothesis: (i) calculate the performance differences between the two algorithms for each dataset; (ii) order the differences according to their absolute values, from smallest to largest; (iii) rank the absolute values of the differences, starting with the smallest as 1 and assigning average ranks in case of ties; (iv) calculate W+ and W- by summing the ranks of the positive and negative differences separately; (v) find $W$ as the smaller of the sums, $W = \min(W^+, W^-)$; (vi) calculate the z-score as $z = W/\sigma_W$ and find the p-value from the distribution table; and (vii) determine the significance level according to the computed p-value [36], rejecting the null hypothesis if the p-value is less than a certain risk threshold (e.g., 0.1 = 10%).

$$\text{Significance level} =
\begin{cases}
\text{weak or none}, & p > 0.1 \\
\text{moderate}, & 0.05 < p \le 0.1 \\
\text{strong}, & 0.01 < p \le 0.05 \\
\text{very strong}, & p \le 0.01
\end{cases} \qquad (7)$$
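For the pairwise case, a brief sketch with SciPy's built-in test illustrates steps (i)-(vii) together with the verbal scale of equation (7). The two accuracy vectors are hypothetical, and SciPy may compute an exact p-value for small samples rather than the z-score normal approximation described above, so values can differ slightly:

```python
from scipy.stats import wilcoxon

def significance_level(p):
    """Map a p-value onto the verbal scale of equation (7)."""
    if p > 0.1:
        return "weak or none"
    if p > 0.05:
        return "moderate"
    if p > 0.01:
        return "strong"
    return "very strong"

# Hypothetical accuracies of a proposed vs. a baseline classifier on 12 datasets.
proposed = [86.1, 84.2, 88.0, 79.9, 90.2, 83.3, 85.7, 87.9, 82.4, 84.8, 86.6, 88.8]
baseline = [84.9, 83.1, 87.2, 79.5, 89.0, 82.0, 84.9, 86.5, 81.7, 83.9, 85.2, 87.4]

# The reported statistic is W = min(W+, W-), matching step (v) above.
stat, p = wilcoxon(proposed, baseline)
print(f"W = {stat}, p = {p:.6f}, significance: {significance_level(p)}")
```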

Table 10 presents the $W$, z-score, and p-values computed through the Wilcoxon signed rank tests. Based on the reported p-values, most of the tests strongly or very strongly confirm the existence of significant differences among the considered algorithms. As the table shows, in every test the ensemble-based ordinal classification algorithms (BagOrd* and AdaOrd*) exhibit a significant improvement over the traditional versions at the significance level $\alpha = 0.1$. In general, the average p-value of the proposed BagOrd* variants (0.0171) is slightly lower than that of the AdaOrd* variants (0.0288).

Based on all the statistical tests (Friedman, Quade, and Wilcoxon signed rank tests), we can safely reject the null hypothesis (i.e., that there are no performance differences among the classifiers on the datasets), since the p-values computed through the statistics are less than the critical value (0.05). Thus, the proposed approach (EBOC) has a statistically significant effect on the response at the 95.0% confidence level.

6. Conclusions and Future Work

As a result of developments in the field of transportation, enormous amounts of raw data are generated every day. This situation creates the potential to discover knowledge patterns or rules from these data and to model the behaviour of a transportation system using machine learning techniques. To serve this purpose, this study focused on the application of ordinal classification algorithms to real-world transportation datasets. It proposed a novel ensemble-based ordinal classification (EBOC) approach, which converts the original ordinal class problem into a series of binary class problems and uses the ensemble-learning paradigm (boosting and bagging) at the same time. To the best of our knowledge, this is the first study in which ordinal classification methods and our proposed approach were applied to the transportation sector. In the experimental studies, the proposed model with tree-based learners (C4.5, RandomTree, and REPTree) was implemented on twelve publicly available benchmark transportation datasets. The proposed EBOC method was compared with the ordinal class classifier and traditional tree-based classification algorithms in terms of accuracy, precision, recall, F-measure, and ROC area. The results indicate that the EBOC approach provides more accurate classification results than the conventional solutions. Moreover, statistical tests (Friedman, Quade, and Wilcoxon signed rank tests) were used to verify that the classification accuracies obtained from the proposed algorithms (AdaOrd* and BagOrd*) are significantly different from those of the traditional methods. Therefore, the proposed EBOC method can assist in making the right decisions in transportation. Our findings demonstrate that the improvement in performance results from exploiting ordering information and applying an ensemble strategy at the same time.

As future work, different types of ensemble-based ordinal classifiers can be developed using different ensemble methods, such as stacking and voting, and different classification algorithms, such as Naive Bayes, k-nearest neighbour, neural networks, and support vector machines. In addition, ensemble clustering models can be developed to cluster transportation data. Furthermore, a comparative study implementing ensemble-learning and deep learning paradigms can be performed in transportation fields.

Data Availability

The "Auto MPG," "Automobile," "Bike Sharing," "Car Evaluation," and "Statlog (Vehicle Silhouettes)" datasets used to support the findings of this study are freely available at https://archive.ics.uci.edu/ml/datasets. The "Car Sale Advertisements," "NYS Air Passenger Traffic," "Smart City Traffic Patterns," "SF Air Traffic Landings Statistics," and "SF Air Traffic Passenger Statistics" datasets are freely available at https://www.kaggle.com/datasets. The "Road Traffic Accidents (2017)" dataset is freely available at https://datamillnorth.org/dataset. The "Traffic Volume Counts (2012-2013)" dataset is freely available at https://opendata.cityofnewyork.us/data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

[1] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, "Traffic flow prediction with big data: a deep learning approach," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865–873, 2015.
[2] X. Wen, L. Shao, Y. Xue, and W. Fang, "A rapid learning algorithm for vehicle classification," Information Sciences, vol. 295, pp. 395–406, 2015.
[3] S. Ballı and E. A. Sagbas, "Diagnosis of transportation modes on mobile phone using logistic regression classification," IET Software, vol. 12, no. 2, pp. 142–151, 2018.
[4] H. Nguyen, C. Cai, and F. Chen, "Automatic classification of traffic incident's severity using machine learning approaches," IET Intelligent Transport Systems, vol. 11, no. 10, pp. 615–623, 2017.
[5] G. Kedar-Dongarkar and M. Das, "Driver classification for optimization of energy usage in a vehicle," Procedia Computer Science, vol. 8, pp. 388–393, 2012.
[6] N. Deepika and V. V. Sajith Variyar, "Obstacle classification and detection for vision based navigation for autonomous driving," in Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI 2017), pp. 2092–2097, Udupi, India, September 2017.
[7] E. Frank and M. Hall, "A simple approach to ordinal classification," in Proceedings of the European Conference on Machine Learning (ECML), vol. 2167 of Lecture Notes in Computer Science, pp. 145–156, Springer, 2001.
[8] E. Alpaydin, Introduction to Machine Learning, The MIT Press, 3rd edition, 2014.
[9] S. Destercke and G. Yang, "Cautious ordinal classification by binary decomposition," in Proceedings of Machine Learning and Knowledge Discovery in Databases - European Conference (ECML/PKDD), pp. 323–337, Nancy, France, 2014.
[10] J. S. Cardoso and J. F. Pinto da Costa, "Learning to classify ordinal data: the data replication method," Journal of Machine Learning Research (JMLR), vol. 8, pp. 1393–1429, 2007.
[11] L. Li and H. T. Lin, "Ordinal regression by extended binary classification," in Proceedings of the 19th International Conference on Neural Information Processing Systems, pp. 865–872, Canada, 2006.
[12] F. E. Harrell, "Ordinal logistic regression," in Regression Modeling Strategies, Springer Series in Statistics, pp. 311–325, Springer International Publishing, Cham, Switzerland, 2015.
[13] J. D. Rennie and N. Srebro, "Loss functions for preference levels: regression with discrete ordered labels," in Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference Handling, pp. 180–186, 2005.
[14] B. Keith, "Barycentric coordinates for ordinal sentiment classification," in Proceedings of the 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Halifax, Canada, 2017.
[15] E. Fernandez, J. R. Figueira, J. Navarro, and B. Roy, "ELECTRE TRI-nB: a new multiple criteria ordinal classification method," European Journal of Operational Research, vol. 263, no. 1, pp. 214–224, 2017.
[16] M. M. Stenina, M. P. Kuznetsov, and V. V. Strijov, "Ordinal classification using Pareto fronts," Expert Systems with Applications, vol. 42, no. 14, pp. 5947–5953, 2015.
[17] W. Verbeke, D. Martens, and B. Baesens, "RULEM: a novel heuristic rule learning approach for ordinal classification with monotonicity constraints," Applied Soft Computing, vol. 60, pp. 858–873, 2017.
[18] W. Kotlowski and R. Slowinski, "On nonparametric ordinal classification with monotonicity constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 11, pp. 2576–2589, 2013.
[19] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ordinal classification with decision rules," in Proceedings of the Third International Conference on Mining Complex Data (MCD'07), pp. 169–181, Warsaw, Poland, 2007.
[20] K. Hechenbichler and K. Schliep, "Weighted k-nearest-neighbor techniques and ordinal classification," Collaborative Research Center, vol. 399, pp. 1–16, 2004.
[21] W. Waegeman and L. Boullart, "An ensemble of weighted support vector machines for ordinal regression," International Journal of Computer and Information Engineering, vol. 1, no. 12, pp. 599–603, 2007.
[22] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ensemble of decision rules for ordinal classification with monotonicity constraints," in Proceedings of the International Conference on Rough Sets and Knowledge Technology, vol. 5009 of Lecture Notes in Computer Science, pp. 260–267, 2008.
[23] H. T. Lin, From Ordinal Ranking to Binary Classification [Ph.D. thesis], California Institute of Technology, 2008.
[24] C. K. A. Cuartas, A. J. P. Anzola, and B. G. M. Tarazona, "Classification methodology of research topics based in decision trees: J48 and RandomTree," International Journal of Applied Engineering Research, vol. 10, no. 8, pp. 19413–19424, 2015.
[25] C. Parimala and R. Porkodi, "Classification algorithms in data mining: a survey," International Journal of Scientific Research in Computer Science, vol. 3, pp. 349–355, 2018.
[26] P. Yildirim, D. Birant, and T. Alpyildiz, "Improving prediction performance using ensemble neural networks in textile sector," in Proceedings of the 2017 International Conference on Computer Science and Engineering (UBMK), pp. 639–644, Antalya, Turkey, October 2017.
[27] H. Yu and J. Ni, "An improved ensemble learning method for classifying high-dimensional and imbalanced biomedicine data," IEEE Transactions on Computational Biology and Bioinformatics, vol. 11, no. 4, pp. 657–666, 2014.
[28] D. Che, Q. Liu, K. Rasheed, and X. Tao, "Decision tree and ensemble learning algorithms with their applications in bioinformatics," in Software Tools and Algorithms for Biological Systems, H. R. Arabnia and Q.-N. Tran, Eds., vol. 696, pp. 191–199, Springer, 2011.
[29] "UCI Machine Learning Repository: Car Evaluation Data Set," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/datasets/car+evaluation.
[30] "Weka 3 - Data Mining with Open Source Machine Learning Software in Java," cs.waikato.ac.nz, 2018, https://www.cs.waikato.ac.nz/ml/weka.
[31] "UCI Machine Learning Repository," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/index.php.
[32] "Datasets | Kaggle," kaggle.com, 2018, https://www.kaggle.com/datasets.
[33] "Data Mill North," datamillnorth.org, 2018, https://datamillnorth.org/dataset.
[34] City of New York, "NYC Open Data," opendata.cityofnewyork.us, 2018, https://opendata.cityofnewyork.us.
[35] J. Derrac, S. García, D. Molina, and F. Herrera, "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms," Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 3–18, 2011.
[36] N. A. Weiss, Introductory Statistics, Addison-Wesley, 9th edition, 2012.



14 Journal of Advanced Transportation

Table 8 The pairwise combinations of the REPTree algorithm with ordinal and ensemble learning versions

BagOrdREPTree AdaOrdREPTree OrdREPTreeREPTree 1-11-0 1-11-0 2-10-0OrdREPTree 1-11-0 2-10-0AdaOrdREPTree 7-5-0BagOrdREPTree

Table 9 The comparison of algorithms in terms of accuracy precision recall F-measure and ROC area

Compared Algorithms Accuracy () Precision Recall F-measure ROC AreaC45 8602 0861 0860 0866 0897OrdC45 8426 0843 0842 0848 0884AdaOrdC45 8748 0874 0875 0874 0933BagOrdC45 8640 0859 0864 0860 0947RandomTree 8229 0823 0823 0823 0860OrdRandomTree 8234 0825 0823 0824 0861AdaOrdRandomTree 8676 0866 0868 0866 0928BagOrdRandomTree 8723 0868 0872 0869 0947REPTree 8226 0817 0823 0817 0903OrdREPTree 8350 0831 0835 0829 0908AdaOrdREPTree 8683 0868 0868 0867 0925BagOrdREPTree 8567 0852 0857 0852 0939

where 119889 is the number of datasets t is the number ofclassifiers and 119877119894 is the total of the ranks for the 119894th classifieramong all datasets The Friedman test is approximately chi-square (1199092) distributed with t-1 degrees of freedom (df )and the null hypothesis is rejected if 1199092119903 gt 1199092119905minus1120572 in thecorresponding significance 120572

The obtained Friedman test value for the four C45-basedclassifiers is 10525 Since the obtained value is greater thanthe critical value (1199092119889119891=3120572=005 = 781) null hypothesis (H0)is rejected so it has been concluded that the four classifiersare significantly different The same situation is also valid forRandomTree-based and REPTree-based algorithms whichhave 153 and 201 test values respectively The obtained p-values for the C45 RandomTree and REPTree related EBOCresults (Tables 3 4 and 5) are 001459 000158 and 000016which are extremely lower than 005 level of significanceThus it is possible to say that the differences between theirperformances are unlikely to occur by chance

Quade Test Like the Friedman test the Quade test is anonparametric test which is used to prove that the differ-ences among classifiers are significant First the performanceresults of each classifier are ranked within each dataset toyield R119894119895 in the same way as the Friedman test does where119889 is the number of datasets and t is the number of classifiersfor 119894 = 1 2 119889 and 119895 = 1 2 119905 Then the range foreach dataset is calculated by finding the difference betweenthe largest and the smallest observations within that datasetThe obtained rank for dataset 119894 is denoted by119876119894Theweightedaverage adjusted rank for dataset 119894 with classifier 119895 is then

computed as S119894119895 = Qi lowast (R119894119895 minus (119905 + 1)2) The Quade teststatistic is then given by

119865 =(119889 minus 1) (1119889)sum119905119895=1 1198782119895

sum119889119894=1 sum119905119895=1 1198782119894119895 minus (1119889)sum119905119895=1 1198782119895(6)

where 119878119895 =sum119889119894=1 119878119894119895 is the sum of the weighted ranks for eachclassifier The 119865 value is tested against the F-distribution for agiven 120572 with df 1 = (k ndash 1) and df 2 = (b minus 1)(k minus 1) degreesof freedom If 119865 gt 119865(119896ndash1)(119887minus1)(119896minus1)120572 then null hypothesis isrejected Moreover the p-value could be computed throughnormal approximations

The p-values computed through the Quade statisticaltests for the C45 RandomTree and REPTree related EBOCalgorithms are 0013398 0000018 and 0000522 respectivelyTest results strongly suggest the existence of significantdifferences among the considered algorithms at the level ofsignificance 120572 = 005 Hence we reject the null hypothesiswhich indicates that all classification algorithms have thesame performance Thus it is possible to say that at least oneof them behaves differently

542 Statistical Tests for Comparing Paired Classifiers Inaddition to multiple comparison tests we also conductedpairwise comparison test to demonstrate that the proposedalgorithms (AdaOrdlowast and BagOrdlowast) have significant dif-ferences from others when individually compared Wilcoxonsigned rank test was applied to see pairwise performancedifferences

Journal of Advanced Transportation 15

Table 10 Wilcoxon signed ranks test results

Compared Algorithms W+ Wminus W z-score p-value Significance LevelAdaOrdC45 vs C45 63 15 15 -1882715 0059739 moderateAdaOrdC45 vs OrdC45 65 13 13 -2039608 0041389 strongBagOrdC45 vs C45 61 17 17 -1725822 0084379 moderateBagOrdC45 vs OrdC45 75 3 3 -2824072 0004742 very strongAdaOrdRandomTree vs RandomTree 75 3 3 -2824072 0004742 very strongAdaOrdRandomTree vs OrdRandomTree 75 3 3 -2824072 0004742 very strongBagOrdRandomTree vs RandomTree 77 1 1 -2980965 0002873 very strongBagOrdRandomTree vs vs OrdRandomTree 75 3 3 -2824072 0004742 very strongAdaOrd REPTree vs REPTree 71 7 7 -2510287 0012063 strongAdaOrd REPTree vs OrdREPTree 64 14 14 -1961161 0049860 strongBagOrdREPTree vs REPTree 77 1 1 -2980965 0002873 very strongBagOrdREPTree vs OrdREPTree 77 1 1 -2980965 0002873 very strongAverage 7125 675 675 -2602996 0022918

Wilcoxon Signed Rank Test This statistical test consists of thefollowing steps to check the validity of the null hypotheses(i) calculate performance differences between two algorithmsfor each dataset (ii) order the differences according to theirabsolute values from smallest to largest (iii) rank the absolutevalue of differences starting with the smallest as 1 andassigning average ranks in case of ties (iv) calculate W+

and W- by summing the ranks of the positive and negativedifferences separately (v) find 119882 as the smaller of the sumsW = min(W+ W-) (vi) calculate z-score as 119911 = 119882120590w andfind p-value from the distribution table and (vii) determinethe significance level according to the computed p-value [36]and reject the null hypothesis if p-value is less than a certainrisk threshold (ie 01 = 10)

119878119894119892119899119894119891119894119888119886119899119888119890 119897119890V119890119897

=

119901 gt 01 weak or none

005 lt 119901 le 01 moderate

001 lt 119901 le 005 strong

119901 le 001 very strong

(7)

Table 10 demonstrates the W z-sore and p-values com-puted through Wilcoxon signed rank tests Based on thereported p-values most of the tests strongly or very stronglyprove the existence of significant differences among theconsidered algorithms As the table states for each test theensemble-based ordered classification algorithms (BagOrdlowastand AdaOrdlowast) always show a significant improvement overthe traditional versions with a level of significance 120572=01In general it can be observed that the average p-value ofproposed BagOrdlowast variant algorithms (00171) is a little lessthan AdaOrdlowast variants (00288)

Based on all statistical tests (Friedman Quade andWilcoxon signed rank test) we can safely reject the nullhypothesis (ie there are no performance differences amongthe classifiers on the datasets) since the p-values computedthrough the statistics are less than the critical value (005)Thus the proposed approach (EBOC) has a statistically

significant effect on the response at the 950 confidencelevel

6 Conclusions and Future Works

As a result of developments in the field of transportationenormous amounts of raw data are generated every dayThis situation creates the potential to discover knowledgepatterns or rules from it and to model the behaviour of atransportation system using machine learning techniques Toserve the purpose this study focuses on the application ofordinal classification algorithms on real-world transportationdatasets It proposes a novel ensemble-based ordinal classifi-cation (EBOC) approachThis approach converts the originalordinal class problem into a series of binary class problemsand use ensemble-learning paradigm (boosting and bagging)at the same time To the best of our knowledge this is thefirst study in which ordinal classification methods and ourproposed approach were applied on transportation sectorIn the experimental studies the proposed model with thetree-based learners (C45 RandomTree and REPTree) wasimplemented on twelve benchmark transportation datasetsthat are available for public useThe proposed EBOCmethodwas compared with ordinal class classifier and traditionaltree-based classification algorithms in terms of accuracyprecision recall f-measure and ROC area The resultsindicate that the EBOC approach provides more accurateclassification results than them Moreover statistical testmethods (Friedman Quade andWilcoxon signed rank tests)were used to prove that the classification accuracies obtainedfrom the proposed algorithms (AdaOrdlowast and BagOrdlowast)are significantly different from the traditional methodsTherefore the proposed EBOC method can assist in makingright decisions in transportation Our findings demonstratethat the improvement in performance is a result of exploitingordering information and applying ensemble strategy at thesame time

As future work different types of ensemble-based ordinalclassifiers can be developed using different ensemblemethods

16 Journal of Advanced Transportation

such as stacking and voting and using different classificationalgorithms such as Naive Bayes k-nearest neighbour neuralnetwork and support vector machine In addition ensembleclustering models can be improved to cluster transportationdata Furthermore a comparative study which implementsensemble-learning and deep learning paradigms can beperformed in transportation fields

Data Availability

The ldquoAuto MPGrdquo ldquoAutomobilerdquo ldquoBike Sharingrdquo ldquoCar Eval-uationrdquo and ldquoStatlog (Vehicle Silhouettes)rdquo datasets usedto support the findings of this study are freely avail-able at httpsarchiveicsuciedumldatasets The ldquoCar SaleAdvertisementsrdquo ldquoNYS Air Passenger Trafficrdquo ldquoSmart CityTraffic Patternsrdquo ldquoSF Air Traffic Landings Statisticsrdquo andldquoSF Air Traffic Passenger Statisticsrdquo datasets used to sup-port the findings of this study are freely available athttpswwwkagglecomdatasets The ldquoRoad Traffic Acci-dents (2017)rdquo dataset used to support the findings of thisstudy is freely available at httpsdatamillnorthorgdatasetThe ldquoTraffic Volume Counts (2012-2013)rdquo dataset used tosupport the findings of this study is freely available athttpsopendatacityofnewyorkusdata

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this article

References

[1] Y Lv Y Duan W Kang Z Li and F-Y Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16 no2 pp 865ndash873 2015

[2] X Wen L Shao Y Xue and W Fang ldquoA rapid learningalgorithm for vehicle classificationrdquo Information Sciences vol295 pp 395ndash406 2015

[3] S Ballı and E A Sagbas ldquoDiagnosis of transportation modeson mobile phone using logistic regression classificationrdquo IETSoftware vol 12 no 2 pp 142ndash151 2018

[4] H Nguyen C Cai and F Chen ldquoAutomatic classification oftraffic incidentrsquos severity using machine learning approachesrdquoIET Intelligent Transport Systems vol 11 no 10 pp 615ndash6232017

[5] G Kedar-Dongarkar and M Das ldquoDriver classification foroptimization of energy usage in a vehiclerdquo Procedia ComputerScience vol 8 pp 388ndash393 2012

[6] N Deepika and V V Sajith Variyar ldquoObstacle classification anddetection for vision based navigation for autonomous drivingrdquoin Proceedings of the 2017 International Conference on Advancesin Computing Communications and Informatics ICACCI 2017pp 2092ndash2097 Udupi India September 2017

[7] E Frank and M Hall ldquoA simple approach to ordinal classifi-cationrdquo in Proceedings of the European Conference on MachineLearning (ECML) vol 2167 of Lecture Notes in ComputerScience pp 145ndash156 Springer 2001

[8] E Alpaydin Introduction to Machine Learning The MIT Press3rd edition 2014

[9] S Destercke and G Yang ldquoCautious ordinal classification bybinary decompositionrdquo in Proceedings of the Machine Learningand Knowledge Discovery in Databases - European ConferenceECMLPKDD pp 323ndash337 Nancy France 2014

[10] J S Cardoso and J F Pinto da Costa ldquoLearning to classifyordinal data the data replication methodrdquo Journal of MachineLearning Research (JMLR) vol 8 pp 1393ndash1429 2007

[11] L Li andH T Lin ldquoOrdinal regression by extended binary clas-sificationrdquo in Proceedings of the 19th International Conferenceon Neural Information Processing Systems pp 865ndash872 Canada2006

[12] F E Harrell ldquoOrdinal logistic regressionrdquo in Regression Model-ing Strategies Springer Series in Statistics pp 311ndash325 SpringerInternational Publishing Cham Switzerland 2015

[13] J D Rennie andN Srebro ldquoLoss functions for preference levelsregression with discrete ordered labelsrdquo in Proceedings of theIJCAIMultidisciplinary Workshop Adv Preference Handling pp180ndash186 2005

[14] B Keith ldquoBarycentric coordinates for ordinal sentiment classifi-cationrdquo in Proceedings of the 23rd ACM SIGKDD Conference onKnowledgeDiscovery andDataMining (KDD) Halifax Canada2017

[15] E Fernandez J R Figueira J Navarro and B Roy ldquoELECTRETRI-nB A new multiple criteria ordinal classification methodrdquoEuropean Journal of Operational Research vol 263 no 1 pp214ndash224 2017

[16] M M Stenina M P Kuznetsov and V V Strijov ldquoOrdinal clas-sification using Pareto frontsrdquo Expert Systems with Applicationsvol 42 no 14 pp 5947ndash5953 2015

[17] W Verbeke D Martens and B Baesens ldquoRULEM A novelheuristic rule learning approach for ordinal classification withmonotonicity constraintsrdquo Applied Soft Computing vol 60 pp858ndash873 2017

[18] W Kotlowski and R Slowinski ldquoOn nonparametric ordinalclassificationwithmonotonicity constraintsrdquo IEEETransactionson Knowledge and Data Engineering vol 25 no 11 pp 2576ndash2589 2013

[19] K Dembczynski W Kotlowski and R Slowinski ldquoOrdinalclassification with decision rulesrdquo in Proceedings of the ThirdInternational Conference on Mining Complex Data (MCDrsquo07 )pp 169ndash181 Warsaw Poland 2007

[20] K Hechenbichler and K Schliep ldquoWeighted k-nearest-neighbor techniques and ordinal classificationrdquo CollaborativeResearch Center vol 399 pp 1ndash16 2004

[21] W Waegeman and L Boullart ldquoAn ensemble of weightedsupport vector machines for ordinal regressionrdquo InternationalJournal of Computer and Information Engineering vol 1 no 12pp 599ndash603 2007

[22] K Dembczynski W Kotlowski and R Slowinski ldquoEnsembleof decision rules for ordinal classification with monotonicityconstraintsrdquo in Proceedings of the International Conference onRough Sets andKnowledge Technology vol 5009 ofLectureNotesin Computer Science pp 260ndash267 2008

[23] H T Lin From Ordinal Ranking to Binary Classification [PhDthesis] California Institute of Technology 2008

[24] C K A Cuartas A J P Anzola and B G M TarazonaldquoClassification methodology of research topics based in deci-sion trees J48 andrandomtreerdquo International Journal of AppliedEngineering Research vol 10 no 8 pp 19413ndash19424 2015

[25] C Parimala and R Porkodi ldquoClassification algorithms in datamining a surveyrdquo in Proceedings of the International Journal ofScientific Research in Computer Science vol 3 pp 349ndash355 2018

Journal of Advanced Transportation 17

[26] P Yildirim D Birant and T Alpyildiz ldquoImproving predictionperformance using ensemble neural networks in textile sectorrdquoin Proceedings of the 2017 International Conference on ComputerScience and Engineering (UBMK) pp 639ndash644 Antalya TurkeyOctober 2017

[27] H Yu and J Ni ldquoAn improved ensemble learning methodfor classifying high-dimensional and imbalanced biomedicinedatardquo IEEE Transactions on Computational Biology and Bioin-formatics vol 11 no 4 pp 657ndash666 2014

[28] D Che Q Liu K Rasheed and X Tao ldquoDecision treeand ensemble learning algorithms with their applications inbioinformaticsrdquo in Software Tools and Algorithms for BiologicalSystems H R Arabnia and Q-N Tran Eds vol 696 pp 191ndash199 Springer 2011

[29] ldquoUCI Machine Learning Repository Car Evaluation DataSetrdquo Archiveicsuciedu 2018 httpsarchiveicsuciedumldatasetscar+evaluation

[30] ldquoWeka 3 - Data Mining with Open Source Machine Learn-ing Software in Javardquo Cswaikatoacnz 2018 httpswwwcswaikatoacnzmlweka

[31] ldquoUCI Machine Learning Repositoryrdquo Archiveicsuciedu 2018httpsarchiveicsuciedumlindexphp

[32] ldquoDatasets mdash Kagglerdquo Kagglecom 2018 httpswwwkagglecomdatasets

[33] ldquoData Mill Northrdquo Datamillnorthorg 2018 httpsdatamill-northorgdataset

[34] ldquoN City of New York ldquoNYC Open Datardquordquo Open-datacityofnewyorkus 2018 httpsopendatacityofnewyorkus

[35] J Derrac S Garcıa D Molina and F Herrera ldquoA practicaltutorial on the use of nonparametric statistical tests as amethodology for comparing evolutionary and swarm intelli-gence algorithmsrdquo Swarm and Evolutionary Computation vol1 no 1 pp 3ndash18 2011

[36] N A Weiss Introductory Statistics Addison-Wesley 9th edi-tion 2012

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 14: EBOC: Ensemble-Based Ordinal Classification in …downloads.hindawi.com/journals/jat/2019/7482138.pdfJournalofAdvancedTransportation the size of decision trees byremoving sections

14 Journal of Advanced Transportation

Table 8 The pairwise combinations of the REPTree algorithm with ordinal and ensemble learning versions

BagOrdREPTree AdaOrdREPTree OrdREPTreeREPTree 1-11-0 1-11-0 2-10-0OrdREPTree 1-11-0 2-10-0AdaOrdREPTree 7-5-0BagOrdREPTree

Table 9 The comparison of algorithms in terms of accuracy precision recall F-measure and ROC area

Compared Algorithms Accuracy () Precision Recall F-measure ROC AreaC45 8602 0861 0860 0866 0897OrdC45 8426 0843 0842 0848 0884AdaOrdC45 8748 0874 0875 0874 0933BagOrdC45 8640 0859 0864 0860 0947RandomTree 8229 0823 0823 0823 0860OrdRandomTree 8234 0825 0823 0824 0861AdaOrdRandomTree 8676 0866 0868 0866 0928BagOrdRandomTree 8723 0868 0872 0869 0947REPTree 8226 0817 0823 0817 0903OrdREPTree 8350 0831 0835 0829 0908AdaOrdREPTree 8683 0868 0868 0867 0925BagOrdREPTree 8567 0852 0857 0852 0939

where 119889 is the number of datasets t is the number ofclassifiers and 119877119894 is the total of the ranks for the 119894th classifieramong all datasets The Friedman test is approximately chi-square (1199092) distributed with t-1 degrees of freedom (df )and the null hypothesis is rejected if 1199092119903 gt 1199092119905minus1120572 in thecorresponding significance 120572

The obtained Friedman test value for the four C45-basedclassifiers is 10525 Since the obtained value is greater thanthe critical value (1199092119889119891=3120572=005 = 781) null hypothesis (H0)is rejected so it has been concluded that the four classifiersare significantly different The same situation is also valid forRandomTree-based and REPTree-based algorithms whichhave 153 and 201 test values respectively The obtained p-values for the C45 RandomTree and REPTree related EBOCresults (Tables 3 4 and 5) are 001459 000158 and 000016which are extremely lower than 005 level of significanceThus it is possible to say that the differences between theirperformances are unlikely to occur by chance

Quade Test Like the Friedman test the Quade test is anonparametric test which is used to prove that the differ-ences among classifiers are significant First the performanceresults of each classifier are ranked within each dataset toyield R119894119895 in the same way as the Friedman test does where119889 is the number of datasets and t is the number of classifiersfor 119894 = 1 2 119889 and 119895 = 1 2 119905 Then the range foreach dataset is calculated by finding the difference betweenthe largest and the smallest observations within that datasetThe obtained rank for dataset 119894 is denoted by119876119894Theweightedaverage adjusted rank for dataset 119894 with classifier 119895 is then

computed as S119894119895 = Qi lowast (R119894119895 minus (119905 + 1)2) The Quade teststatistic is then given by

119865 =(119889 minus 1) (1119889)sum119905119895=1 1198782119895

sum119889119894=1 sum119905119895=1 1198782119894119895 minus (1119889)sum119905119895=1 1198782119895(6)

where 119878119895 =sum119889119894=1 119878119894119895 is the sum of the weighted ranks for eachclassifier The 119865 value is tested against the F-distribution for agiven 120572 with df 1 = (k ndash 1) and df 2 = (b minus 1)(k minus 1) degreesof freedom If 119865 gt 119865(119896ndash1)(119887minus1)(119896minus1)120572 then null hypothesis isrejected Moreover the p-value could be computed throughnormal approximations

The p-values computed through the Quade statisticaltests for the C45 RandomTree and REPTree related EBOCalgorithms are 0013398 0000018 and 0000522 respectivelyTest results strongly suggest the existence of significantdifferences among the considered algorithms at the level ofsignificance 120572 = 005 Hence we reject the null hypothesiswhich indicates that all classification algorithms have thesame performance Thus it is possible to say that at least oneof them behaves differently

542 Statistical Tests for Comparing Paired Classifiers Inaddition to multiple comparison tests we also conductedpairwise comparison test to demonstrate that the proposedalgorithms (AdaOrdlowast and BagOrdlowast) have significant dif-ferences from others when individually compared Wilcoxonsigned rank test was applied to see pairwise performancedifferences

Journal of Advanced Transportation 15

Table 10 Wilcoxon signed ranks test results

Compared Algorithms W+ Wminus W z-score p-value Significance LevelAdaOrdC45 vs C45 63 15 15 -1882715 0059739 moderateAdaOrdC45 vs OrdC45 65 13 13 -2039608 0041389 strongBagOrdC45 vs C45 61 17 17 -1725822 0084379 moderateBagOrdC45 vs OrdC45 75 3 3 -2824072 0004742 very strongAdaOrdRandomTree vs RandomTree 75 3 3 -2824072 0004742 very strongAdaOrdRandomTree vs OrdRandomTree 75 3 3 -2824072 0004742 very strongBagOrdRandomTree vs RandomTree 77 1 1 -2980965 0002873 very strongBagOrdRandomTree vs vs OrdRandomTree 75 3 3 -2824072 0004742 very strongAdaOrd REPTree vs REPTree 71 7 7 -2510287 0012063 strongAdaOrd REPTree vs OrdREPTree 64 14 14 -1961161 0049860 strongBagOrdREPTree vs REPTree 77 1 1 -2980965 0002873 very strongBagOrdREPTree vs OrdREPTree 77 1 1 -2980965 0002873 very strongAverage 7125 675 675 -2602996 0022918

Wilcoxon Signed Rank Test This statistical test consists of thefollowing steps to check the validity of the null hypotheses(i) calculate performance differences between two algorithmsfor each dataset (ii) order the differences according to theirabsolute values from smallest to largest (iii) rank the absolutevalue of differences starting with the smallest as 1 andassigning average ranks in case of ties (iv) calculate W+

and W- by summing the ranks of the positive and negativedifferences separately (v) find 119882 as the smaller of the sumsW = min(W+ W-) (vi) calculate z-score as 119911 = 119882120590w andfind p-value from the distribution table and (vii) determinethe significance level according to the computed p-value [36]and reject the null hypothesis if p-value is less than a certainrisk threshold (ie 01 = 10)

119878119894119892119899119894119891119894119888119886119899119888119890 119897119890V119890119897

=

119901 gt 01 weak or none

005 lt 119901 le 01 moderate

001 lt 119901 le 005 strong

119901 le 001 very strong

(7)

Table 10 demonstrates the W z-sore and p-values com-puted through Wilcoxon signed rank tests Based on thereported p-values most of the tests strongly or very stronglyprove the existence of significant differences among theconsidered algorithms As the table states for each test theensemble-based ordered classification algorithms (BagOrdlowastand AdaOrdlowast) always show a significant improvement overthe traditional versions with a level of significance 120572=01In general it can be observed that the average p-value ofproposed BagOrdlowast variant algorithms (00171) is a little lessthan AdaOrdlowast variants (00288)

Based on all statistical tests (Friedman Quade andWilcoxon signed rank test) we can safely reject the nullhypothesis (ie there are no performance differences amongthe classifiers on the datasets) since the p-values computedthrough the statistics are less than the critical value (005)Thus the proposed approach (EBOC) has a statistically

significant effect on the response at the 950 confidencelevel

6 Conclusions and Future Works

As a result of developments in the field of transportationenormous amounts of raw data are generated every dayThis situation creates the potential to discover knowledgepatterns or rules from it and to model the behaviour of atransportation system using machine learning techniques Toserve the purpose this study focuses on the application ofordinal classification algorithms on real-world transportationdatasets It proposes a novel ensemble-based ordinal classifi-cation (EBOC) approachThis approach converts the originalordinal class problem into a series of binary class problemsand use ensemble-learning paradigm (boosting and bagging)at the same time To the best of our knowledge this is thefirst study in which ordinal classification methods and ourproposed approach were applied on transportation sectorIn the experimental studies the proposed model with thetree-based learners (C45 RandomTree and REPTree) wasimplemented on twelve benchmark transportation datasetsthat are available for public useThe proposed EBOCmethodwas compared with ordinal class classifier and traditionaltree-based classification algorithms in terms of accuracyprecision recall f-measure and ROC area The resultsindicate that the EBOC approach provides more accurateclassification results than them Moreover statistical testmethods (Friedman Quade andWilcoxon signed rank tests)were used to prove that the classification accuracies obtainedfrom the proposed algorithms (AdaOrdlowast and BagOrdlowast)are significantly different from the traditional methodsTherefore the proposed EBOC method can assist in makingright decisions in transportation Our findings demonstratethat the improvement in performance is a result of exploitingordering information and applying ensemble strategy at thesame time

As future work different types of ensemble-based ordinalclassifiers can be developed using different ensemblemethods

16 Journal of Advanced Transportation

such as stacking and voting and using different classificationalgorithms such as Naive Bayes k-nearest neighbour neuralnetwork and support vector machine In addition ensembleclustering models can be improved to cluster transportationdata Furthermore a comparative study which implementsensemble-learning and deep learning paradigms can beperformed in transportation fields

Data Availability

The ldquoAuto MPGrdquo ldquoAutomobilerdquo ldquoBike Sharingrdquo ldquoCar Eval-uationrdquo and ldquoStatlog (Vehicle Silhouettes)rdquo datasets usedto support the findings of this study are freely avail-able at httpsarchiveicsuciedumldatasets The ldquoCar SaleAdvertisementsrdquo ldquoNYS Air Passenger Trafficrdquo ldquoSmart CityTraffic Patternsrdquo ldquoSF Air Traffic Landings Statisticsrdquo andldquoSF Air Traffic Passenger Statisticsrdquo datasets used to sup-port the findings of this study are freely available athttpswwwkagglecomdatasets The ldquoRoad Traffic Acci-dents (2017)rdquo dataset used to support the findings of thisstudy is freely available at httpsdatamillnorthorgdatasetThe ldquoTraffic Volume Counts (2012-2013)rdquo dataset used tosupport the findings of this study is freely available athttpsopendatacityofnewyorkusdata

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this article

References

[1] Y Lv Y Duan W Kang Z Li and F-Y Wang ldquoTraffic flowprediction with big data a deep learning approachrdquo IEEETransactions on Intelligent Transportation Systems vol 16 no2 pp 865ndash873 2015

[2] X Wen L Shao Y Xue and W Fang ldquoA rapid learningalgorithm for vehicle classificationrdquo Information Sciences vol295 pp 395ndash406 2015

[3] S Ballı and E A Sagbas ldquoDiagnosis of transportation modeson mobile phone using logistic regression classificationrdquo IETSoftware vol 12 no 2 pp 142ndash151 2018

[4] H Nguyen C Cai and F Chen ldquoAutomatic classification oftraffic incidentrsquos severity using machine learning approachesrdquoIET Intelligent Transport Systems vol 11 no 10 pp 615ndash6232017

[5] G Kedar-Dongarkar and M Das ldquoDriver classification foroptimization of energy usage in a vehiclerdquo Procedia ComputerScience vol 8 pp 388ndash393 2012

[6] N Deepika and V V Sajith Variyar ldquoObstacle classification anddetection for vision based navigation for autonomous drivingrdquoin Proceedings of the 2017 International Conference on Advancesin Computing Communications and Informatics ICACCI 2017pp 2092ndash2097 Udupi India September 2017

[7] E Frank and M Hall ldquoA simple approach to ordinal classifi-cationrdquo in Proceedings of the European Conference on MachineLearning (ECML) vol 2167 of Lecture Notes in ComputerScience pp 145ndash156 Springer 2001

[8] E Alpaydin Introduction to Machine Learning The MIT Press3rd edition 2014

[9] S Destercke and G Yang ldquoCautious ordinal classification bybinary decompositionrdquo in Proceedings of the Machine Learningand Knowledge Discovery in Databases - European ConferenceECMLPKDD pp 323ndash337 Nancy France 2014

[10] J S Cardoso and J F Pinto da Costa ldquoLearning to classifyordinal data the data replication methodrdquo Journal of MachineLearning Research (JMLR) vol 8 pp 1393ndash1429 2007

[11] L Li andH T Lin ldquoOrdinal regression by extended binary clas-sificationrdquo in Proceedings of the 19th International Conferenceon Neural Information Processing Systems pp 865ndash872 Canada2006

[12] F E Harrell ldquoOrdinal logistic regressionrdquo in Regression Model-ing Strategies Springer Series in Statistics pp 311ndash325 SpringerInternational Publishing Cham Switzerland 2015

[13] J D Rennie andN Srebro ldquoLoss functions for preference levelsregression with discrete ordered labelsrdquo in Proceedings of theIJCAIMultidisciplinary Workshop Adv Preference Handling pp180ndash186 2005

[14] B Keith ldquoBarycentric coordinates for ordinal sentiment classifi-cationrdquo in Proceedings of the 23rd ACM SIGKDD Conference onKnowledgeDiscovery andDataMining (KDD) Halifax Canada2017

[15] E Fernandez J R Figueira J Navarro and B Roy ldquoELECTRETRI-nB A new multiple criteria ordinal classification methodrdquoEuropean Journal of Operational Research vol 263 no 1 pp214ndash224 2017

[16] M M Stenina M P Kuznetsov and V V Strijov ldquoOrdinal clas-sification using Pareto frontsrdquo Expert Systems with Applicationsvol 42 no 14 pp 5947ndash5953 2015

[17] W Verbeke D Martens and B Baesens ldquoRULEM A novelheuristic rule learning approach for ordinal classification withmonotonicity constraintsrdquo Applied Soft Computing vol 60 pp858ndash873 2017

[18] W Kotlowski and R Slowinski ldquoOn nonparametric ordinalclassificationwithmonotonicity constraintsrdquo IEEETransactionson Knowledge and Data Engineering vol 25 no 11 pp 2576ndash2589 2013

[19] K Dembczynski W Kotlowski and R Slowinski ldquoOrdinalclassification with decision rulesrdquo in Proceedings of the ThirdInternational Conference on Mining Complex Data (MCDrsquo07 )pp 169ndash181 Warsaw Poland 2007

[20] K Hechenbichler and K Schliep ldquoWeighted k-nearest-neighbor techniques and ordinal classificationrdquo CollaborativeResearch Center vol 399 pp 1ndash16 2004

[21] W Waegeman and L Boullart ldquoAn ensemble of weightedsupport vector machines for ordinal regressionrdquo InternationalJournal of Computer and Information Engineering vol 1 no 12pp 599ndash603 2007

[22] K Dembczynski W Kotlowski and R Slowinski ldquoEnsembleof decision rules for ordinal classification with monotonicityconstraintsrdquo in Proceedings of the International Conference onRough Sets andKnowledge Technology vol 5009 ofLectureNotesin Computer Science pp 260ndash267 2008

[23] H T Lin From Ordinal Ranking to Binary Classification [PhDthesis] California Institute of Technology 2008

[24] C K A Cuartas A J P Anzola and B G M TarazonaldquoClassification methodology of research topics based in deci-sion trees J48 andrandomtreerdquo International Journal of AppliedEngineering Research vol 10 no 8 pp 19413ndash19424 2015

[25] C Parimala and R Porkodi ldquoClassification algorithms in datamining a surveyrdquo in Proceedings of the International Journal ofScientific Research in Computer Science vol 3 pp 349ndash355 2018

Journal of Advanced Transportation 17

[26] P Yildirim D Birant and T Alpyildiz ldquoImproving predictionperformance using ensemble neural networks in textile sectorrdquoin Proceedings of the 2017 International Conference on ComputerScience and Engineering (UBMK) pp 639ndash644 Antalya TurkeyOctober 2017

[27] H Yu and J Ni ldquoAn improved ensemble learning methodfor classifying high-dimensional and imbalanced biomedicinedatardquo IEEE Transactions on Computational Biology and Bioin-formatics vol 11 no 4 pp 657ndash666 2014

[28] D Che Q Liu K Rasheed and X Tao ldquoDecision treeand ensemble learning algorithms with their applications inbioinformaticsrdquo in Software Tools and Algorithms for BiologicalSystems H R Arabnia and Q-N Tran Eds vol 696 pp 191ndash199 Springer 2011

[29] ldquoUCI Machine Learning Repository Car Evaluation DataSetrdquo Archiveicsuciedu 2018 httpsarchiveicsuciedumldatasetscar+evaluation

[30] ldquoWeka 3 - Data Mining with Open Source Machine Learn-ing Software in Javardquo Cswaikatoacnz 2018 httpswwwcswaikatoacnzmlweka

[31] ldquoUCI Machine Learning Repositoryrdquo Archiveicsuciedu 2018httpsarchiveicsuciedumlindexphp

[32] ldquoDatasets mdash Kagglerdquo Kagglecom 2018 httpswwwkagglecomdatasets

[33] ldquoData Mill Northrdquo Datamillnorthorg 2018 httpsdatamill-northorgdataset

[34] ldquoN City of New York ldquoNYC Open Datardquordquo Open-datacityofnewyorkus 2018 httpsopendatacityofnewyorkus

[35] J Derrac S Garcıa D Molina and F Herrera ldquoA practicaltutorial on the use of nonparametric statistical tests as amethodology for comparing evolutionary and swarm intelli-gence algorithmsrdquo Swarm and Evolutionary Computation vol1 no 1 pp 3ndash18 2011

[36] N A Weiss Introductory Statistics Addison-Wesley 9th edi-tion 2012

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 15: EBOC: Ensemble-Based Ordinal Classification in …downloads.hindawi.com/journals/jat/2019/7482138.pdfJournalofAdvancedTransportation the size of decision trees byremoving sections

Journal of Advanced Transportation 15

Table 10 Wilcoxon signed ranks test results

Compared Algorithms W+ Wminus W z-score p-value Significance LevelAdaOrdC45 vs C45 63 15 15 -1882715 0059739 moderateAdaOrdC45 vs OrdC45 65 13 13 -2039608 0041389 strongBagOrdC45 vs C45 61 17 17 -1725822 0084379 moderateBagOrdC45 vs OrdC45 75 3 3 -2824072 0004742 very strongAdaOrdRandomTree vs RandomTree 75 3 3 -2824072 0004742 very strongAdaOrdRandomTree vs OrdRandomTree 75 3 3 -2824072 0004742 very strongBagOrdRandomTree vs RandomTree 77 1 1 -2980965 0002873 very strongBagOrdRandomTree vs vs OrdRandomTree 75 3 3 -2824072 0004742 very strongAdaOrd REPTree vs REPTree 71 7 7 -2510287 0012063 strongAdaOrd REPTree vs OrdREPTree 64 14 14 -1961161 0049860 strongBagOrdREPTree vs REPTree 77 1 1 -2980965 0002873 very strongBagOrdREPTree vs OrdREPTree 77 1 1 -2980965 0002873 very strongAverage 7125 675 675 -2602996 0022918

Wilcoxon Signed-Rank Test. This statistical test consists of the following steps to check the validity of the null hypotheses: (i) calculate the performance differences between the two algorithms for each dataset; (ii) order the differences according to their absolute values from smallest to largest; (iii) rank the absolute values of the differences, starting with the smallest as 1 and assigning average ranks in case of ties; (iv) calculate W+ and W− by summing the ranks of the positive and negative differences separately; (v) find $W$ as the smaller of the two sums, $W = \min(W^+, W^-)$; (vi) calculate the z-score as $z = (W - \mu_W)/\sigma_W$, where $\mu_W = n(n+1)/4$ and $\sigma_W = \sqrt{n(n+1)(2n+1)/24}$ for $n$ datasets, and find the p-value from the distribution table; and (vii) determine the significance level according to the computed p-value [36] and reject the null hypothesis if the p-value is less than a certain risk threshold (i.e., 0.1 = 10%):

\[
\text{Significance level} =
\begin{cases}
\text{weak or none}, & p > 0.1 \\
\text{moderate}, & 0.05 < p \le 0.1 \\
\text{strong}, & 0.01 < p \le 0.05 \\
\text{very strong}, & p \le 0.01
\end{cases}
\tag{7}
\]
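For concreteness, the following minimal sketch walks through steps (i)–(vii) and the significance mapping of equation (7). It is an illustration of the standard procedure, not the authors' original scripts; the function names are illustrative, and the z-score uses the usual normal approximation stated above.

```python
import math

def wilcoxon_signed_rank(acc_a, acc_b):
    """Paired Wilcoxon signed-rank test for two accuracy vectors."""
    # (i) performance differences per dataset (zero differences are dropped)
    diffs = [a - b for a, b in zip(acc_a, acc_b) if a != b]
    n = len(diffs)
    diffs.sort(key=abs)                      # (ii) order by absolute value
    ranks = [0.0] * n                        # (iii) ranks 1..n, averaged over ties
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j + 2) / 2.0     # average of tied positions i+1..j+1
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)    # (iv)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    w = min(w_plus, w_minus)                 # (v) W = min(W+, W-)
    mu = n * (n + 1) / 4.0                   # (vi) normal approximation
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value
    return w_plus, w_minus, w, z, p

def significance_level(p):
    # (vii) thresholds follow equation (7)
    if p > 0.1:
        return "weak or none"
    if p > 0.05:
        return "moderate"
    if p > 0.01:
        return "strong"
    return "very strong"
```

As a sanity check, with the twelve datasets used here, W = 15 gives z = (15 − 39)/12.7475 ≈ −1.88 and p ≈ 0.0597, matching the first row of Table 10.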

Table 10 demonstrates the W, z-score, and p-values computed through the Wilcoxon signed-rank tests. Based on the reported p-values, most of the tests strongly or very strongly prove the existence of significant differences among the considered algorithms. As the table states, for each test, the ensemble-based ordinal classification algorithms (BagOrd* and AdaOrd*) always show a significant improvement over the traditional versions at a significance level of α = 0.1. In general, it can be observed that the average p-value of the proposed BagOrd* variant algorithms (0.0171) is a little less than that of the AdaOrd* variants (0.0288).

Based on all statistical tests (Friedman, Quade, and Wilcoxon signed-rank tests), we can safely reject the null hypothesis (i.e., that there are no performance differences among the classifiers on the datasets), since the p-values computed through the statistics are less than the critical value (0.05). Thus, the proposed approach (EBOC) has a statistically significant effect on the response at the 95.0% confidence level.

6. Conclusions and Future Works

As a result of developments in the field of transportation, enormous amounts of raw data are generated every day. This situation creates the potential to discover knowledge patterns or rules from the data and to model the behaviour of a transportation system using machine learning techniques. To serve this purpose, this study focuses on the application of ordinal classification algorithms to real-world transportation datasets. It proposes a novel ensemble-based ordinal classification (EBOC) approach. This approach converts the original ordinal class problem into a series of binary class problems and uses the ensemble-learning paradigm (boosting and bagging) at the same time. To the best of our knowledge, this is the first study in which ordinal classification methods and our proposed approach have been applied to the transportation sector. In the experimental studies, the proposed model with tree-based learners (C4.5, RandomTree, and REPTree) was implemented on twelve benchmark transportation datasets that are available for public use. The proposed EBOC method was compared with the ordinal class classifier and traditional tree-based classification algorithms in terms of accuracy, precision, recall, f-measure, and ROC area. The results indicate that the EBOC approach provides more accurate classification results than these conventional methods. Moreover, statistical test methods (Friedman, Quade, and Wilcoxon signed-rank tests) were used to show that the classification accuracies obtained from the proposed algorithms (AdaOrd* and BagOrd*) are significantly different from those of the traditional methods. Therefore, the proposed EBOC method can assist in making the right decisions in transportation. Our findings demonstrate that the improvement in performance is a result of exploiting ordering information and applying an ensemble strategy at the same time.
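To make the decomposition concrete, the sketch below trains K−1 bagged binary ensembles, one per ordered threshold, in the spirit of the ordinal binary decomposition of Frank and Hall [7] on which EBOC builds. It is a minimal illustration, not the authors' Weka implementation: the class name `OrdinalBaggingSketch` and the scikit-learn base learners are assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

class OrdinalBaggingSketch:
    def __init__(self, ordered_classes):
        self.order = list(ordered_classes)   # e.g. ["small", "medium", "large"]
        self.models = []

    def fit(self, X, y):
        rank = {c: i for i, c in enumerate(self.order)}
        r = np.array([rank[label] for label in y])
        # Train K-1 binary ensembles; model k estimates Pr(class > order[k]).
        # Assumes both binary labels occur in each derived training set.
        self.models = []
        for k in range(len(self.order) - 1):
            m = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10)
            m.fit(X, (r > k).astype(int))
            self.models.append(m)
        return self

    def predict(self, X):
        # Pr(c_0) = 1 - Pr(>c_0); Pr(c_k) = Pr(>c_{k-1}) - Pr(>c_k);
        # Pr(c_last) = Pr(>c_{last-1}); predict the most probable class.
        gt = np.column_stack([m.predict_proba(X)[:, 1] for m in self.models])
        n = gt.shape[0]
        probs = np.hstack([np.ones((n, 1)), gt]) - np.hstack([gt, np.zeros((n, 1))])
        return [self.order[i] for i in probs.argmax(axis=1)]
```

Swapping the bagged trees for AdaBoost-wrapped trees gives the AdaOrd* variants of the same idea, and any tree learner can stand in for the base estimator.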

As future work, different types of ensemble-based ordinal classifiers can be developed using different ensemble methods, such as stacking and voting, and using different classification algorithms, such as Naive Bayes, k-nearest neighbour, neural network, and support vector machine. In addition, ensemble clustering models can be developed to cluster transportation data. Furthermore, a comparative study which implements ensemble-learning and deep learning paradigms can be performed in transportation fields.

Data Availability

The "Auto MPG," "Automobile," "Bike Sharing," "Car Evaluation," and "Statlog (Vehicle Silhouettes)" datasets used to support the findings of this study are freely available at https://archive.ics.uci.edu/ml/datasets. The "Car Sale Advertisements," "NYS Air Passenger Traffic," "Smart City Traffic Patterns," "SF Air Traffic Landings Statistics," and "SF Air Traffic Passenger Statistics" datasets are freely available at https://www.kaggle.com/datasets. The "Road Traffic Accidents (2017)" dataset is freely available at https://datamillnorth.org/dataset. The "Traffic Volume Counts (2012-2013)" dataset is freely available at https://opendata.cityofnewyork.us/data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

[1] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, "Traffic flow prediction with big data: a deep learning approach," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 865–873, 2015.

[2] X. Wen, L. Shao, Y. Xue, and W. Fang, "A rapid learning algorithm for vehicle classification," Information Sciences, vol. 295, pp. 395–406, 2015.

[3] S. Ballı and E. A. Sagbas, "Diagnosis of transportation modes on mobile phone using logistic regression classification," IET Software, vol. 12, no. 2, pp. 142–151, 2018.

[4] H. Nguyen, C. Cai, and F. Chen, "Automatic classification of traffic incident's severity using machine learning approaches," IET Intelligent Transport Systems, vol. 11, no. 10, pp. 615–623, 2017.

[5] G. Kedar-Dongarkar and M. Das, "Driver classification for optimization of energy usage in a vehicle," Procedia Computer Science, vol. 8, pp. 388–393, 2012.

[6] N. Deepika and V. V. Sajith Variyar, "Obstacle classification and detection for vision based navigation for autonomous driving," in Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI 2017), pp. 2092–2097, Udupi, India, September 2017.

[7] E. Frank and M. Hall, "A simple approach to ordinal classification," in Proceedings of the European Conference on Machine Learning (ECML), vol. 2167 of Lecture Notes in Computer Science, pp. 145–156, Springer, 2001.

[8] E. Alpaydin, Introduction to Machine Learning, The MIT Press, 3rd edition, 2014.

[9] S. Destercke and G. Yang, "Cautious ordinal classification by binary decomposition," in Proceedings of Machine Learning and Knowledge Discovery in Databases – European Conference (ECML/PKDD), pp. 323–337, Nancy, France, 2014.

[10] J. S. Cardoso and J. F. Pinto da Costa, "Learning to classify ordinal data: the data replication method," Journal of Machine Learning Research (JMLR), vol. 8, pp. 1393–1429, 2007.

[11] L. Li and H. T. Lin, "Ordinal regression by extended binary classification," in Proceedings of the 19th International Conference on Neural Information Processing Systems, pp. 865–872, Canada, 2006.

[12] F. E. Harrell, "Ordinal logistic regression," in Regression Modeling Strategies, Springer Series in Statistics, pp. 311–325, Springer International Publishing, Cham, Switzerland, 2015.

[13] J. D. Rennie and N. Srebro, "Loss functions for preference levels: regression with discrete ordered labels," in Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference Handling, pp. 180–186, 2005.

[14] B. Keith, "Barycentric coordinates for ordinal sentiment classification," in Proceedings of the 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Halifax, Canada, 2017.

[15] E. Fernandez, J. R. Figueira, J. Navarro, and B. Roy, "ELECTRE TRI-nB: a new multiple criteria ordinal classification method," European Journal of Operational Research, vol. 263, no. 1, pp. 214–224, 2017.

[16] M. M. Stenina, M. P. Kuznetsov, and V. V. Strijov, "Ordinal classification using Pareto fronts," Expert Systems with Applications, vol. 42, no. 14, pp. 5947–5953, 2015.

[17] W. Verbeke, D. Martens, and B. Baesens, "RULEM: a novel heuristic rule learning approach for ordinal classification with monotonicity constraints," Applied Soft Computing, vol. 60, pp. 858–873, 2017.

[18] W. Kotlowski and R. Slowinski, "On nonparametric ordinal classification with monotonicity constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 11, pp. 2576–2589, 2013.

[19] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ordinal classification with decision rules," in Proceedings of the Third International Conference on Mining Complex Data (MCD'07), pp. 169–181, Warsaw, Poland, 2007.

[20] K. Hechenbichler and K. Schliep, "Weighted k-nearest-neighbor techniques and ordinal classification," Collaborative Research Center, vol. 399, pp. 1–16, 2004.

[21] W. Waegeman and L. Boullart, "An ensemble of weighted support vector machines for ordinal regression," International Journal of Computer and Information Engineering, vol. 1, no. 12, pp. 599–603, 2007.

[22] K. Dembczynski, W. Kotlowski, and R. Slowinski, "Ensemble of decision rules for ordinal classification with monotonicity constraints," in Proceedings of the International Conference on Rough Sets and Knowledge Technology, vol. 5009 of Lecture Notes in Computer Science, pp. 260–267, 2008.

[23] H. T. Lin, From Ordinal Ranking to Binary Classification [Ph.D. thesis], California Institute of Technology, 2008.

[24] C. K. A. Cuartas, A. J. P. Anzola, and B. G. M. Tarazona, "Classification methodology of research topics based in decision trees: J48 and RandomTree," International Journal of Applied Engineering Research, vol. 10, no. 8, pp. 19413–19424, 2015.

[25] C. Parimala and R. Porkodi, "Classification algorithms in data mining: a survey," International Journal of Scientific Research in Computer Science, vol. 3, pp. 349–355, 2018.


[26] P. Yildirim, D. Birant, and T. Alpyildiz, "Improving prediction performance using ensemble neural networks in textile sector," in Proceedings of the 2017 International Conference on Computer Science and Engineering (UBMK), pp. 639–644, Antalya, Turkey, October 2017.

[27] H. Yu and J. Ni, "An improved ensemble learning method for classifying high-dimensional and imbalanced biomedicine data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 11, no. 4, pp. 657–666, 2014.

[28] D. Che, Q. Liu, K. Rasheed, and X. Tao, "Decision tree and ensemble learning algorithms with their applications in bioinformatics," in Software Tools and Algorithms for Biological Systems, H. R. Arabnia and Q.-N. Tran, Eds., vol. 696, pp. 191–199, Springer, 2011.

[29] "UCI Machine Learning Repository: Car Evaluation Data Set," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/datasets/car+evaluation.

[30] "Weka 3 - Data Mining with Open Source Machine Learning Software in Java," cs.waikato.ac.nz, 2018, https://www.cs.waikato.ac.nz/ml/weka/.

[31] "UCI Machine Learning Repository," archive.ics.uci.edu, 2018, https://archive.ics.uci.edu/ml/index.php.

[32] "Datasets | Kaggle," kaggle.com, 2018, https://www.kaggle.com/datasets.

[33] "Data Mill North," datamillnorth.org, 2018, https://datamillnorth.org/dataset.

[34] City of New York, "NYC Open Data," opendata.cityofnewyork.us, 2018, https://opendata.cityofnewyork.us.

[35] J. Derrac, S. García, D. Molina, and F. Herrera, "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms," Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 3–18, 2011.

[36] N. A. Weiss, Introductory Statistics, Addison-Wesley, 9th edition, 2012.
