Lower-Bound Estimate for Cost-sensitive Decision Trees
Mikhail Goubko, senior researcher
Trapeznikov Institute of Control Sciences of the Russian Academy of Sciences, Moscow, Russia
18th IFAC World Congress, Milan, August 31, 2011



SUMMARY
- A new lower-bound estimate is suggested for the decision tree with case-dependent test costs.
- Unlike known estimates, it performs well when the number of classes is small.
- Calculating the estimate takes n²m operations on average for n examples and m tests.
- Use the estimate to evaluate absolute losses of heuristic decision tree algorithms.
- Use the estimate in split criteria of greedy top-down algorithms of decision tree construction.
- Experiments on real data sets show these algorithms to give results comparable with the popular cost-sensitive heuristics IDX, CS-ID3, and EG2.
- The suggested algorithms perform better on small data sets with a lack of tests.

Mikhail Goubko. Lower-Bound Estimate for Cost-sensitive Decision Trees. IFAC 2011, Milan, August 31

Outline: Introduction; Decision trees; Costs; Results; Motivation; Literature; Definitions; Calculation; Applications; Bonus: Hierarchy Optimization; Model; Methods; Apps

A decision tree is a popular classification tool for:
- machine learning
- pattern recognition
- fault detection
- medical diagnostics
- situational control

A decision is made from a series of tests of attributes. The next attribute tested depends on the results of the previous tests.

Decision trees are learned from data. Compact trees are good. The expected length of a path in the tree is the most popular measure of tree size.


[Figure: example data table. Set D of decisions: chest cold, influenza, healthy. Set N of cases. Set M of attributes: Cough, t° (temperature), Chronic illness, Wheezing, with values Yes/No and High/Low. Labels: m1, m2, m3, m4, m5, m6, m7, ..., mn.]


Tests differ in measurement costs (e.g., a general blood analysis is much cheaper than a computed tomography procedure).

The decision tree is grown to minimize the expected cost of classification given a zero misclassification rate on the learning set.

Other types of costs relevant to decision tree growing (Turney, 2000), such as misclassification, teaching, and intervention costs, are not considered.

Measurement costs (see Turney, 2000) may depend on:
- individual case
- true class of the case
- side-effects (value of hidden attributes)
- prior tests
- prior results (other attributes' values)
- correct answer of the current question

Callouts marking these items on the slide: "Studied!", "Covered!", "Can be accounted!"
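The expected classification cost that the tree is grown to minimize can be sketched as follows. This is a minimal illustration, not the author's code: the three cases, their probabilities, the cost numbers, the tree, and the names `classify_cost` and `expected_cost` are all hypothetical. The cost is a function of both the test and the case, which is what "case-dependent" means here.

```python
# Hypothetical toy data (not from the slides): three cases, two tests.
CASES = {
    "c1": {"cough": "No",  "temp": "High", "decision": "influenza",  "prob": 0.25},
    "c2": {"cough": "Yes", "temp": "Low",  "decision": "chest cold", "prob": 0.50},
    "c3": {"cough": "No",  "temp": "Low",  "decision": "healthy",    "prob": 0.25},
}


def cost(test, case_id):
    """Case-dependent measurement cost (assumed numbers)."""
    if test == "temp":
        return 3.0 if case_id == "c3" else 2.0
    return 1.0  # "cough"


# A tree is either a decision (str) or a pair (test, {outcome: subtree}).
TREE = ("temp", {
    "High": "influenza",
    "Low": ("cough", {"Yes": "chest cold", "No": "healthy"}),
})


def classify_cost(tree, case_id, cost_fn):
    """Walk the tree for one case; return (decision, total cost of the tests made)."""
    total = 0.0
    while not isinstance(tree, str):
        test, branches = tree
        total += cost_fn(test, case_id)
        tree = branches[CASES[case_id][test]]
    return tree, total


def expected_cost(tree, cost_fn):
    """Probability-weighted average cost over all cases."""
    return sum(c["prob"] * classify_cost(tree, cid, cost_fn)[1]
               for cid, c in CASES.items())


tree_cost = expected_cost(TREE, cost)                # case-dependent costs
path_length = expected_cost(TREE, lambda t, w: 1.0)  # unit costs
```

With unit costs the same function returns the expected path length, so that measure of tree size is the special case in which every test costs 1. On this toy data `tree_cost` is 3.0 and `path_length` is 1.75.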


The decision tree growing problem is NP-hard and even hard to approximate.

Numerous heuristic algorithms have been suggested for finding near-optimal trees (cost-sensitive algorithms include IDX, EG2, CS-ID3, and many others).

These heuristics seem to perform well on real data, but one question always remains open: how much extra cost is due to the imperfection of an algorithm? Money losses may be substantial in some settings.

As the optimal tree is hard to compute, its cost should be approximated from below. So we need a lower-bound estimate for the optimal decision tree cost.

Estimates known from the literature share common limitations:
- they are too optimistic when the number of classes is small
- variations in attribute cardinality are not accounted for

Below, a new lower-bound estimate is suggested and studied.


Hyafil and Rivest (1976), and also Zantema and Bodlaender (2000), have shown the decision tree growing problem to be NP-hard.

Sieling (2008) has shown that the size of an optimal tree is hard to approximate up to any constant factor.

Quinlan (1979) developed the information-gain-based heuristic ID3. Its descendant is now a commercial product at version 5 (C5.0).

Heuristic algorithms for cost-sensitive tree construction:
- CS-ID3: Tan (1993)
- IDX: Norton (1989)
- EG2: Núñez (1991)

Lower-bound estimates for cost-sensitive decision trees:
- Entropy-based estimate: Ohta and Kanaya (1991)
- Huffman code length: Biasizzo, Žužek, and Novak (1998)
- Combinatorial estimate: Bessiere, Hebrard, and O'Sullivan (2009)


Definition 1. A subset of tests Q ⊆ M isolates case w in a subset of cases S ⊆ N (w ∈ S) iff the sequence of tests Q assures the proper decision f(w), given the initial uncertainty S and given that w is the real state of the world.

Definition 2. The optimal set of questions Q(w, S) ⊆ M is the cheapest of the sets of questions that isolate case w in set S.

Define also the minimum cost: [formula not preserved in the transcript]

The lower-bound estimate: [formula not preserved in the transcript]

Unlike known estimates, it performs well when:
- the number of classes is small compared to the number of examples
- there is a small number of examples
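Definitions 1 and 2 can be sketched in code by brute force. Everything here is an illustration under assumptions: the toy cases, probabilities, costs, and the names `isolates` and `optimal_questions` are hypothetical. Since the formula for the estimate is not preserved, the last line assumes it is the probability-weighted average of the minimum isolation costs. That quantity is a valid lower bound in any case, because the tests on a root-to-leaf path of a tree with zero misclassification must isolate the case that follows that path.

```python
from itertools import combinations

# Hypothetical toy data (not from the slides).
TESTS = ["cough", "temp"]
VALUES = {
    "c1": {"cough": "No",  "temp": "High"},
    "c2": {"cough": "Yes", "temp": "Low"},
    "c3": {"cough": "No",  "temp": "Low"},
}
DECISION = {"c1": "influenza", "c2": "chest cold", "c3": "healthy"}
PROB = {"c1": 0.25, "c2": 0.50, "c3": 0.25}


def cost(test, w):
    """Case-dependent measurement cost (assumed numbers)."""
    if test == "temp":
        return 3.0 if w == "c3" else 2.0
    return 1.0  # "cough"


def isolates(q, w, s):
    """Definition 1: every case in s that agrees with w on all tests in q
    has the same decision as w."""
    return all(
        DECISION[v] == DECISION[w]
        for v in s
        if all(VALUES[v][t] == VALUES[w][t] for t in q)
    )


def optimal_questions(w, s):
    """Definition 2 by exhaustive search: the cheapest subset of TESTS that
    isolates w in s. Returns (cost, subset), or None if nothing isolates w."""
    best = None
    for k in range(len(TESTS) + 1):
        for q in combinations(TESTS, k):
            if isolates(q, w, s):
                c = sum(cost(t, w) for t in q)
                if best is None or c < best[0]:
                    best = (c, q)
    return best


ALL = list(VALUES)
min_cost = {w: optimal_questions(w, ALL)[0] for w in ALL}
# Assumed form of the estimate: expected minimum isolation cost.
bound = sum(PROB[w] * min_cost[w] for w in ALL)
```

On this data the minimum costs are 2.0, 1.0, and 4.0 for c1, c2, and c3, and `bound` is 2.0. The exhaustive search is exponential in the number of tests, so it is only suitable for tiny examples.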


The problem of unknown case classification is replaced by the problem of proving the true case to a third party.

[Figure: speech bubbles "What is the true case?", "Test this", "I know the true case!", "Try testing this". Panel titles: Initial problem, Simplified problem.]


Calculation of the estimate is reduced to a number of set-covering problems and is NP-hard in the worst case. However, experiments show good performance for both the estimate and its linear programming relaxation.

Average time: n²m (n is the number of cases, m is the number of tests).
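The reduction to set covering can be sketched like this. For a case w, the elements to cover are the cases with a different decision, each test covers the cases whose outcome on that test differs from w's, and the weight of a test is its cost on w. The data and the names `cover_instance` and `min_weight_cover` are hypothetical, and the exact solver is a plain enumeration standing in for whatever solver or relaxation the author used.

```python
from itertools import combinations

# Hypothetical toy data (not from the slides).
TESTS = ["cough", "temp"]
VALUES = {
    "c1": {"cough": "No",  "temp": "High"},
    "c2": {"cough": "Yes", "temp": "Low"},
    "c3": {"cough": "No",  "temp": "Low"},
}
DECISION = {"c1": "influenza", "c2": "chest cold", "c3": "healthy"}


def cost(test, w):
    """Case-dependent measurement cost (assumed numbers)."""
    if test == "temp":
        return 3.0 if w == "c3" else 2.0
    return 1.0  # "cough"


def cover_instance(w):
    """Build the weighted set-cover instance for case w."""
    # Elements to cover: cases that must be told apart from w.
    universe = {v for v in VALUES if DECISION[v] != DECISION[w]}
    # Each test covers the cases it distinguishes from w.
    sets = {t: {v for v in universe if VALUES[v][t] != VALUES[w][t]}
            for t in TESTS}
    weights = {t: cost(t, w) for t in TESTS}
    return universe, sets, weights


def min_weight_cover(universe, sets, weights):
    """Exact minimum-weight cover by enumeration (exponential in len(sets)).
    Returns None if the universe cannot be covered."""
    best = None
    names = list(sets)
    for k in range(len(names) + 1):
        for chosen in combinations(names, k):
            covered = set().union(*(sets[t] for t in chosen))
            if universe <= covered:
                total = sum(weights[t] for t in chosen)
                if best is None or total < best:
                    best = total
    return best


per_case = {w: min_weight_cover(*cover_instance(w)) for w in VALUES}
```

Building all n instances this way compares each pair of cases on each test at most once, which is on the order of n²m comparisons. The slides do not say whether this is where the quoted average time comes from.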


1. Use the lower-bound estimate to evaluate extra costs due to the imperfection of heuristic tree growing algorithms

The quality of the lower-bound estimate on real data sets: the quality of the estimate varies from 60 to 95% depending on the data set.


2. Use the lower-bound estimate to build new tree growing algorithms

Comparing new algorithms with known heuristics (IDX, CS-ID3, EG2) on different data sets (tree costs are shown):
- new algorithms perform better on small data sets
- new algorithms work worse in the presence of dummy tests
- closeness of the results shows that heuristic trees are nearly optimal
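A greedy top-down grower that uses a lower bound in its split criterion might look like the sketch below. The slides do not state the criterion, so this one is an assumption: at each node, pick the test that minimizes the weighted cost of the test itself plus the sum of the lower bounds of the resulting subsets. The toy data, the brute-force `lower_bound`, and the function `grow` are hypothetical, and the cases are assumed to be separable by the given tests.

```python
from itertools import combinations

# Hypothetical toy data (not from the slides).
TESTS = ["cough", "temp"]
VALUES = {
    "c1": {"cough": "No",  "temp": "High"},
    "c2": {"cough": "Yes", "temp": "Low"},
    "c3": {"cough": "No",  "temp": "Low"},
}
DECISION = {"c1": "influenza", "c2": "chest cold", "c3": "healthy"}
PROB = {"c1": 0.25, "c2": 0.50, "c3": 0.25}


def cost(test, w):
    """Case-dependent measurement cost (assumed numbers)."""
    if test == "temp":
        return 3.0 if w == "c3" else 2.0
    return 1.0  # "cough"


def lower_bound(s):
    """Weighted sum over w in s of the cheapest test set isolating w in s."""
    total = 0.0
    for w in s:
        rivals = [v for v in s if DECISION[v] != DECISION[w]]
        best = None
        for k in range(len(TESTS) + 1):
            for q in combinations(TESTS, k):
                # q isolates w if every rival differs from w on some test in q.
                if all(any(VALUES[v][t] != VALUES[w][t] for t in q) for v in rivals):
                    c = sum(cost(t, w) for t in q)
                    best = c if best is None else min(best, c)
        total += PROB[w] * best
    return total


def grow(s):
    """Return a decision (str) or a pair (test, {outcome: subtree})."""
    if len({DECISION[w] for w in s}) == 1:
        return DECISION[s[0]]
    best_score, best_test, best_parts = None, None, None
    for t in TESTS:
        parts = {}
        for w in s:
            parts.setdefault(VALUES[w][t], []).append(w)
        if len(parts) < 2:
            continue  # this test does not split s
        # Assumed criterion: cost of the test now plus bounds on what remains.
        score = sum(PROB[w] * cost(t, w) for w in s)
        score += sum(lower_bound(p) for p in parts.values())
        if best_score is None or score < best_score:
            best_score, best_test, best_parts = score, t, parts
    if best_test is None:
        raise ValueError("cases are not separable by the given tests")
    return best_test, {val: grow(p) for val, p in best_parts.items()}


tree = grow(list(VALUES))
```

On this data the criterion scores "cough" at 2.25 and "temp" at 3.0 at the root, so the tree tests "cough" first and then "temp" on the "No" branch. Its expected cost is 2.25, against a lower bound of 2.0 for the full set of cases.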


General methods of hierarchy optimization


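Cost function: [formula not preserved in the transcript]

The problem is to find: [formula not preserved in the transcript]

Allowed:
- arbitrary number of layers
- asymmetric hierarchies
- multiple subordination
- several top nodes

Not allowed:
- cycles
- unconnected parts
- subordination to the worker nodes (black)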


Sectional cost function

1. Covers most of the applied problems of hierarchy optimization
2. Analytical methods:
   - when the tallest or the flattest hierarchy is optimal
   - when an optimal tree exists
   - when an optimal hierarchy has the shape of a conveyor
3. Algorithms:
   - optimal hierarchy search
   - optimal tree search
   - building an optimal conveyor-belt hierarchy
4. The general problem of hierarchy optimization is complex

Homogeneous cost function

1. Also has numerous applications
2. Closed-form solution for the optimal hierarchy problem
3. Efficient algorithms of nearly-optimal tree construction


Applications of hierarchy optimization methods

- Manufacturing planning (assembly line balancing)
- Networks design
  - communication and computing networks
  - data collection networks
  - structure of hierarchical cellular networks
- Computational mathematics
  - optimal coding
  - structure of algorithms
  - real-time computation and aggregation
  - hierarchical parallel computing
- User interfaces design
  - optimizing hierarchical menus
  - building compact and informative taxonomies
- Data mining
  - decision trees growing
  - structuring database indexes
- Organization design
  - org. chart re-engineering
  - theoretical models of a hierarchical firm

[Figure: set W of feasible hierarchies]