


An Interactive Machine Learning Framework

Teng Lee, University at Buffalo

James Johnson, University at Buffalo

Steve Cheng, University of California, Los Angeles

ABSTRACT
Machine learning (ML) is believed to be an effective and efficient tool for building reliable prediction models or extracting useful structure from an avalanche of data. However, ML is also criticized for its difficulty of interpretation and its complicated parameter tuning. In contrast, visualization can organize and visually encode the entangled information in data and guide audiences toward simpler perceptual inferences and analytic thinking, but large-scale and high-dimensional data usually lead to the failure of many visualization methods. In this paper, we close a loop between ML and visualization via interaction between the ML algorithm and users, so that machine intelligence and human intelligence can cooperate and improve each other in a mutually rewarding way. In particular, we propose the "transparent boosting tree" (TBT), which visualizes both the model structure and the prediction statistics of each step in the learning process of a gradient boosting tree, and feeds the user's operations on the trees back into the learning process. In TBT, ML is in charge of updating the weights in the learning model and of filtering the information shown to the user from the big data, while visualization is in charge of providing a visual understanding of the ML model to facilitate user exploration. TBT thus combines the advantages of ML in big-data statistics and of humans in decision making based on domain knowledge. We develop a user-friendly interface for this novel learning method and apply it to two datasets collected from real applications. Our study shows that making ML transparent through interactive visualization can significantly improve the exploration of ML algorithms, give rise to novel insights into ML models, and integrate machine and human intelligence.

Author Keywords
Ensemble boosting tree, decision tree, interactive visualization, interpretable machine learning, transparent machine learning

ACM Classification Keywords
H.5.2 Information Interfaces and Presentation: Miscellaneous

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CHI 2009, April 4 - 9, 2009, Boston, Massachusetts, USA.
Copyright 2009 ACM 978-1-60558-246-7/09/04...$5.00.

INTRODUCTION
Machine learning (ML) [3] plays an increasingly important role in data science today, with great promise for improving our lives. By fitting data to a flexible discriminative or generative model that leverages the relationships between variables and latent structures, ML learns a prediction model or structured information that is consistent with the given data in a statistically meaningful way. The obtained prediction model or data structure can then be applied to the analysis and classification of new data instances. In this era, the avalanche of information gives rise to a growing interest in big data. While data collection has become cheap, many current ML algorithms are designed to process such large-scale, high-dimensional, noisy data with promising efficiency and accuracy.

However, non-experts often treat ML as a fancy "black box" with few convincing interpretations of its model structure and learning process. In addition, even for ML researchers, how an ML mechanism that is well defined under statistical assumptions behaves on practical data, and how the ML model is updated in each iteration according to the input features, are not transparent and are usually ignored. The understanding and further exploration of ML models are highly limited in this case. Especially in specific applications, it is essential to study when and why an ML algorithm works better or worse than expected, and then to adjust the model accordingly. Therefore, a transparent ML tool that clearly shows the learning process to users can significantly facilitate the exploration of ML models.

Moreover, parameter tuning and the determination of model complexity are two tricky problems that are critical to the performance of ML algorithms. Although some statistical methods exist for solving them, human intelligence based on domain knowledge often performs better and is more reliable on such decision-making problems, e.g., determining the depth of a decision tree, selecting features, and reasoning about performance improvements. In addition, human intelligence often offers better interpretation than machine intelligence. However, in previous ML systems it is hard to involve users' wisdom in the learning process to improve both the generalization ability and the interpretability of ML models.

In contrast, visualization [8] is a popular and powerful tool for exploring data by concentrating human intelligence on visually encoded information extracted from the data. It shows different types of data features in separate and interpretable ways,



[Figure 1. Interactive learning loop between machine learning and visualization by integrating both machine intelligence and human intelligence. The visualization side shows a model view and a data view, the interaction side shows user operations on the model, and the example decision trees illustrate the ensemble boosting tree structure (nodes, paths, trees) and weights (per sample, per tree).]

and also takes the relationships between features and samples into account. The visual representation of data is expected to facilitate and improve the audience's comprehension, and thus to stimulate more effective analytic thinking and decision making. However, visualization usually suffers from high-dimensional, large-volume data: the audience faces an information overload caused by the big data, and cognitive processing can be slowed.

Main Idea
In this paper, we close a loop between machine learning and visualization so that they can benefit each other in data analysis, completing an interactive learning system that makes ML transparent to users. An overview of this system is given in Figure 1. In each iteration of the loop, the ML algorithm updates the prediction model according to the given data and the user's feedback, and visualization then provides a visual encoding of the model structure and of the prediction results on the training data under the current model. In the visualization interface, the user can track both the model structure and the predictions on the training data, and can operate on the model to change its structure based on domain knowledge. These user operations become the input of the next run of the ML algorithm, which updates the model by treating the user's feedback as constraints in its learning. In this loop, ML is in charge of updating the model, choosing useful information to visualize, and processing the user's feedback, while the user is in charge of modifying the updated model. A minimal sketch of this loop is given below.
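To make this loop concrete, the following Python sketch shows one way such an interaction cycle could be organized. The function names (render, get_user_operations, update_model) are illustrative placeholders rather than the authors' implementation; the sketch only fixes the order of the steps described above.

```python
# Hypothetical sketch of the ML-visualization interaction loop (not TBT's code).
def interactive_learning_loop(data, initial_model, render, get_user_operations,
                              update_model, max_iterations=100):
    model = initial_model
    constraints = []                      # accumulated user feedback
    for _ in range(max_iterations):
        # Visualization: encode model structure and prediction statistics.
        view = render(model, data)
        # Interaction: collect user operations on the rendered model
        # (e.g., remove a node, block a feature, expand a leaf).
        operations = get_user_operations(view)
        if not operations:                # user is satisfied with the model
            break
        constraints.extend(operations)
        # Machine learning: re-fit trees and weights subject to the constraints.
        model = update_model(model, data, constraints)
    return model
```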

In particular, we choose the gradient boosting tree (GBT) [2] as the ML algorithm in the loop because of its appealing performance on many practical problems. GBT is an ensemble of decision trees built in a forward stagewise fashion similar to boosting. It learns an additive model that defines decision rules for effective prediction from input features, e.g., for classification and regression, and is widely regarded as a state-of-the-art method for many challenging real-world problems. Each step of GBT learning can be treated as learning one decision tree conditioned on all the previous trees, and each node of each decision tree corresponds to a selected feature and a decision rule.

In the interactive learning system, called "transparent boosting tree" (TBT), the visualization interface provides five views of the model and data: a feature view listing the feature groups and highlighting the selected features, a forest view listing all previous trees, a tree view showing the detailed structure of the current tree, a path purity view capturing the prediction results of the training set at each node along a selected path of the tree, and a history view tracking the training error and test error of each update in the learning process. In each step, users have access to these views to evaluate the current model, and our system allows them to add or remove a tree, select a feature group for building a new tree, remove a node from the current tree, expand a leaf node, and go back to any previous model in the learning loop. The GBT algorithm then takes these user operations as inputs and updates the ensemble of decision trees accordingly.

In this paper, we implement TBT as an interactive visualization system and apply it to two practical datasets, mushroom and fusion health. The former asks whether a given mushroom is poisonous, while the latter aims at predicting the health status of a patient. TBT is demonstrated to be a promising tool for the exploration of GBT, the visualization of ML models, and human-ML interaction.



RELATED WORKS
There have been several, though not many, early efforts to improve the interpretation of ML models through effective visualization [1]. Visualization is critical in many systems, e.g., health [5, 6], medical [4], and data mining [9] applications.

EnsembleMatrix [7] shows users the confusion matrices of multiple classifiers and enables them to choose any combination of those classifiers to build a combined model. It thus provides an interactive visualization tool that brings user intelligence into the ML process. However, both the training and the model structure of each classifier remain invisible to users.

Some recent tools such as BigML (https://bigml.com) focus on the visualization of a single decision tree, which is related to the ensemble boosting tree in our project. However, a decision tree is less powerful and also simpler than an ensemble boosting tree. In addition, these tools compute all possible paths before showing them to users rather than updating them instantaneously, and they allow fewer user operations on the tree. So the loop has not really been closed, the benefit of interaction is limited, and the computation could be expensive.

TRANSPARENT BOOSTING TREE

Learning Algorithm for Gradient Boosting Tree
One goal of TBT is to visualize the learning process and the model structure of GBT. GBT builds an additive model F(x) composed of M decision trees h_m(x) with input features x, i.e.,

F(x) = \sum_{m=1}^{M} \gamma_m h_m(x).    (1)

Each decision tree h_m(x) is a weak learner that can handle data of mixed types and model complex nonlinear functions. Similar to boosting, GBT builds the additive model in a forward stagewise fashion such that

F_m(x) = F_{m-1}(x) + \gamma_m h_m(x).    (2)

In each stage, fixing the previous ensemble of trees F_{m-1}(x), the current decision tree h_m(x) is chosen to minimize the loss function L:

h_m(x) = \arg\min_h \sum_{i=1}^{n} L(y_i, F_{m-1}(x_i) + h(x_i)).    (3)

This minimization problem can be approximately solved by steepest gradient descent:

F_m(x) = F_{m-1}(x) - \gamma_m \sum_{i=1}^{n} \nabla L(y_i, F_{m-1}(x_i)),    (4)

where the step size \gamma_m is determined by line search. The final additive model is usually represented by the combination of all paths over all the trees, with a different weight per path.

In learning each decision tree, the gradient \nabla L(y_i, F_{m-1}(x_i)) is first assigned to each training sample as its importance weight for learning the current tree. Then the nodes and their corresponding decision rules defining the tree are selected in a greedy manner from the root to the leaves. Specifically, for each node, the learning algorithm chooses the best threshold for each available feature by evaluating all possible thresholds; the selected threshold defines the best decision rule associated with that feature. Using both the importance weights and the Hessian, we can then compute the information gain obtained by applying each feature and its decision rule at the node. The feature-rule pair that yields the maximal information gain is selected to define the current node.
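To make the stagewise update of Eqs. (1)-(4) concrete, here is a minimal, self-contained sketch of gradient boosting with regression trees under squared-error loss, where the negative gradient reduces to the residual y - F_{m-1}(x). It illustrates the standard algorithm with a fixed learning rate in place of the line search; it is not the TBT code.

```python
# Minimal gradient boosting sketch (squared-error loss), for illustration only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbt(X, y, n_trees=50, max_depth=3, learning_rate=0.1):
    f0 = np.full(len(y), y.mean())        # initial constant model F_0
    trees, current = [], f0.copy()
    for _ in range(n_trees):
        residual = y - current            # negative gradient of 1/2 (y - F)^2
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        trees.append(h)
        # F_m(x) = F_{m-1}(x) + gamma_m * h_m(x); a fixed learning rate
        # replaces the line search here, a common simplification.
        current += learning_rate * h.predict(X)
    return f0[0], trees

def predict_gbt(f0, trees, X, learning_rate=0.1):
    pred = np.full(X.shape[0], f0)
    for h in trees:
        pred += learning_rate * h.predict(X)
    return pred
```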

In TBT, we involve the user's operations in the learning process of GBT. In particular, TBT allows users to operate on the features and trees that would otherwise be selected by the greedy method, and to determine the number of trees and the depth of a path. The user feedback can thus be treated as a human regularization or constraint that limits the model complexity of GBT and corrects possible mistakes made by purely greedy selection. Since humans with domain knowledge are believed to be good at such decision-making tasks, this is expected to be effective in avoiding overfitting, simplifying the model, and improving interpretability.

We let the GBT learning algorithm take charge of updating the thresholds and importance weights, because computers are good at searching over and optimizing these numerical variables. In this way, the advantages of both human intelligence and machine intelligence are combined in TBT. A sketch of how such user constraints could be applied when fitting the next tree follows.
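As an illustration of how such feedback could act as a constraint, the sketch below restricts the next tree to the user's unblocked feature subset and to a user-chosen depth. The function and argument names are assumptions for exposition, not the authors' API.

```python
# Hypothetical sketch: user operations as constraints on the next tree.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_constrained_tree(X, negative_gradient, allowed_features, max_depth):
    # Restrict the learner to the feature subset the user left unblocked.
    X_sub = X[:, allowed_features]
    tree = DecisionTreeRegressor(max_depth=max_depth)   # user-chosen depth cap
    tree.fit(X_sub, negative_gradient)
    return tree, allowed_features      # keep the column map for prediction

# Example: the user blocked all but features 0, 3, and 7 and capped depth at 2.
# tree, cols = fit_constrained_tree(X, residual, allowed_features=[0, 3, 7], max_depth=2)
```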

Design of Interactive Visualization Interface
The visualization interface of TBT is composed of one initialization tool bar and five views: the feature view listing the feature groups and highlighting the enabled features, the forest view listing all previous trees, the tree view showing the detailed structure of the current tree, the path purity view capturing the prediction results of the training set at each node along a selected path of the tree, and the history view tracking the training error and test error of each update in the learning process.

In the initialization tool bar, the user can choose the dataset to analyze, the method for grouping features in the feature view, the initial number of trees in GBT, and the initial maximal depth of each tree. Clicking the "REBUILD!" button produces an initial GBT model, which is learned by the original GBT algorithm and visualized in the five views. These options can be thought of as a small configuration object, as sketched below.
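As a sketch, the tool-bar options could be captured in a small configuration object like the one below; the field names and defaults are illustrative assumptions, not the system's actual settings.

```python
# Hypothetical initialization configuration mirroring the tool-bar options.
from dataclasses import dataclass

@dataclass
class TBTInitConfig:
    dataset: str = "mushroom"            # which dataset to analyze
    feature_grouping: str = "by_prefix"  # how to group features in the feature view (assumed name)
    n_trees: int = 20                    # initial number of trees in GBT
    max_depth: int = 3                   # initial maximal depth of each tree
```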

The following tables list the major visually encoded information and the interactions/user operations in each view.

Table 1. Visual encoding and user operation in the feature view.
Visual Encoding/Exploration:
- Feature group (text)
- Feature name (text)
- No. of features per group (integer)
User Operation/Interaction:
- Allow the feature (right click feature & select)
- Block the feature (right click feature & select)



[Figure 2. Visual encoding and exploration in the five views of the visualization interface of transparent boosting tree. The legend distinguishes blocked and unblocked features, nodes labeled with a feature and decision rule, leaf nodes with path weights, and a highlighted path.]

Table 2. Visual encoding and user operation in the forest view.
Visual Encoding/Exploration:
- Tree index (integer)
- Feature and decision rule of the root node (text)
User Operation/Interaction:
- Collapse the whole tree (left click)

Table 3. Visual encoding and user operation in the tree view.
Visual Encoding/Exploration:
- Feature (text)
- Decision rule (text)
- Path weight on leaf node (ratio)
- Major class on leaf node (color)
- No. of samples on each edge (width)
- Highlight a path (brushing a path)
- Link a path to its prediction bar charts
User Operation/Interaction:
- Remove the current tree (right click root node & select)
- Grow a new tree (right click root node & select)
- Remove the node on the current tree (right click node & select)
- Remove the node on all trees (right click node & select)
- Expand the leaf node on the current tree (right click node & select)
- Expand the leaf node on all trees (right click node & select)

On one hand, the forest view and tree view visualize the structure of the GBT model during learning, i.e., multiple decision trees from the root node to the current leaf nodes, the weight of each path, and the feature with its associated decision rule at each node. These two views provide a detailed and transparent visualization of the GBT model structure to the user. They lead to a clear explanation of the current model, e.g., which features were selected to build decision rules and how these features cooperate with each other, so users can judge which nodes of the current model are consistent with domain knowledge or common sense and which are not.

Table 4. Visual encoding in the path purity view.
Visual Encoding/Exploration:
- Prediction statistics on a path (linked to the brushed path)
- Negative and positive counts on nodes (bar charts)
- Decision rule (text)

Table 5. Visual encoding and user operation in the history view.
Visual Encoding/Exploration:
- Historical operation (text)
- Training error (ratio)
- Test error (ratio)
User Operation/Interaction:
- Go back to a former model (right click row & select restore)

On the other hand, the path purity view and history view visualize, respectively, the prediction results at each node along the brushed path in the tree view and the overall historical training/test error after each user operation. These two views let users track the performance changes caused by the feature(s) and the associated decision rule(s) at each node, each path, each tree, and each user operation in the history. Even without sufficient domain knowledge, users can still find important clues in the prediction results of the current GBT model to evaluate the performance of the selected features, decision rules, and established trees and paths.

By showing users both the model structure and the performance on training data, we expect to collect users' wisdom, based on both their domain knowledge and their judgement of the prediction results on the data, through their feedback. In our visualization tool, users have complete freedom to modify the structure of each tree, e.g., to choose the feature for a node, to prune an existing node/path, and to add a new node/path. The model is updated according to this instantaneous feedback from users, and we then show new views for the updated model.

In the learning process, the ML algorithm is in charge of learning candidate models, estimating the numerical weights of the model, and updating the model according to user feedback, while users are in charge of modifying the ML model using their domain knowledge and the feedback from the current learning results. Interactive visualization bridges these two components by conveying the feedback of one to the other. Therefore, we close a loop between the two so that each helps enhance the performance of the other in a mutually rewarding way, until a promising ML model is achieved.

Exploration of the GBT Model in TBT
One primary contribution of this paper is that TBT enables full exploration of a complex machine learning model by the user. We show the visual encoding/exploration contained in TBT in Figure 2. In particular, although TBT can learn hundreds of trees in a second, the detailed structure of each decision tree can be unfolded with an animation and shown in the tree view by clicking the tree index listed in the forest view. Thus the user is able to evaluate each tree by quickly going into its details.

In the tree view, the user can explore the feature and corresponding decision rule at each node, the composite decision rule defined by each path, and the number of training samples associated with each edge of each path, so the features and decision rules can be evaluated against the user's domain knowledge. In addition, by brushing any path on the tree, the prediction statistics at each node along the selected path are displayed as bar charts in the path purity view. By investigating these prediction statistics, the user can conveniently determine the depth of the path that avoids possible overfitting, and perform manual feature selection according to the contribution of each feature to the prediction task. A sketch of this per-node statistic is given below.
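As an illustration, the per-node statistics shown in the path purity view could be computed as below; the (feature, threshold, branch) path encoding is an assumption for exposition, not the system's internal format.

```python
# Hypothetical sketch of the path purity statistics for a brushed path.
import numpy as np

def path_purity(X, y, path):
    """path: list of (feature_index, threshold, go_left) from root toward a leaf."""
    mask = np.ones(len(y), dtype=bool)
    stats = []
    for feature, threshold, go_left in path:
        # Samples taking the left branch are assumed to satisfy feature < threshold.
        branch = X[:, feature] < threshold
        mask = mask & (branch if go_left else ~branch)
        stats.append({
            "rule": f"x[{feature}] {'<' if go_left else '>='} {threshold}",
            "n_negative": int(np.sum(mask & (y == 0))),
            "n_positive": int(np.sum(mask & (y == 1))),
        })
    return stats   # one bar-chart entry per node along the path
```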

Therefore, users are able to gain a comprehensive, coarse-to-fine understanding of the ML model via the visualization and exploration of both the model structure and the prediction results on data. This is especially helpful for generating high-quality feedback after analytic thinking and cognitive inference.

Interaction and User Operation in TBT
Another key insight of this paper is that TBT allows rich user operations on the ML model and takes that user feedback as input to the learning algorithm in the next model update. As shown in Figure 3, users are allowed to operate on different scales of the GBT model.

Specifically, on the forest scale, they can remove or add an arbitrary number of trees in the model. In addition, at any stage of the learning process, the user can define a subset of features by allowing or blocking features in the feature view, such that any feature selected in the remaining trees is confined to that subset.

On the tree scale, users can remove any node or expand any leaf node, which amounts to feature selection. It is worth noting that TBT supports a very powerful and useful function called "remove/expand this node for all", which removes or adds the selected node on all the same paths appearing in all the trees. This acts as a global filter that reflects the user's judgement of the composite decision rule defined by that path; a sketch of the pruning variant follows.
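The sketch below illustrates one possible semantics for the pruning variant of this operation: the same decision-rule path is located in every tree and the matched node is collapsed into a leaf whose weight the next GBT update re-estimates. The nested-dict tree layout and the exact semantics are assumptions, not the authors' implementation.

```python
# Hypothetical "remove this node for all" sketch over nested-dict trees.
def prune_path_in_tree(node, path):
    """path: list of (feature, threshold, branch) with branch in {'left', 'right'}."""
    if not path:
        # Collapse the matched node; its weight is re-fit in the next update.
        return {"leaf": True, "value": 0.0}
    if node.get("leaf"):
        return node                       # the path is not present in this tree
    feature, threshold, branch = path[0]
    if node["feature"] == feature and node["threshold"] == threshold:
        node[branch] = prune_path_in_tree(node[branch], path[1:])
    return node

def remove_node_for_all(forest, path):
    # Apply the same pruning to every tree that contains the same path.
    return [prune_path_in_tree(tree, path) for tree in forest]
```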

On the time scale, the "restore" button in the history view of TBT allows the user to go back to the model at any learning stage marked by a historical user operation. This is necessary because some human operations may result in a weaker model. By tracking the training/test error in the history view, users can judge whether their operations were helpful to the learning process, and the "restore" button conveniently undoes operations judged unhelpful without requiring users to go back to the very beginning of the learning process. A minimal snapshot-based sketch of this mechanism is given below.
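A minimal snapshot-based version of this history mechanism is sketched below; storing deep copies of the model is an illustrative design choice, not necessarily how TBT implements it.

```python
# Hypothetical history/restore sketch: one snapshot per user operation.
import copy

class ModelHistory:
    def __init__(self):
        self._states = []

    def record(self, operation, model, train_error, test_error):
        # Store the operation label, a deep copy of the model, and its errors.
        self._states.append({
            "operation": operation,
            "model": copy.deepcopy(model),
            "train_error": train_error,
            "test_error": test_error,
        })

    def restore(self, index):
        # Drop everything after the restored state, as in the history view.
        state = self._states[index]
        self._states = self._states[: index + 1]
        return copy.deepcopy(state["model"])
```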

RESULTS
We apply TBT to two datasets: mushroom (https://archive.ics.uci.edu/ml/datasets/Mushroom) and the fusion health diabetes dataset (http://www.kaggle.com/c/pf2012-diabetes/data). The mushroom dataset contains 8124 samples with 22 features, and the goal is to train a classifier separating poisonous mushrooms from edible mushrooms. The fusion health dataset contains 9948 training samples and 4979 test samples, and the goal is to recognize whether a given patient has a diagnosis of Type 2 diabetes mellitus (T2DM). TBT builds GBT models for the prediction tasks on these two datasets.

TBT proves effective in presenting a transparent and interpretable understanding of the GBT model and its learning process to users. In spite of the large number of samples and features, TBT is able to show how the GBT model is established by integrating different features and decision rules into an ensemble of trees, and how the model works on real data. The user can explore the model from the forest scale down to the feature scale. Moreover, the interaction and user operations prove effective for testing the performance of each part of the model and for tracking the changes each part causes. Furthermore, model updates in TBT happen in real time, so the learning loop between ML and visualization can be iterated efficiently.

DISCUSSION
By applying TBT to the two real datasets, we discover that it can provide several novel insights that cannot be obtained by directly running GBT on the datasets. Firstly, instead of blindly learning a great number of decision trees, we find that many of the decision trees learned by GBT share the same paths or even have very similar structures. This indicates that the complexity of GBT can be significantly reduced by user exploration.

Secondly, TBT is able to show the features or feature groupsthat have been selected to build decision rules in the trees.




[Figure 3. Interaction and user operation in the visualization interface of transparent boosting tree.]

The prediction statistics along a path in the path purity view and the training/test error shown in the history view provide important clues for evaluating and understanding the roles that the selected features and decision rules play in the prediction.

Thirdly, using the information shown in the path purity view and the history view, it is convenient to prune the trees and thus avoid overfitting of the GBT model.

FUTURE WORK
In the future, we will conduct a user study on our published website and collect data on the performance of TBT when different users' interactions are involved in the learning process. The collected data can be used to evaluate TBT more accurately.

Moreover, we plan to study whether and when human intelligence in TBT is able to improve the performance of ML in the loop. Both theoretical analysis and empirical studies will be needed for this.

Furthermore, we will attempt to extend the transparent machine learning idea to other ML models and algorithms, and investigate whether interactive visualization can lead to new understanding of popular ML black boxes.

REFERENCES

1. S. Amershi, J. Fogarty, A. Kapoor, and D. S. Tan. Effective end-user interaction with machine learning. In Association for the Advancement of Artificial Intelligence (AAAI), 2011.

2. J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.

3. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics, 2001.

4. M.-C. Huang, J. J. Liu, W. Xu, N. Alshurafa, X. Zhang, and M. Sarrafzadeh. Using pressure map sequences for recognition of on bed rehabilitation exercises. IEEE Journal of Biomedical and Health Informatics, 18(2):411–418, 2014.

5. M.-C. Huang, X. Zhang, W. Xu, J. Liu, and M. Sarrafzadeh. Ezwakeup: A sleep environment design for sleep quality improvement. In CHI '14 Extended Abstracts on Human Factors in Computing Systems, pages 2389–2394. ACM, 2014.

6. J. J. Liu, M.-C. Huang, W. Xu, X. Zhang, L. Stevens, N. Alshurafa, and M. Sarrafzadeh. BreathSens: A continuous on-bed respiratory monitoring system with torso localization using an unobtrusive pressure sensing array. IEEE Journal of Biomedical and Health Informatics, 19(5):1682–1688, 2015.

7. J. Talbot, B. Lee, A. Kapoor, and D. S. Tan. EnsembleMatrix: Interactive visualization to support machine learning with multiple classifiers. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1283–1292, 2009.

8. E. R. Tufte. The Visual Display of Quantitative Information. Graphics Press, Cheshire, CT, USA, 1986.

9. I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2005.
