Loretta Auvil
Automated Learning Group
National Center for Supercomputing Applications
University of Illinois
217. [email protected]
Supercomputing 2003
D2K Tutorial
alg | Automated Learning Group
Outline
• Overview of D2K Functionality
• Hands-On Exercise: Predictive Modeling
  • Classification
    – Using Naïve Bayesian
    – Using Decision Trees
• Hands-On Exercise: Discovery
  • Rule Association
    – Using SQL Htree
  • Clustering
• Deviation Detection
  • Visualization
    – Parallel Coordinates
    – Small Multiples of scatterplots
Goals
• Understanding the Knowledge Discovery in Databases Process
• Gaining Knowledge of Basic Data Mining Operations and Techniques
• Understanding the Role of the Knowledge Discovery Framework
• Key Issues in Utilization of D2K Framework
• Understanding the Role of Information Visualization in Data Mining
What is It?
Knowledge Discovery in Databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data
• The understandable patterns are used to:
  • Make predictions about or classifications of new data
  • Explain existing data
  • Summarize the contents of a large database to support decision making
  • Create graphical data visualization to aid humans in discovering complex patterns
Overview of Knowledge Discovery
Knowledge Discovery Process
Required Effort for each KDD Step
Arrows indicate the direction we want the effort to go
[Figure: bar chart of effort (%), on a scale of 0 to 60, for each KDD step: Objectives Determination, Data Preparation, Data Mining, Interpretation/Evaluation]
Three Primary Paradigms
• Predictive Modeling – supervised learning approach where classification or prediction of one of the attributes is desired
  • Classification is the prediction of predefined classes
    – e.g. Naïve Bayesian, Decision Trees, and Neural Networks
  • Regression is the prediction of continuous data
    – e.g. Neural Networks, and Decision (Regression) Trees
• Discovery – unsupervised learning approach for exploratory data analysis
  • e.g. Association Rules, Link Analysis, Clustering, and Self-Organizing Maps
• Deviation Detection – identifying outliers in the data
  • e.g. Visualization
Importance of Data Mining Framework
• Provides capability to build custom applications
• Provides access to data management tools
  • Loading data from database, flat file or DataSpaces
• Contains data mining algorithms for prediction and discovery that can be applied
• Provides data transformations for standard operations
• Supports an extensible interface for creating one’s own algorithms
• Provides means for building and applying models
• Provides integrated visualization components
• Provides access to distributed computing capabilities
D2K - Data To Knowledge
D2K is a flexible data mining system that integrates effective analytical data mining methods for prediction, discovery, and anomaly detection with data management and information visualization
D2K Overview
D2K and Its Many Components
• D2K Infrastructure: D2K API, data flow environment, distributed computing framework and runtime system
• D2K Modules: Computational units written in Java that follow the D2K API
• D2K Itineraries: Modules that are connected to form an application
• D2K Toolkit: User interface for specification of itineraries and execution that provides the rapid application development environment
• D2K-Driven Applications: Applications that use D2K modules, but do not need to run in the D2K Toolkit
D2K Toolkit
Major features that D2K provides to an application developer include:
• Visual programming system employing a data flow paradigm
• Scalable distributed computing capabilities
• Flexible and extensible software development environment
• Multi-layered learning strategies
• Integrated environment for models and visualization
• Capability to access data transparently from multiple sources
D2K Basic 1.0
• New release of D2K 3.0
• New release of the D2K Toolkit
• New release of a set of D2K Modules to perform data mining techniques
  • Prediction
    – Decision Trees: C4.5 Decision Tree, Continuous Decision Tree, SQL Rain Forest Decision Tree
    – Naïve Bayesian Classification and SQL Naïve Bayesian Classification
    – Neural Networks
  • Discovery
    – Rule Association: Apriori, Htree
    – Clustering: Hierarchical Agglomerative, Kmeans, Coverage, etc.
• Better documentation for Toolkit and modules
• Includes visualizations for many of the modeling approaches
• Includes a set of data transformations
  • Attribute selection, binning, filtering, attribute construction
• Includes optimization strategy for searching parameter space
• Plus more…
D2K 3.0 Features
• Current Release downloadable off our website
• Extension of existing API
  • Provides the capability to programmatically connect modules and set properties
  • Allows D2K-driven applications to be developed
  • Provides ability to pause and restart an itinerary
• Enhanced Distributed Computing
  • Allows modules that are re-entrant to be executed remotely
  • Uses Jini services to look up distributed resources
  • Includes interface for specifying the runtime layout of a distributed itinerary
• Processor Status Overlay
  • Shows utilization of distributed computing resources
• Distributed Checkpointing
• Resource Manager
  • Provides a mechanism for treating selected data structures as if they were stored in global memory
  • Provides memory space that is accessible from multiple modules running locally as well as remotely
New D2K 4.0 Highlights
• Ability to use the web for deployment
• Ability for modules to run headless (with no GUI)
• Changed the way itineraries are saved
  • Stored in ZIP file
  • Itinerary is described in an XML format
  • Annotation is saved in HTML format
  • Additional data is stored in a serialized HashMap
• Table structure was re-implemented to improve performance and simplify the API
• Improvements of module selection, with area selection
• Support of copy and paste of selected modules
D2K ToolKit
1. Workspace
2. Resource Panel
3. Modules
4. Models
5. Itineraries
6. Visualizations
7. Generated Visualizations
8. Generated Models
9. Component Information
10. Toolbar
11. Console
D2K Modules
Input Module: Loads data from the outside world
• Flat files, database, etc.
Data Prep Module: Performs functions to select, clean, or transform the data
• Binning, Normalizing, Feature Selection, etc.
Compute Module: Performs main algorithmic computations
• Naïve Bayesian, Decision Tree, Apriori, etc.
User Input Module: Requires interaction with the user
• Data Selection, Input and Output selection, etc.
Output Module: Saves data to the outside world
• Flat files, databases, etc.
Visualization Module: Provides visual feedback to the user
• Naïve Bayesian, Rule Association, Decision Tree, Parallel Coordinates, 2D Scatterplot, 3D Surface Plot
D2K Module Icon Description
Module Progress Bar: Appears during execution to show the percentage of time that this module executed over the entire execution time. It is green when the module is executing and red when not
Input Port: Rectangular shapes on the left side of the module represent the inputs for the module. They are colored according to the data type that they represent
Properties Symbol: If a “P” is shown in the lower left corner of the module, then the module has properties that can be set before execution
Output Port: Rectangular shapes on the right side of the module represent the outputs for the module. They are colored according to the data type that they represent
Resource Panel
The area to the left of the Workspace that contains the components necessary for data analysis
• Modules
• Models
• Itineraries
• Visualizations
D2K Itineraries
• Itineraries are partial or complete applications composed of connected modules
• D2K Core Itineraries include:
  • Prediction
  • Discovery
  • Anomaly Detection
  • Data Selection
  • Transformation
  • Visualization
Workspace
The Workspace is the area where applications are formed
• Modules are placed, connected, and properties set
• Itineraries are saved and executed
Session Panes
• Component Information
  • Shows detailed information about components of D2K
  • Shows module information, inputs, outputs, and property descriptions
  • Shows itinerary annotation
• Generated Visualization
  • Shows visualizations generated during this session
  • Provides ability to save these visualizations for later use
• Generated Models
  • Shows models generated during this session
  • Provides ability to save these models for later use
D2K Setup
• Preferences
  • Written to a file called “d2k.props”
  • Set up automatically the first time D2K is installed
  • Changed via Edit menu… Preferences…
  • Some changes do require restart of D2K
  • Check the User Manual for more details (available online)
Using the Toolkit
Build an itinerary for loading data and viewing it in a TableViewer
• Drag and Drop Modules from Modules Pane of Resource Panel to the Workspace as shown
  • Expand directory ncsa/io/file/input
    – Drag and Drop Input1Filename to Workspace
    – Drag and Drop CreateDelimitedParser to Workspace
    – Drag and Drop ParseFileToTable to Workspace
  • Expand directory ncsa/vis
    – Drag and Drop TableViewer to Workspace
Using the Toolkit (cont’d)
Connect the modules as shown
• Drag from the output port of one module to the input port of the next module
• Check the properties of modules by double clicking on the module
  • Input File Name
    – Choose data/UCI/iris.csv
  • Create Delimited File Parser
    – Defaults work
  • Parse File To Table
    – Defaults work
• Click Run to execute
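For readers who think in code, the data flow of this four-module itinerary can be sketched as chained Python functions. This is only an analogy: the function names and the two sample rows (including the assumed column names) are illustrative and are not the D2K API, which is written in Java.

```python
import csv
import io

# Two illustrative rows laid out like the iris data; the hands-on exercise
# reads data/UCI/iris.csv through the Toolkit instead. Column names are assumed.
SAMPLE_CSV = (
    "sepal-length,sepal-width,petal-length,petal-width,flower-type\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)


def create_delimited_parser(text, delimiter=","):
    """Stand-in for the parser module: yields one list of fields per line."""
    return csv.reader(io.StringIO(text), delimiter=delimiter)


def parse_to_table(parser):
    """Stand-in for the table-building module: first row is the header."""
    rows = list(parser)
    return {"columns": rows[0], "rows": rows[1:]}


def view_table(table):
    """Stand-in for the viewer module: renders the table as plain text."""
    lines = [" | ".join(table["columns"])]
    lines.extend(" | ".join(row) for row in table["rows"])
    return "\n".join(lines)


# Each output feeds the next input, as the connected ports do in the Workspace.
table = parse_to_table(create_delimited_parser(SAMPLE_CSV))
print(view_table(table))
```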
Variation Using a Nested Itinerary
• An itinerary can be used as a module – nested itinerary
• Properties can be set by holding Control and double clicking on the nested itinerary
• Then connect the input and output ports of the nested itinerary as one would any other module
PREDICTIVE MODELING
CLASSIFICATION
NAÏVE BAYESIAN
Naïve Bayesian Classification
• Applied to supervised learning problem
• Expects training examples with input and output attributes
• Single output attribute with small number of possible values for best performance
• Computes the distribution of each input associated with each class; for example, given the variable X with a value of xi, the probability of the example being in Class A may be greater than the probability of it being in Class B
Predictive Modeling: Naïve Bayesian
Mathematically speaking: if the class-conditional densities P(X | C) and the probabilities P(xi) and P(cj) (prior probabilities) are known, then the classifier is one which assigns class cj to datum xi if cj has the highest posterior probability given the data
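Written out with Bayes' rule in the slide's notation, the decision rule looks like this. The hat symbol for the chosen class is introduced here only for illustration. The denominator P(x_i) is the same for every class, so it does not affect which class wins.

```latex
P(c_j \mid x_i) = \frac{P(x_i \mid c_j)\, P(c_j)}{P(x_i)},
\qquad
\hat{c}(x_i) = \arg\max_{c_j} P(c_j \mid x_i)
             = \arg\max_{c_j} P(x_i \mid c_j)\, P(c_j)
```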
Bayesian Classification: Why?
• Probabilistic learning: Calculates explicit probabilities for hypotheses; it is among the most practical approaches to certain types of learning problems
• Incremental: Each training example can incrementally increase/decrease the probability that a hypothesis is correct
• Prior knowledge: Can be combined with observed data
• Standard:
  • Provide a standard of optimal decision making against which other methods can be measured
  • In a simpler form, provide a baseline against which other methods can be measured
Naïve Bayesian Classification
• Naïve assumption:
  • Feature independence
• P(xi|C) is estimated as the relative frequency of examples having value xi as feature in class C
• Computationally easy!!!
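A minimal sketch of the idea above, assuming categorical features: P(xi|C) is taken as a relative frequency, and the independence assumption lets the per-feature estimates be multiplied. The function names and the tiny weather-style data set are made up for illustration; this is not the D2K module.

```python
from collections import Counter


def train_naive_bayes(examples):
    """examples: list of (feature_tuple, class_label) pairs."""
    class_counts = Counter(label for _, label in examples)
    # value_counts[(feature_index, value, label)] = number of matching examples
    value_counts = Counter()
    for features, label in examples:
        for index, value in enumerate(features):
            value_counts[(index, value, label)] += 1
    return class_counts, value_counts


def posterior(model, features):
    """Normalized posterior probability of every class for one example."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(C)
        for index, value in enumerate(features):
            # P(xi | C) estimated as a relative frequency within class C
            score *= value_counts[(index, value, label)] / count
        scores[label] = score
    norm = sum(scores.values())
    return {label: (s / norm if norm else 0.0) for label, s in scores.items()}


def classify(model, features):
    """Pick the class with the highest posterior probability."""
    post = posterior(model, features)
    return max(post, key=post.get)


# Hypothetical training data: (outlook, humidity) -> play?
EXAMPLES = [
    (("sunny", "high"), "no"),
    (("sunny", "normal"), "yes"),
    (("rain", "high"), "no"),
    (("rain", "normal"), "yes"),
    (("overcast", "high"), "yes"),
    (("overcast", "normal"), "yes"),
]
MODEL = train_naive_bayes(EXAMPLES)
```

A practical implementation would usually smooth the frequency estimates so that a value never seen with a class does not force that class's probability to zero; that step is left out here to keep the sketch close to the slide.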
Classification Applications Using Naïve Bayesian
• Predict a response to a marketing campaign
• Predict the most profitable customers for a product or service
• Classify applicants as high/med/low risk
• Predict which customers will leave for a competitor
• Predict whether an email message is spam or not
Opening the Itinerary
• Click on the “Itinerary” Pane in the Resource Panel
• Expand the “Prediction” directory with a single click
• Double click on “NaïveBayes” to load the itinerary into your Workspace
Executing the Itinerary
• Check modules with properties
  • Double click to open property editor
• Respond to User Interfaces that open
• Click Run button
• Respond to GUIs that pop up
PredictionTableReport for iris data
Double click on the PredictionTableReport to launch the report that shows the classification error and confusion matrix for the data
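The two quantities in the report can be sketched as follows. This is an illustrative calculation on made-up predictions, not the output of the PredictionTableReport module; rows of the matrix are actual classes and columns are predicted classes.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return (labels, matrix) where matrix[i][j] counts actual i predicted as j."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def classification_error(actual, predicted):
    """Fraction of examples whose predicted class differs from the actual class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Hypothetical results for six flowers; one versicolor is mistaken for virginica.
ACTUAL = ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
          "Iris-versicolor", "Iris-virginica", "Iris-virginica"]
PREDICTED = ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
             "Iris-virginica", "Iris-virginica", "Iris-virginica"]

LABELS, MATRIX = confusion_matrix(ACTUAL, PREDICTED)
ERROR = classification_error(ACTUAL, PREDICTED)
```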
Naïve Bayesian Visualization
• Double click on the NaiveBayesVis to view the results
• The upper right hand pane shows the distribution of the classes
• The left hand pane shows the attributes and each of their values. They are listed in order of significance
• The message box shows details about each pie chart when brushed
• Clicking on a pie chart shows how knowing this information can change the overall class prediction
• Clicking on multiple pie charts calculates conditional probabilities
Notice Iris-versicolor has a 33% likelihood
Naïve Bayesian Visualization
What if scenarios…
• Click on petal-width of 1.3:1.9
• Now the probability of Iris-versicolor is 66.37%
Naïve Bayesian Visualization
What if scenarios… continue with conditional probabilities calculations
• Click on petal-length of 3.95:5.32
• Click on sepal-length of 5.28:6.15
• Now the probability of Iris-versicolor is 94.99%
Applying Models
• In Generated Models Session Pane, right click on the model and choose Save
• The saved model shows up in the Model View of the Resource Panel
• Click and drag the model into the workspace
• Connect the input and output of the model as shown
PREDICTIVE MODELING
CLASSIFICATION
DECISION TREES
Decision Trees Classification
• Supervised learning problem
• Builds a model to classify one attribute based on other data attributes
  • Builds the tree by deciding how to split the data so that classification error is reduced
• Shown is a decision tree predicting whether one will play tennis based on some weather conditions
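The split decision can be sketched as below: for each candidate attribute, group the examples by that attribute's value and count how many would be misclassified if each group predicted its majority class. The data set and function names are hypothetical, and the plain error count is used only because it matches the slide's wording; algorithms such as C4.5 rank splits with information-based measures instead.

```python
from collections import Counter


def misclassified(labels):
    """Examples left wrong if the group predicts its majority class."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]


def split_error(rows, labels, attribute):
    """Classification error after splitting on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    return sum(misclassified(group) for group in groups.values()) / len(labels)


def best_split(rows, labels):
    """Attribute whose split leaves the lowest classification error."""
    return min(rows[0].keys(), key=lambda a: split_error(rows, labels, a))


# Hypothetical weather data in the spirit of the play-tennis example.
ROWS = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "overcast", "windy": "yes"},
]
LABELS = ["yes", "yes", "no", "no", "yes", "no"]
```

A full tree builder would apply `best_split` recursively to each resulting group until the groups are pure or too small to split.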
Predictive Modeling: Decision Trees
Applications Using Decision Trees
• Decision trees can solve both classification and regression problems
• Decision Trees work for many of the same problems as Naïve Bayesian analysis
  • Prediction of who should be given a loan
  • Prediction of high/med/low risk
PredictionTableReport for iris data
Double click on the PredictionTableReport to launch the report that shows the classification error and a confusion matrix for the data
Note: This is a very clean data set
Decision Tree Visualization
Two main panes
• Navigator Pane, shown in the top left, illustrates the full decision tree; the portion currently viewable is outlined with a black box
• Viewable Tree shows a chart of the percentages of the examples in each of the classes
  • Brushing indicates the percentages in the Brushing Pane
• Clicking on a small chart opens a larger view of the chart, showing the complete path taken to get to this node
Using the Model
• In Generated Models Session Pane, right click on the model and choose Save. The saved model shows up in the Model View of the Resource Panel
• Click and drag the model into the workspace (shown in green circle, disconnect the items in the red blob)
• Connect the input and output of the model as shown
  • Results can be sent to the PredictionTableReport and to the DecisionTreeVis
• New (test) data can be examined with the model
DISCOVERY
RULE ASSOCIATION
Using fp-growth
Market Basket Example
• Is soda typically purchased with bananas? Does the brand of soda make a difference?
• Where should detergents be placed in the store to maximize their sales?
• Are window cleaning products purchased when detergents and orange juice are bought together?
• How are the demographics of the neighborhood affecting what customers are buying?
Discovery: Rule Association
Association Rules
• There has been a considerable amount of research in the area of Market Basket Analysis. Its appeal comes from the clarity and utility of its results, which are expressed in the form of association rules
• Given
  • Database of transactions
  • Each transaction contains a set of items
• Find all rules X->Y that correlate the presence of one set of items X with another set of items Y
  • Example: When a customer buys bread and butter, they buy milk 85% of the time
Overview
• Unsupervised learning problem
• Find all rules that correlate the presence of one set of items X with another item Y
  • Example: When a customer buys bread and butter, they buy milk 85% of the time
• Support is the percentage of the records that contain both X and Y
  • A rule must have some minimum user-specified support to show its impact
• Confidence is the percentage of records that contain X and Y out of the number of records that contain X
  • A rule must have some minimum user-specified confidence to show its value
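The two definitions can be written as formulas. The symbols are introduced here for illustration: D is the set of records, N is the number of records, and t is a single record.

```latex
\mathrm{support}(X \rightarrow Y) = \frac{\bigl|\{\, t \in D : X \cup Y \subseteq t \,\}\bigr|}{N},
\qquad
\mathrm{confidence}(X \rightarrow Y) = \frac{\bigl|\{\, t \in D : X \cup Y \subseteq t \,\}\bigr|}{\bigl|\{\, t \in D : X \subseteq t \,\}\bigr|}
```

In the bread-and-butter example, the 85% figure is a confidence: of the records that contain bread and butter, 85% also contain milk.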
Results: Useful, Trivial, or Inexplicable?
• While association rules are easy to understand, they are not always useful
Useful: On Fridays, convenience store customers often purchase diapers and beer together
Trivial: Customers who purchase maintenance agreements are very likely to purchase large appliances
Inexplicable: When a new Super Store opens, one of the most commonly sold items is light bulbs
How Does It Work?
Grocery Point-of-Sale Transactions
Customer  Items
1         Orange Juice, Soda
2         Milk, Orange Juice, Window Cleaner
3         Orange Juice, Detergent
4         Orange Juice, Detergent, Soda
5         Window Cleaner, Soda
Co-Occurrence of Products
                OJ  Window Cleaner  Milk  Soda  Detergent
OJ               4               1     1     2          2
Window Cleaner   1               2     1     1          0
Milk             1               1     1     0          0
Soda             2               1     0     3          1
Detergent        2               0     0     1          2
• In the data, two of five transactions include both soda and orange juice
  • These two transactions support the rule
  • Support for the rule is two out of five or 40%
• Of the three transactions that contain soda, two also contain orange juice
  • Transaction 5 (Window Cleaner, Soda) contains soda without orange juice
• So the rule IF soda, THEN orange juice has a confidence of two out of three, or about 67%
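The support and confidence of a rule can be checked directly against the five point-of-sale transactions listed on this slide. The sketch below is illustrative and the function names are made up. Counted this way, three transactions contain soda (1, 4, and 5) and two of those also contain orange juice, so the confidence of soda -> orange juice comes out as 2/3.

```python
TRANSACTIONS = [
    {"orange juice", "soda"},
    {"milk", "orange juice", "window cleaner"},
    {"orange juice", "detergent"},
    {"orange juice", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def support(antecedent, consequent, transactions):
    """Fraction of all transactions that contain both sides of the rule."""
    both = antecedent | consequent
    return sum(1 for t in transactions if both <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    both = antecedent | consequent
    return sum(1 for t in with_antecedent if both <= t) / len(with_antecedent)


SODA_TO_OJ_SUPPORT = support({"soda"}, {"orange juice"}, TRANSACTIONS)
SODA_TO_OJ_CONFIDENCE = confidence({"soda"}, {"orange juice"}, TRANSACTIONS)
```

Swapping the two sides keeps the support the same but changes the confidence, because the denominator becomes the number of transactions that contain orange juice.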
Confidence and Support - How Good Are the Rules
• A rule must have some minimum user-specified confidence
  • 1 and 2 -> 3 has a 90% confidence if, when a customer bought 1 and 2, in 90% of the cases the customer also bought 3
• A rule must have some minimum user-specified support
  • 1 and 2 -> 3 should hold in some minimum percentage of transactions to have value
Association Examples
• Find all rules that have “Diet Coke” as a consequent (result)
• These rules may help plan what the store should do to boost the sales of Diet Coke
• Find all rules that have “Yogurt” in the antecedent (condition)
• These rules may help determine what products may be impacted if the store discontinues selling “Yogurt”
• Find all rules that have “Brats” in the antecedent and “mustard” in the consequent
• These rules may help in determining the additional items that have to be sold together to make it highly likely that mustard will also be sold
• Find the best k rules that have “Yogurt” in the result
Basic Process
• Choosing the right set of items
  • Taxonomies
  • Virtual Items
  • Anonymous versus Signed
• Generation of rules
  • If condition Then result
  • Negation/Dissociation
  • Improvement
• Overcoming the practical limits imposed by thousands or tens of thousands of products
  • Minimum Support Pruning
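Minimum support pruning can be sketched with an Apriori-style, level-wise search: item sets below the threshold are dropped, and larger candidates are built only from the survivors. This is an illustrative sketch on the five grocery transactions from the earlier slide, with made-up function names; it is not the fp-growth itinerary used in the hands-on exercise.

```python
from itertools import combinations

TRANSACTIONS = [
    {"orange juice", "soda"},
    {"milk", "orange juice", "window cleaner"},
    {"orange juice", "detergent"},
    {"orange juice", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def frequent_itemsets(transactions, min_support):
    """Map each item set meeting min_support to its support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    result = {}
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: k / n for c, k in counts.items() if k / n >= min_support}
        result.update(survivors)
        keys = list(survivors)
        size = len(keys[0]) + 1 if keys else 0
        # Join surviving sets into candidates that are one item larger.
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == size}
        # Prune any candidate that has an infrequent subset.
        current = [c for c in candidates
                   if all(frozenset(s) in survivors for s in combinations(c, size - 1))]
    return result


FREQUENT = frequent_itemsets(TRANSACTIONS, min_support=0.4)
```

With a 40% threshold, milk is dropped at the first level, so no pair or triple containing milk is ever counted. That is how pruning keeps the search manageable as the number of products grows.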
Strengths and Weaknesses
Strengths
• It produces easy to understand results
• It supports undirected data mining
• It works on variable length data
• Rules are relatively easy to compute
Weaknesses
• It is an exponential growth algorithm
• It is difficult to determine the optimal number of items
• It discounts rare items
• It provides only limited support for attributes
• It produces many rules
• For large numbers of attribute-value combinations, considerable CPU and memory resources are consumed
Opening the Itinerary
• Click on the “Itinerary” Pane in the Resource Panel
• Expand the “Discovery” directory with a single click
• Expand the “RuleAssociation” directory with a single click
• Double click on “fp-growth” to load the itinerary into your Workspace
Discovery: Rule Association Using fp-growth
Executing the Itinerary
• Check modules with properties
  • Double click to open property editor
    – fp-growth
    – Compute Confidence
• Respond to User Interfaces that open
• Click Run button
Rule Association Visualization
• Read rules down the column
• Example - the first rule is
  • If petal-width Binned=[…:0.7] then flower-type=Iris-setosa
  • Support = 25%
  • Confidence = 100%
• Brush the bars to find out support and confidence levels
• Different sorting schemes
  • Sort by Confidence
  • Sort by Support
  • Alphabetize button sorts the attribute-value pairs alphabetically
• Rank button sorts the rows based on the current Confidence/Support selection, moving the consequents and antecedents of the highest ranking rules to the top of the attribute-value list
Choosing the Right Set of Items
[Figure: partial product taxonomy, running from general to specific. Levels shown, top to bottom: Frozen Foods; Frozen Desserts, Frozen Vegetables, Frozen Dinners; Frozen Yogurt, Frozen Fruit Bars, Ice Cream, Peas, Carrots, Mixed, Other; Rocky Road, Chocolate, Strawberry, Vanilla, Cherry Garcia, Other]
Other Association Rule Applications
• Quantitative Association Rules
  • Age[35..40] and Married[Yes] -> NumCars[2]
• Association Rules with Constraints
  • Find all association rules where the prices of items are > 100 dollars
• Temporal Association Rules
  • Diaper -> Beer (1% support, 80% confidence)
  • Diaper -> Beer (20% support) 7:00-9:00 PM weekdays
• Optimized Association Rules
  • Given a rule (l < A < u) and X -> Y, find values for l and u such that support is greater than a certain threshold and support, confidence, or gain is maximized
  • ChkBal [$30,000 .. $50,000] -> JumboCD = Yes
DISCOVERY
CLUSTERING
Overview
• Unsupervised learning problem
• Group all examples that are similar
• View results with dendrogram or parallel coordinates
• Provide several different clustering algorithms
  • Kmeans
  • Buckshot
  • Fractionation
  • Coverage
Discovery: Clustering
Clustering Algorithms
• KMeans clustering
  • A sample set containing Number of Clusters rows is chosen from an input table of examples and used as initial cluster centers
  • These initial clusters undergo a series of assignment/refinement iterations, resulting in a final cluster model
• Buckshot clustering
  • A sample of size Sqrt(Number of Clusters * Number of Examples) is chosen at random from the table of examples
  • This sampling is sent through the hierarchical agglomerative clustering module to form Number of Clusters clusters. These clusters' centroids are used as the initial "means" for the cluster assignment module. The assignment module, once it has made refinements, outputs the final Cluster Model
• Coverage clustering
  • A sample set is created from the input table such that it is approximately the minimum number of samples needed so that every example in the input table has at least one example in the sample set within a distance of Distance Threshold (% of Maximum)
  • This sampling is sent through the hierarchical agglomerative clustering module to form Number of Clusters clusters. These clusters' centroids are used as the initial "means" for the cluster assignment module. The assignment module, once it has made refinements, outputs the final Cluster Model
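The assignment/refinement loop shared by these algorithms can be sketched in plain Python. The function and parameter names are illustrative rather than those of the D2K modules, and Euclidean distance is an assumption. `buckshot_sample_size` simply evaluates the Sqrt(Number of Clusters * Number of Examples) expression from the Buckshot description.

```python
import math
import random


def assign(points, centers):
    """Assignment step: index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]


def refine(points, assignment, centers):
    """Refinement step: move each center to the mean of its members."""
    refined = []
    for index, old_center in enumerate(centers):
        members = [p for p, a in zip(points, assignment) if a == index]
        if members:
            refined.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            refined.append(old_center)  # keep a center that attracted no points
    return refined


def kmeans(points, number_of_clusters, initial_centers=None, seed=0,
           max_iterations=100):
    """Iterate assignment and refinement until the assignment stops changing."""
    # Without explicit centers, sample rows from the input as initial centers.
    centers = initial_centers or random.Random(seed).sample(points, number_of_clusters)
    assignment = assign(points, centers)
    for _ in range(max_iterations):
        centers = refine(points, assignment, centers)
        updated = assign(points, centers)
        if updated == assignment:
            break
        assignment = updated
    return centers, assignment


def buckshot_sample_size(number_of_clusters, number_of_examples):
    """Sample size used to seed Buckshot clustering."""
    return round(math.sqrt(number_of_clusters * number_of_examples))
```

Buckshot and Coverage differ from plain KMeans only in how the initial centers are obtained. In this sketch, their cluster centroids would be passed in as `initial_centers`.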
Clustering Algorithms (2)
• Fractionation
  • Sorts the initial examples (converted to clusters) by a key attribute denoted by Sort Attribute
  • The set of sorted clusters is then segmented into equal partitions of size maxPartitionsize
  • Each of these partitions is then passed through the agglomerative clusterer to produce numberOfClusters clusters
  • All the clusters are gathered together for all partitions and the entire process is repeated until only Number of Clusters clusters remain. The sorting step encourages like clusters to fall into the same partitions
Opening the Itinerary
• Click on “Itinerary” Pane in the Resource Panel
• Expand the “Discovery” directory
• Expand the “Clustering” directory
• Double click on “BuckshotClusterer”
Clustering Results
Dendrogram or Parallel Coordinates
DEVIATION DETECTION VISUALIZATIONS
PARALLEL COORDINATES
SCATTERPLOT
Itinerary
• Visualization to detect outliers and patterns
• Expand the vis directory and load the “ParallelCoordinate” itinerary
Deviation Detection: Parallel Coordinates
Parallel Coordinates - Visualization
• Each vertical line represents an attribute with the minimum and maximum values shown at bottom and top
• Each record is drawn as a line that connects its values across the attributes
• Lines are colored based on the output attribute
• Clicking and dragging on the label boxes allows the attributes to be rearranged
• Zooming is accomplished by dragging a box over the desired area. Clicking returns to the original view
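The geometry behind the display can be sketched as follows: each attribute is scaled so that its minimum sits at the bottom of its axis (0.0) and its maximum at the top (1.0), and each record becomes a polyline through one point per axis. The function name and sample values are illustrative; this is not the D2K visualization code.

```python
def parallel_coordinates(records):
    """Return one polyline per record as a list of (axis_index, height) points."""
    columns = list(zip(*records))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    polylines = []
    for record in records:
        line = []
        for axis, value in enumerate(record):
            span = highs[axis] - lows[axis]
            # A constant attribute has no range; place it mid-axis.
            height = (value - lows[axis]) / span if span else 0.5
            line.append((axis, height))
        polylines.append(line)
    return polylines


# Three hypothetical records with two numeric attributes each.
RECORDS = [(1.0, 10.0), (2.0, 30.0), (3.0, 20.0)]
POLYLINES = parallel_coordinates(RECORDS)
```

A record whose line runs far from the bundle formed by the others on one or more axes is a candidate outlier, which is how the display supports deviation detection.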
Scatterplots – Itinerary
• Visualization to detect outliers and patterns
• Load the “scatterplot” itinerary
Deviation Detection: Scatterplots
Scatterplots – Visualization
Small Multiples of Scatterplots - Itinerary
Deviation Detection: Small Multiples
Small Multiples of Scatterplots Vis
Small Multiples of Linear Regressions Vis
D2K Streamline (D2K SL)
• Reduces the learning curve associated with the KDD process
• Encompasses discovery, prediction and deviation detection techniques
• Saves and applies models to new data sets easily
• Supports return to earlier steps in the KDD process to run with different parameters
• Uses the D2K Infrastructure transparently
D2K SL
New D2K User Interface – D2K SL
• Provides step by step interface to guide user in data analysis
• Uses same D2K modules
• Provides way to capture different experiments (streams)
Another View of the New D2K User Interface – D2K SL
• Helps users keep track of data
• Defines templates that can be reused in different experiments (streams)
The ALG Team
Staff: Loretta Auvil, Ruth Aydt, Peter Bajcsy, Colleen Bushell, Dora Cai, David Clutter, Lisa Gatzke, Vered Goren, Chris Navarro, Greg Pape, Tom Redman, Duane Searsmith, Andrew Shirk, Anca Suvaiala, David Tcheng, Michael Welge
Students: Tyler Alumbaugh, Peter Groves, Olubanji Iyun, Sang-Chul Lee, Xiaolei Li, Brian Navarro, Jeff Ng, Scott Ramon, Sunayana Saha, Martin Urban, Bei Yu, Hwanjo Yu
Licensing D2K
• Faculty, staff and students at US academic institutions will be able to license and use D2K for free by downloading from alg.ncsa.uiuc.edu
• Private Sector Partners who have provided funding for projects related to D2K will be able to license and use D2K for free
• Private Sector Partners who have not provided funding will be able to license and use D2K for a discounted fee
Contact John McEntire
Office of Technology Management
308 Ceramics Building, MC-243
105 South Goodwin Avenue
Urbana, Illinois 61801-2901
(217) [email protected]