random forests 13-06-2015
Post on 15-Apr-2017
Random Forests
Cork Big Data & Analytics Group
Decision Trees
• One of the older ML algorithms (Breiman et al. 1984)
• One of the most popular (Rexer Data Miner Survey 2013)
• Really versatile: handle non-linear relationships, missing data, outliers, categorical or numerical targets, you name it!
• Can be easily interpreted: the rules can be presented as a table, or a series of if-then statements for each “split”
• Can also be visually represented
• Common algorithms: CART, ID3, C4.5, CHAID, C5.0
Note: hope you don’t mind the political example!
Decision Trees (cont’d.)
• Decision trees have low bias: the created model generally approximates reality well
• On the other hand, they have high variance: a model tends to perform differently on different samples of the data
• We need consistent performance, so what now?
• How about we “grow” a bunch of decision trees, and average their predictions?
• Breiman thought about this, and in 2001 developed…
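The averaging idea can be illustrated with a small simulation. This is only a sketch under strong assumptions: each "model" is a hypothetical stand-in that returns the true value plus independent Gaussian noise (low bias, high variance), not an actual decision tree, and the names `noisy_estimate` and `averaged_estimate` are made up for the illustration.

```python
import random
import statistics

def noisy_estimate(rng, truth=10.0, noise_sd=3.0):
    """Stand-in for one low-bias, high-variance model: truth plus noise."""
    return rng.gauss(truth, noise_sd)

def averaged_estimate(rng, n_models):
    """Average the predictions of n_models independent noisy models."""
    predictions = [noisy_estimate(rng) for _ in range(n_models)]
    return sum(predictions) / n_models

rng = random.Random(0)  # fixed seed so the run is repeatable

# Repeat the experiment many times to see how much each estimate wobbles.
single = [averaged_estimate(rng, 1) for _ in range(2000)]
ensemble = [averaged_estimate(rng, 50) for _ in range(2000)]

var_single = statistics.pvariance(single)
var_ensemble = statistics.pvariance(ensemble)
print(f"variance of one model:        {var_single:.2f}")
print(f"variance of 50-model average: {var_ensemble:.2f}")
```

Here the models are independent, so the variance of the average shrinks roughly in proportion to the number of models. Trees grown on overlapping samples of the same data are correlated, so the gain in practice is smaller.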
Random Forests
• Mimics an ensemble of “experts” making a decision
• Grows a bunch of bagged decision trees, with each split using a random subset of variables (to decorrelate the trees and reduce variance)
• Fast (relatively), scalable, keeps most of the benefits of decision trees
• Has several parameters to tweak for performance
• Implemented in all major ML software and libraries
• But it is a “black box”: no single set of rules, no tree diagram, little inference
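The two sources of randomness mentioned above (bagging the rows, and sampling the variables) can be sketched as follows. This is a minimal illustration, not a forest implementation: no trees are actually grown, the helper names `bootstrap_indices` and `candidate_features` are hypothetical, and the square-root rule for the number of variables is just one common choice for classification.

```python
import math
import random

def bootstrap_indices(rng, n_rows):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def candidate_features(rng, n_features, mtry):
    """Pick mtry distinct feature indices to consider at one split."""
    return sorted(rng.sample(range(n_features), mtry))

rng = random.Random(42)  # fixed seed so the run is repeatable
n_rows, n_features = 10, 9
mtry = math.isqrt(n_features)  # assumed rule: floor(sqrt(number of features))

for tree in range(3):
    rows = bootstrap_indices(rng, n_rows)               # bagging: resample rows
    feats = candidate_features(rng, n_features, mtry)   # variables for one split
    print(f"tree {tree}: rows={rows} first-split features={feats}")
```

In a real forest a fresh variable subset is drawn at every split of every tree; the sketch draws only one subset per tree to keep the output short.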
Random Forests (cont’d.)
• Gives you “free” validation, similar to cross-validation, through the out-of-bag (OOB) error
• This means no separate cross-validation loop, so less total training time
• Calculates variable importance
• Supports partial dependence plots
• Now supports censored (survival) data
• Handles class imbalance
• Can create very large objects in memory
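The OOB error is possible because each bootstrap sample leaves out some rows, which can then serve as a validation set for that tree. The sketch below only estimates how many rows are left out on average; it does not train anything, and `oob_fraction` is a hypothetical helper name.

```python
import math
import random

def oob_fraction(rng, n_rows):
    """Fraction of rows that never appear in one bootstrap sample."""
    in_bag = {rng.randrange(n_rows) for _ in range(n_rows)}
    return 1 - len(in_bag) / n_rows

rng = random.Random(1)  # fixed seed so the run is repeatable
fractions = [oob_fraction(rng, 1000) for _ in range(200)]
mean_oob = sum(fractions) / len(fractions)

# Theory: a given row is missed with probability (1 - 1/n)^n, which tends to 1/e.
print(f"simulated mean OOB fraction: {mean_oob:.3f}")
print(f"theoretical limit 1/e:       {math.exp(-1):.3f}")
```

So roughly a third of the data is out of bag for any one tree, and predicting each row using only the trees that did not see it gives an error estimate without a separate hold-out set.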
Random Forests in R
• randomForest
• randomForestSRC
• ggRandomForests
• party
• randomForestCI (swager on GitHub)
• edarf (zmjones on GitHub)
• Boruta
Tuning Parameters
• Number of Trees
• Number of Variables
• Prior Class Weights
• Cutoff
• Sample Size
• Node Size
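As a starting point for the "Number of Variables" parameter, the sketch below reproduces the default rule that, to my understanding, the R randomForest package documents for its `mtry` argument: the square root of the number of predictors for classification, and one third of them for regression. The function name `default_mtry` is hypothetical and the code is a Python illustration, not part of any package.

```python
import math

def default_mtry(n_predictors, task):
    """Default number of variables tried at each split (assumed rule)."""
    if n_predictors < 1:
        raise ValueError("need at least one predictor")
    if task == "classification":
        return math.isqrt(n_predictors)      # floor(sqrt(p))
    if task == "regression":
        return max(n_predictors // 3, 1)     # floor(p / 3), but at least 1
    raise ValueError(f"unknown task: {task!r}")

for p in (4, 10, 16, 100):
    print(p, default_mtry(p, "classification"), default_mtry(p, "regression"))
```

These defaults are only a baseline; trying a few values around them and comparing OOB error is the usual way to tune this parameter.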
Some Resources
• James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning
• Kuhn, Johnson, Applied Predictive Modeling
• Jones, Linder, Exploratory Data Analysis using Random Forests (article)
• Package Vignettes on CRAN
• CrossValidated.com
THANK YOU!
srdjan.santic@gmail.com