predictive analytics techniques: what to use for...
TRANSCRIPT
Proven Performance Since 1995
TDWI helps business and IT professionals gain insight about data warehousing, BI, and analytics:
• High-quality, vendor-neutral educational offerings
• Independent analyst research staff and thought leadership
• Trusted sources of emerging information and trends
• Ability to bring together qualified BI/DW professionals and solution providers
Premium Membership, conferences, seminars, research, publications, topical portals, whitepaper library, and numerous online programs
www.tdwi.org
Agenda
• Introduction to big data and predictive analytics
• Popular predictive analytics methodologies – Examples – Guidelines
• Deployment models
Big Data Analytics
• Text analytics
• Predictive analytics
• Slicing and dicing
• Visual discovery
• Link analysis
• Stream mining
• Etc.
Predictive Analytics
A statistical or data mining solution consisting of algorithms and techniques that can be used on both structured and unstructured data to determine outcomes
A Lot of It Is Used to Predict Behavior
• People – Churn – Marketing – Fraud detection
• Machine – Operations maintenance
• And much, much more!
Good source for use cases
Supervised
• Use it when you know outcomes of interest – Leave vs. stay – Revenue prediction
• Need enough data for training, testing, validation
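The need for separate training, testing, and validation data can be sketched as a simple three-way split. This is a minimal illustration only: the function name `split_dataset`, the 60/20/20 proportions, and the made-up customer records are assumptions, not something the slides prescribe.

```python
import random

def split_dataset(rows, train_frac=0.6, validation_frac=0.2, seed=0):
    """Shuffle rows and split them into train, validation, and test lists.

    The fractions are illustrative defaults (an assumption); whatever is
    left after train and validation becomes the test set.
    """
    if not 0 < train_frac + validation_frac < 1:
        raise ValueError("train_frac + validation_frac must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = int(len(shuffled) * train_frac)
    n_validation = int(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]
    return train, validation, test

# Hypothetical labeled records: (customer_id, known outcome of interest)
customers = [(i, "leave" if i % 4 == 0 else "stay") for i in range(100)]
train, validation, test = split_dataset(customers)
print(len(train), len(validation), len(test))
```

The model is fit on `train`, tuned against `validation`, and judged once on `test`, so each set has to be large enough to contain both outcomes.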
Unsupervised
• Does not include target information
• Looks for commonalities/hidden structures in data
• May not produce useful insight
• Is it prediction?
Techniques
• Supervised – Classification – Regression – Neural networks
• Unsupervised – Clustering – Association
• Supervised – Deep learning, auto-encoders – Decision trees, random forests, gradient boosting – Support vector machines, Bayesian classifiers, principal component, discriminant analysis
• Unsupervised – Nearest-neighbor mapping, k-means clustering, self-organizing maps – Factor analysis, link analysis
Artificial Neural Networks (2)
Source: Commonsenseatheism.com
Can be used on a range of problems; good for classification and estimation
Association Rule Mining
Transaction  Items
1            milk, lettuce
2            lettuce, diapers, beer, cookies
3            milk, diapers, beer, plastic bags
4            lettuce, milk, diapers, beer
5            lettuce, milk, diapers, plastic bags
Diapers -> Beer
Two concepts: support and confidence
Used to find relationships
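As a worked sketch of the two concepts, support and confidence for the rule Diapers -> Beer can be computed from the five transactions above. The helper names `count_containing`, `support`, and `confidence` are illustrative assumptions; the definitions used are the standard ones (support is the share of transactions containing every item in the rule, confidence is the share of antecedent transactions that also contain the consequent).

```python
# The five market-basket transactions from the slide.
transactions = [
    {"milk", "lettuce"},
    {"lettuce", "diapers", "beer", "cookies"},
    {"milk", "diapers", "beer", "plastic bags"},
    {"lettuce", "milk", "diapers", "beer"},
    {"lettuce", "milk", "diapers", "plastic bags"},
]

def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in transactions if itemset <= basket)

def support(itemset, transactions):
    """Fraction of all transactions that contain the whole itemset."""
    return count_containing(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = count_containing(set(antecedent) | set(consequent), transactions)
    return both / count_containing(antecedent, transactions)

rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(rule_support, rule_confidence)  # 0.6 0.75
```

Diapers and beer appear together in 3 of 5 transactions (support 0.6), and 3 of the 4 transactions with diapers also contain beer (confidence 0.75).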
Quick Quiz
• How much revenue will this customer bring? – Regression
• Who is going to take a certain action? – Classification
• What are my customer segments? – Clustering
• If a customer buys X, what else might they buy? – Association rules
Strengths & Weaknesses: Decision Trees
Strengths
• Easy to understand – Rules vs. equations
• Easy to explain
• Not a black box
• Data doesn’t have to follow any distribution
• Can handle interactions between variables
Weaknesses
• Continuous value predictions
• Can be computationally expensive to train
• Can have problems if many classes and few training examples
• Overfitting
Strengths & Weaknesses: Regression
Strengths
• Simple to use
• Easy to explain through independent variables
Weaknesses
• Relationship needs to be linear
• Hard to handle categorical variables or variables that interact
• Outliers hard to model
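To make the outlier weakness concrete, here is a minimal sketch of a one-variable least-squares fit. The function name `fit_line` and the toy numbers are assumptions chosen for illustration; the formula is ordinary least squares for a single independent variable.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data (an assumption): a perfectly linear relationship y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
slope, intercept = fit_line(xs, ys)

# Same data with the last observation replaced by an outlier.
ys_outlier = [2, 4, 6, 8, 40]
slope_outlier, intercept_outlier = fit_line(xs, ys_outlier)

print(slope, intercept)
print(slope_outlier, intercept_outlier)
```

On the clean data the fit recovers the slope of 2; with one outlying point the fitted slope rises to 8 and the intercept drops to -12, so the line no longer describes the other four observations.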
Strengths & Weaknesses: Neural Networks
Strengths
• Good for a specific class of problems
• May be easy to implement
• Non-linear/interaction variables
Weaknesses
• Hard-to-explain output (black box)
• Output might be unpredictable
• Training can take a long time
Strengths & Weaknesses: K Means
Strengths
• Good for large datasets
• Simple
• Efficient
Weaknesses
• Need to specify K upfront
• Sensitive to outliers, which may result in incorrect cluster boundaries
• Needs a mean (categorical data?)
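The first two weaknesses can be seen in a minimal one-dimensional sketch of k-means. The function name `k_means_1d`, the toy points, and the hand-picked starting centroids are assumptions for illustration; real tools usually choose starting centroids for you, and K is fixed here by how many starting centroids are supplied.

```python
def k_means_1d(points, initial_centroids, max_iterations=100):
    """Plain k-means on numbers: assign each point to its nearest
    centroid, move each centroid to the mean of its points, repeat
    until the centroids stop changing."""
    centroids = list(initial_centroids)  # K = len(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Two obvious groups, K = 2.
points = [1, 2, 3, 10, 11, 12]
centroids, clusters = k_means_1d(points, [1, 12])

# The same data plus a single outlier at 100.
centroids_outlier, clusters_outlier = k_means_1d(points + [100], [1, 12])

print(centroids, clusters)
print(centroids_outlier, clusters_outlier)
```

Without the outlier the two groups are recovered with centroids 2 and 11. With it, the outlier drags one mean so far away that the two real groups merge into a single cluster (centroid 6.5) and the outlier ends up alone in the other.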
Strengths & Weaknesses: Association Rules
Strengths
• Simple
• Text data (categorical)
Weaknesses
• Can be computationally expensive
• Potential for spurious patterns
• Rules do not mean causality
Vendors Are Offering a Range of Options for Predictive Analytics
• UI easier to use: visual vs. code based
• Automation
• Collaboration/interactivity
• Cloud options
• Operationalizing and embedding advanced analytics