Start getting your feet wet in open source machine and deep learning
Post on 16-Apr-2017
Agenda for H2O Introduction Webinar
▪ Company Introduction (5 mins)
▪ H2O Introduction and Demo (35 mins)
– Installation of H2O
– Flight delay prediction use case
• Use case description
• Data set description
• Data munging
• Model creation
▪ Q&A (10 mins)
H2O AI Platform
In-Memory, Distributed Machine Learning with Visual Intelligence
H2O AI in Spark with Data Prep and ML Pipelines
Operationalize Model Building and Deployment, Governance
Deep Water: Best-of-breed GPU Deep Learning with easy API and AutoML (TensorFlow, MXNet or Caffe, and H2O)
AI For Business Transformation: Insights on Text, Images, Transactions, Speech
Best Machine Learning Algorithms on Spark
Platform to Build and Scale Data Products. Dual licensing (AGPL and Commercial)
H2O is the #1 Platform for Open Source AI
Open Source Drives Community Adoption
Companies Using H2O.ai
2014: 400
2015: 3,810
2016: 6,427
2017: 9,173
H2O.ai Users
2014: 1,000
2015: 38,257
2016: 54,163
2017: 83,108
* Data from July of every year, except for 2017, when data from Feb 21st are used.
H2O.ai Strongly Positioned in Key Analyst Reports and Press
“Overall customer satisfaction is very high.”
“H2O is especially suited to IoT edge and device scenarios.”
“H2O had the highest reference customer analytics support score of all the vendors.”
H2O.ai is a Visionary in the Gartner Magic Quadrant for Data Science Platforms
“H2O.ai has significant adoption by large enterprises such as Macy’s, Comcast, and Capital One.”
“H2O.ai is best known for developing open source, cluster-distributed ML algorithms at a time (2011) when big data demanded them, but no one else had them.”
H2O.ai is a Strong Performer in the Forrester Predictive Analytics & Machine Learning
H2O.ai is among the Top 10 Hot Artificial Intelligence (AI) Technologies on Forbes
H2O.ai is named alongside Nvidia, Google, IBM, Intel, Microsoft, SAS, et al. in the Top 10 Hot Artificial Intelligence (AI) Technologies on Forbes, contributed by Gil Press
H2O Use Cases – Videos and Talks
Progressive: Auto Insurance, UBI Telematics
Pawan Divarkarla, Chief Data Officer
“H2O is an enabler in how people are thinking about data.”
Zurich: Commercial Insurance, Risk Analytics
Conor Jensen, Analytics Director
“Advanced analytics was one of the key investments we decided to make.”
Capital One: Financial Services, Customer Insights
Brendan Herger, Data Scientist
“H2O is the best solution to iterate very quickly on large datasets and produce meaningful models.”
Nielsen Catalina: Digital Marketing, Consumer Behavior
Satya Satyamoorthy, Director, Software Dev
“I am a big fan of open source. H2O is the best fit in terms of cost as well as ease of use and scalability and usability.”
What is H2O?
Math Platform: Open source in-memory prediction engine
• Parallelized and distributed algorithms making the most use out of multithreaded systems
• GLM, Random Forest, GBM, PCA, etc.
API: Easy to use and adopt
• Written in Java – perfect for Java programmers
• REST API (JSON) – drives H2O from R, Python, Excel, Tableau
Big Data: More data? Or better models? BOTH
• Use all of your data – model without down sampling
• Run a simple GLM or a more complex GBM to find the best fit for the data
• More Data + Better Models = Better Predictions
H2O Algorithms
Supervised Learning
Statistical Analysis
• Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson, and Tweedie
• Naive Bayes: Binary Text Classification
Ensembles
• Distributed Random Forest: Classification or Regression Models
• Gradient Boosting Machine: Ensembles of shallow decision trees with increasingly refined approximations
Deep Neural Networks
• Deep Learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations
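The Gradient Boosting Machine bullet can be made concrete with a small sketch. This is not H2O's implementation: it is a minimal single-feature, squared-error version in plain Python, where each round fits a depth-1 tree (a "stump") to the current residuals. The names `fit_stump`, `boost`, `predict`, and `mse`, and the default learning rate, are illustrative assumptions.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Fit a depth-1 tree to the residuals by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to the mean residual
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]


def boost(xs: List[float], ys: List[float], n_rounds: int,
          learning_rate: float = 0.5):
    """Start from the mean, then add one shrunken stump per round."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x <= t else rv)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x: float) -> float:
    base, lr, stumps = model
    return base + sum(lr * (lv if x <= t else rv) for t, lv, rv in stumps)


def mse(model, xs: List[float], ys: List[float]) -> float:
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Each added stump corrects part of what the previous rounds got wrong, which is the "increasingly refined approximations" idea; on training data the error shrinks as `n_rounds` grows.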
Unsupervised Learning
Clustering
• K-means: Partition observations into k clusters of the same spatial size. Categorical features are one-hot encoded.
• Archetypes [GLRM]: Partition observations into k archetypes.
Dimensionality Reduction
• Principal Component Analysis: Linearly transforms correlated variables to independent components
• Generalized Low Rank Model: Approximates data set as a product of two low dimensional factors. Extends PCA to handle sparse data, categorical data, and adds regularization.
Anomaly Detection
• Autoencoders [Deep Learning]: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations
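The K-means note that categorical features are one-hot encoded can be sketched as follows. This is a plain-Python illustration, not H2O's internal encoding (which may also standardize columns or treat missing levels differently). The function names and the example carrier codes are hypothetical.

```python
from typing import List, Sequence, Tuple


def one_hot_encode(rows: Sequence[Sequence], categorical_index: int
                   ) -> Tuple[List[str], List[List[float]]]:
    """Replace one categorical column with one 0/1 indicator column per level."""
    levels = sorted({row[categorical_index] for row in rows})
    encoded = []
    for row in rows:
        numeric = [float(v) for i, v in enumerate(row) if i != categorical_index]
        indicators = [1.0 if row[categorical_index] == level else 0.0
                      for level in levels]
        encoded.append(numeric + indicators)
    return levels, encoded


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, the quantity K-means minimizes to centroids."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical rows: (departure delay in minutes, carrier code)
rows = [(10.0, "AA"), (12.0, "UA"), (11.0, "AA")]
levels, encoded = one_hot_encode(rows, categorical_index=1)
```

After encoding, every row is a numeric vector, so distances to cluster centers are well defined even though the carrier column started out as text.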
Accuracy with Speed and Scale
[Architecture diagram]
Data sources: HDFS, S3, SQL, NoSQL
In-Memory: Map Reduce/Fork Join, Columnar Compression
Tasks: Munging, Feature Engineering, Classification, Regression, Clustering, Matrix Factorization
Fast Modeling Engine: Deep Learning; PCA, GLM, Cox; Random Forest / GBM Ensembles
Streaming Nano Fast Java Scoring Engines
Reading Data from HDFS into H2O with R
STEP 2
2.1 R function call: h2o.importFile()
2.2 HTTP REST API request to H2O has HDFS path
2.3 H2O Cluster: Initiate distributed ingest
2.4 Request data from HDFS (data.csv)
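To illustrate what "HTTP REST API request to H2O has HDFS path" might look like on the wire, here is a hedged Python sketch that only builds and inspects the request URL; it does not contact a cluster. The `/3/ImportFiles` endpoint name, the default port 54321, and the HDFS path are assumptions for illustration, and the real client performs further steps (such as parsing) that are not shown.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_import_request(cluster_ip: str, cluster_port: int, hdfs_path: str) -> str:
    """Build the URL a client could send so the cluster ingests hdfs_path.

    The endpoint name is an assumption; only the path travels in the request,
    never the file contents, because the cluster reads HDFS itself.
    """
    query = urlencode({"path": hdfs_path})
    return f"http://{cluster_ip}:{cluster_port}/3/ImportFiles?{query}"


def describe_request(url: str) -> dict:
    """Decode the URL back into the pieces the cluster would see."""
    parts = urlparse(url)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "endpoint": parts.path,
        "hdfs_path": parse_qs(parts.query)["path"][0],
    }


# Hypothetical cluster address and HDFS location
url = build_import_request("127.0.0.1", 54321, "hdfs://namenode:8020/data/data.csv")
```

The point of the design is that the client stays thin: it hands over a location, and the distributed ingest happens between the H2O nodes and HDFS.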
Reading Data from HDFS into H2O with R
STEP 3
3.1 HDFS provides data (data.csv)
3.2 Distributed H2OFrame in DKV (H2O Cluster)
3.3 Return pointer to data in REST API JSON Response
3.4 h2o_df object created in R, holding Cluster IP, Cluster Port, and Pointer to Data
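Step 3 says the client-side object holds only the cluster IP, the cluster port, and a pointer to the data. A minimal Python model of that idea is sketched below. The class `FrameHandle`, the helper `handle_from_response`, the JSON shape, and the `/3/Frames/` URL are illustrative assumptions, not the exact H2O schema or the R package's internals.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameHandle:
    """Client-side stand-in for h2o_df: an address plus a key, no data."""
    cluster_ip: str
    cluster_port: int
    frame_key: str  # pointer to the distributed frame held in the cluster

    def frame_url(self) -> str:
        # Later operations would reference the frame by key (assumed URL layout).
        return f"http://{self.cluster_ip}:{self.cluster_port}/3/Frames/{self.frame_key}"


def handle_from_response(cluster_ip: str, cluster_port: int, body: str) -> FrameHandle:
    """Extract the frame key from a JSON response (illustrative shape only)."""
    payload = json.loads(body)
    return FrameHandle(cluster_ip, cluster_port, payload["destination_frame"]["name"])


# Hypothetical response body returned after the ingest completes
response_body = '{"destination_frame": {"name": "data.hex"}}'
h2o_df = handle_from_response("127.0.0.1", 54321, response_body)
```

Because the handle carries no rows, it stays tiny regardless of how large data.csv is; the data itself remains in the cluster's memory.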