Analytics Life Cycle: Pangea is Panacea!
Terms of Use, IP and Liability Disclaimer
The accompanying material and any related oral or written discussion (the “Materials”) is governed by the limitations detailed below:
Licensed Content and Ownership - HCL, PANGEA and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. Content distributed within a HCL client organization must display HCL copyright notices and attributions of authorship.
IP & Patent Liability - This Solution/Proposition is covered by a pending patent. Any refactoring or subsequent re-use is an unlicensed use and therefore constitutes patent infringement. If there is any further detailed information required, please contact [email protected]
Liability Disclaimer - The information herein is for informational purposes only and represents the current view of HCL Technologies Ltd as of the date of this presentation. Because HCL must respond to changing market conditions, it should not be interpreted to be a commitment on the part of HCL, and HCL cannot guarantee the accuracy of any information provided after the date of this presentation. HCL MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Analytics Lifecycle – Enterprise View
• Business Problem Description: Extract the problem definition from business owners. This is often tricky, but important.
• Data Ingestion: Ingesting data from diverse data sources is a common ask.
• Data Preparation: Ontologies for schema preparation and transformation should be defined.
• Algorithm Selection: Select algorithms suited to the business requirement(s). Automated recommendation is a definite ask.
• Model Building and Tuning: Less coding and more effort on model execution and tuning improves productivity.
• Model Deployment: Ease of deployment and (near) real-time scoring are crucial for enterprise acceptance and success.
• Model Monitoring: Unlike traditional programs, ML models need regular monitoring and updating on a need basis.
• Business Insights & Visualizations: Business insight wrappers are crucial for the successful adoption of analytics/ML.
ML/DL Model Lifecycle
Monitoring loop (diagram labels):
• Deployed model
• Monitor inputs
• Drift analysis
• Automatic output variance analysis
• Manual/supervised analysis
• Identify impacted parameters
• Revise model parameters
• Update model
Input Analysis (drift/newness):
• Numerical data: distribution analysis
• Categorical data: obsolete/new categories
• Text data: obsolete/new keywords
• Estimate data shift at regular intervals
• Check for new/deleted categories/words
Output Analysis (error/variance/FP/FN):
• Error/variance for time/state models
• False positives/false negatives (FP/FN) for feedback-based models
• Boolean/categorical/labels (clusters)
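The input and output checks listed on this slide could be sketched roughly as follows. This is a minimal illustration, not Pangea's implementation: the function names (`psi`, `category_drift`, `fp_fn_rates`) are hypothetical, and the Population Stability Index is just one common way to do "distribution analysis" that has been swapped in here as an assumption. Samples are assumed to be non-empty.

```python
import math


def psi(reference, current, bins=10):
    """Population Stability Index between two numeric samples.

    Assumption: PSI stands in for the slide's "distribution analysis".
    Bins are equal-width over the reference range; values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def category_drift(reference, current):
    """Report categories (or keywords) that appeared or disappeared."""
    ref, cur = set(reference), set(current)
    return {"new": sorted(cur - ref), "obsolete": sorted(ref - cur)}


def fp_fn_rates(actual, predicted):
    """False-positive and false-negative rates from boolean feedback labels."""
    fp = sum(1 for a, p in zip(actual, predicted) if p and not a)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    negatives = sum(1 for a in actual if not a)
    positives = len(actual) - negatives
    return {
        "fp_rate": fp / negatives if negatives else 0.0,
        "fn_rate": fn / positives if positives else 0.0,
    }


if __name__ == "__main__":
    training = list(range(100))
    production = [v + 50 for v in training]
    print("PSI:", psi(training, production))
    print(category_drift(["a", "b"], ["b", "c"]))
    print(fp_fn_rates([1, 0, 1, 0], [1, 1, 0, 0]))
```

Running such checks on each scoring batch at a fixed interval matches the slide's "estimate data shift at regular intervals". Any alert threshold on the PSI value is a tuning choice left to the user.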
Analytics Model Monitoring – Heuristics to Watch
• A burst or patch of data causes an abrupt transition.
• Production data causes the model outcome to shift incrementally.
• At times, data influences a gradual change in the outcomes over a period of time.
• Some data sets yield recurring change states in the output.
• Stray incidents occur when an occasional input results in unexpected output.
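Two of these heuristics, the abrupt transition and the stray incident, can be told apart by checking whether a deviation persists. The sketch below is one possible way to do that, under assumptions not in the slides: a per-batch output metric is available, the hypothetical `label_batches` function compares it with a baseline band of `k` standard deviations, and a run of at least `persist` out-of-band batches counts as a shift. Gradual, incremental and recurring changes would need longer-horizon trend checks that are not shown here.

```python
from statistics import mean, stdev


def label_batches(baseline, batches, k=3.0, persist=3):
    """Label each batch metric as 'ok', 'stray' or 'shift'.

    baseline: metric values observed while the model was known to be healthy
              (at least two values are needed for the standard deviation).
    batches:  metric values from production, in time order.
    A run of out-of-band values shorter than `persist` is treated as stray;
    a longer run is treated as an abrupt, sustained shift.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    out_of_band = [abs(x - mu) > k * sigma for x in batches]

    labels = []
    i, n = 0, len(batches)
    while i < n:
        if not out_of_band[i]:
            labels.append("ok")
            i += 1
            continue
        # Measure how long the out-of-band run lasts.
        j = i
        while j < n and out_of_band[j]:
            j += 1
        run = j - i
        labels.extend(["shift" if run >= persist else "stray"] * run)
        i = j
    return labels


if __name__ == "__main__":
    healthy = [10.0, 10.2, 9.8, 10.1, 9.9]
    observed = [10.1, 14.0, 10.0, 9.9, 13.5, 13.8, 14.1, 13.9]
    print(label_batches(healthy, observed))
```

In this example the single spike is labelled as stray, while the final four batches are labelled as a shift, which would be the cue to start the drift analysis and model update loop.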
Types of Analytical Models - Recap
• Survival: Preventive and proactive alerts and lifetime estimates
• Clustering: Unsupervised models that group similar data/objects into k clusters
• Optimization: Heuristic and operations research (OR) models for optimization
• Forecast: Time-series-based forecast models
• Classification: Supervised models that label datasets
• Regression: Linear, non-linear and logistic regression models
Best Practices – Analytics Adoption
• Data ingestion with diverse connectors
• Ontologies and schema preparation
• Data analysis for duplicates, missing values, etc.
• Model building, tuning, deployment
• Model monitoring at regular intervals
• Business logic wrappers and insights + visualizations
Reduced Time to Value
* These views are not expressly or by implication those of the affiliated organization. They are entirely the speaker’s opinions, based on his experience and understanding.
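As an illustration of the "data analysis for duplicates, missing values" practice, a first-pass profile of ingested records might look like the sketch below. The `profile_records` function and the field names are hypothetical, and records are assumed to be dictionaries with scalar values. A row counts as a duplicate when all checked fields match an earlier row.

```python
def profile_records(records, fields):
    """Count duplicate rows and missing values over the given fields."""
    seen = set()
    duplicates = 0
    missing = {f: 0 for f in fields}
    for rec in records:
        key = tuple(rec.get(f) for f in fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
        for f in fields:
            # Treat absent keys, None and empty strings as missing.
            if rec.get(f) in (None, ""):
                missing[f] += 1
    return {"rows": len(records), "duplicates": duplicates, "missing": missing}


if __name__ == "__main__":
    rows = [
        {"id": 1, "city": "Pune"},
        {"id": 1, "city": "Pune"},
        {"id": 2, "city": ""},
        {"id": 3},
    ]
    print(profile_records(rows, ["id", "city"]))
```

A report of this kind is cheap to produce at ingestion time and gives an early signal before any model building starts.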
Pangea* - Overview
Pangea is a distributed analytics workbench that provides an end-to-end platform for building and operationalizing analytics faster.
• Zero Coding Approach: Delivers end-to-end analytics with intuitive drag and drop of data and models/algorithms
• Single Click Deployment: Reduces model deployment time from several months to days
• Distributed Analytics at Scale: Data and code distribution on virtual nodes ensures scalability
• Modular and Flexible: Customizable solution to fit the client's needs
• Actionable Insights
Pangea brings in automation to achieve speed, scale and collaboration, and enforces best-practice implementation across the analytics life cycle to reduce the total cost of ownership.
Drastic time-to-insight reduction:
• Data ingestion from diverse data sources
• Modelling and tuning without coding
• Inbuilt and third-party UI for reports and charts
• Deployment through clicks and configuration
* HCL Internal IP/Tool
Summary – Pangea Best Practices
• Data ingestion is key, without too much emphasis on the ‘outcome’ at that stage
• Data preparation goes hand in glove with business problem descriptions
• Ontology and/or schema preparation is an invisible yet inevitable step in the enterprise analytics life cycle
• Analytics/ML modelling without ease of deployment and monitoring is short-lived
• Analytical models without business wrappers serve only as PoCs