air9 analytics environment - openshift · 2020. 4. 9. · ©2019 discover financial services •...
TRANSCRIPT
©2019 Discover Financial Services • Confidential and Proprietary • Do Not Copy or Distribute | 1
AIR9 Analytics Environment
Brandon Harris / Anirudh Pathe
The opinions expressed in this presentation are those of the presenters, in their individual capacities, and not necessarily those of Discover.
DISCOVER
FINANCIAL SERVICES
Discover is one of the largest direct banks in the United States, offering a broad array of products, including credit cards, personal loans, student loans, deposit products, and home equity loans.
The Discover brand is known for rewards, services, and value.
Across all direct banking products, Discover seeks to help customers meet their financial needs and achieve brighter financial futures.
Credit Cards
▪ $144Bn Card Sales Volume
▪ $74Bn in Credit Card Receivables
Digital Banking
▪ $52Bn+ Consumer Deposits
▪ $10Bn Private Student Loans
▪ $8Bn Personal Loans
AIR9
What is it?
The Elevator Pitch
AIR9 was built for data scientists who are frustrated by the fragmentation of today’s analytics environments and the difficulty in accessing the latest tools and consistent data sets.
AIR9 provides a mechanism to jump-start analytics work by combining our cloud-scale data warehouse with the freedom of on-demand, scalable, and secure analytics environments.
The AIR9 platform allows for increased agility, decreased long-term costs, and faster time-to-value.
Past Challenges
Fragmented Environment/Restricted Capabilities
Needs of business users for the latest tools and compute capabilities cannot be met in our current environment
▪ “Frequently running out of space”
▪ “Takes many months to deploy a model”
▪ “Inconsistent data between environments”
▪ “Can’t install the packages I need or get access to the latest tools”
Future Opportunities
Faster Access to Tools and Updated Environments
Gone are the days of waiting weeks or longer for the latest versions of tools; launch your own personalized environments in minutes.
Freedom to Innovate
New opportunities to use different technologies, with the ability to evaluate without a long-term commitment to infrastructure or specific tools
Collaboration and Centralization
Save and share datasets between different analytics tools (H2O/Python/R/SAS); a single, well-curated website with all documentation and tools displayed together.
▪ Self-service model training environments with choice of hardware and tools
▪ Extract datasets from warehouse and use them interactively
▪ Evaluate model performance as well as model explainability
▪ Model/code promotion workflow and deployment pipeline
AIR9: The Intersection of Code–Data–Compute
Data
▪ Providers
▪ Team Folders
▪ Drop Bucket
Code
▪ GitHub
Compute
▪ Hardware
▪ Software
TECHNICAL DESIGN
March 1st 2019
▪ Scalable AIR9 Application
▪ Operational SQL database
▪ 6 REST API Java services on OCP
▪ Jenkins server with 6 jobs
▪ 10 AWS Lambda functions and 3 AWS Step Functions state machines
▪ 2 AWS SNS topics, EC2 instance hosting Python REST API service
▪ Operational SQL database
“Make everything as simple as possible, but not simpler.”
—Albert Einstein
October 1st 2019
Architecture and Integration
▪ Users own and manage their container’s lifecycle.
▪ Auto-termination of dormant containers.
▪ Prometheus and Instana monitor and generate container-usage metrics.
AIR9 Container Lifecycle
[Figure: container lifecycle diagram. Labels: Submitted, Error, Running, Suspend, Resume, Terminate, Failed, Success]
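The lifecycle above can be sketched as a small state machine. This is a minimal illustration, not the AIR9 implementation: the transition table, the action names `start` and `fail`, the state names `Suspended` and `Terminated`, and the 12-hour idle threshold are all assumptions, and the diagram's Failed/Success outcomes are left out for brevity.

```python
from datetime import datetime, timedelta
from enum import Enum


class State(Enum):
    SUBMITTED = "Submitted"
    RUNNING = "Running"
    SUSPENDED = "Suspended"    # assumed name for a container after Suspend
    TERMINATED = "Terminated"  # assumed name for a container after Terminate
    ERROR = "Error"


# Assumed transition table: action -> {state before: state after}.
TRANSITIONS = {
    "start": {State.SUBMITTED: State.RUNNING},
    "fail": {State.SUBMITTED: State.ERROR},
    "suspend": {State.RUNNING: State.SUSPENDED},
    "resume": {State.SUSPENDED: State.RUNNING},
    "terminate": {
        State.RUNNING: State.TERMINATED,
        State.SUSPENDED: State.TERMINATED,
        State.ERROR: State.TERMINATED,
    },
}


def next_state(state: State, action: str) -> State:
    """Return the state after `action`, or raise if the action is not allowed."""
    try:
        return TRANSITIONS[action][state]
    except KeyError:
        raise ValueError(
            f"cannot {action!r} a container in state {state.value}"
        ) from None


def is_dormant(last_activity: datetime, now: datetime,
               idle_limit: timedelta = timedelta(hours=12)) -> bool:
    """Hypothetical dormancy rule: no activity for longer than idle_limit."""
    return now - last_activity > idle_limit


def auto_terminate(state: State, last_activity: datetime, now: datetime) -> State:
    """Terminate a running or suspended container once it is dormant."""
    if state in (State.RUNNING, State.SUSPENDED) and is_dormant(last_activity, now):
        return next_state(state, "terminate")
    return state
```

Keeping the allowed transitions in one table makes an invalid request, such as resuming a terminated container, fail loudly instead of silently changing state.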
AIR9 modeling environments are over-subscribed on each node. Each environment is guaranteed only 10% of the requested CPU and RAM, and is allowed to burst up to the requested limit.
▪ Workloads are very bursty: short-lived and resource intensive
▪ User base is global and distributed, so pods may be idle during regional off-hours
▪ Lower pod request values keep guarantees closer to actual node resource utilization
AIR9 Resource Allocation
[Figure: a K8S node's CPU and RAM shared by SAS, RStudio, and H2O containers plus free capacity; each container has a dedicated CPU/RAM share and a burstable CPU/RAM limit]
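The allocation rule above maps onto Kubernetes requests and limits: the scheduler places pods by their requests, while limits cap how far a container can burst. The sketch below works through the arithmetic under the 10% rule from the slide. The function names, the integer rounding, and the 16-core/128 GiB node in the usage note are illustrative assumptions, not AIR9's actual configuration.

```python
from dataclasses import dataclass
from typing import Iterable

GUARANTEE_PERCENT = 10  # from the slide: 10% of requested CPU and RAM is guaranteed


@dataclass(frozen=True)
class PodResources:
    cpu_request_m: int    # guaranteed CPU, in millicores
    cpu_limit_m: int      # burst ceiling for CPU, in millicores
    mem_request_mib: int  # guaranteed RAM, in MiB
    mem_limit_mib: int    # burst ceiling for RAM, in MiB


def pod_resources(cpu_cores: int, memory_gib: int) -> PodResources:
    """Turn what the user asked for into a request (guarantee) and a limit."""
    cpu_limit_m = cpu_cores * 1000
    mem_limit_mib = memory_gib * 1024
    return PodResources(
        cpu_request_m=cpu_limit_m * GUARANTEE_PERCENT // 100,
        cpu_limit_m=cpu_limit_m,
        mem_request_mib=mem_limit_mib * GUARANTEE_PERCENT // 100,
        mem_limit_mib=mem_limit_mib,
    )


def fits_on_node(pods: Iterable[PodResources],
                 node_cpu_m: int, node_mem_mib: int) -> bool:
    """Scheduler-style check: the sum of requests, not limits, must fit."""
    pods = list(pods)
    return (sum(p.cpu_request_m for p in pods) <= node_cpu_m
            and sum(p.mem_request_mib for p in pods) <= node_mem_mib)


def cpu_oversubscription(pods: Iterable[PodResources], node_cpu_m: int) -> float:
    """Ratio of total CPU limits to node capacity; above 1.0 is over-subscribed."""
    return sum(p.cpu_limit_m for p in pods) / node_cpu_m
```

For example, ten environments that each ask for 8 cores and 64 GiB reserve only 800 millicores and 6553 MiB apiece, so all ten fit on a hypothetical 16-core, 128 GiB node even though their combined CPU limits are five times the node's capacity. This works only because, as the bullets note, the environments rarely burst at the same time.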
BUSINESS VALUE
Value to the Analysts/Scientists
▪ Ability to provision environments on demand
▪ Collaboration within multiple tools
▪ Ability to download latest packages
▪ Centralized community
▪ Centralized Help
▪ Solve size limitations
Guiding Principles
Easy UI/UX
Abstract Tech Complexity
Centralize Help
Latest Versions
We are seeing techniques being tested and evaluated that had not existed previously.
▪ Distributed machine learning with H2O & Sparkling Water
▪ Neural networks predicting potential incidents within our customer web and mobile journeys
▪ LSTM models to flag consumer calls on certain topics
▪ Analyze trained models for fairness and bias
AIR9 is Driving Innovation
There is no central analytics community at Discover.
▪ Siloed teams, making collaboration difficult.
▪ Centralizing capabilities onto a single platform allows users to communicate and help one another.
▪ A shared and consistent experience helps isolated teams feel included.
AIR9 is Driving Collaboration
AIR9 has created efficiencies in model training and development and is driving tangible business results.
▪ Delivering updated tools/environments in hours, not weeks
▪ Archiving live environments for compliance and regulatory needs
▪ Faster model iteration and more diverse AI/ML technologies
AIR9 Wins
Onboarding users to the AIR9 platform is a dedicated, multi-day event that partners IT teams with our business teams.
Over 60% of analytics users onboarded to date.
▪ Platform and Tool Training
▪ Identification and tracking of any missing capabilities
▪ Documentation is programmatically updated (via GitHub/Sphinx)
User Adoption and Platform Growth
[Figure: User Adoption (% of Analytics Users) by month, January through October; vertical axis from 0 to 0.7]
APPENDIX