HOW TO SELL AN AZURE DATA LAKE PROJECT FOR YOUR ORGANIZATION’S BENEFIT
Presented by:
Victor Karamalis
TTI Corp.
WHO I AM
20 years across a broad range of sectors in information technology services
Education & Affiliations:
Master of Science in Management & Systems (NYU)
Project Management Professional (PMI.ORG)
Data Management Association International (DAMA.ORG)
Fellow, Royal Society of Arts, Manufactures & Commerce (thersa.org)
Large Scale Artificial Intelligence Projects with Multi-National Companies
System and Data Integrations in Enterprise ERP & IIoT
Innovative Proof of Concepts (PoC) with Formal Sponsor Support
Product Management with multiple global teams
Past contributor to leading Silicon Valley tech blogs
WHAT WE WILL COVER
1. DATA LAKE DESIGN: LEAN, DATA GOVERNANCE, MACHINE LEARNING, NOSQL/MS-SQL
2. What is a Data Lake? Explanation of Azure Data Lake Storage GEN 2
3. USE CASE SCENARIO
4. AZURE Data Lake IaaS VS. PaaS
   1. IaaS
   2. PaaS
5. EXAMPLE IaaS Architecture
6. DEMONSTRATION BASED ON A BASIC ACCOUNT SUBSCRIPTION
7. LESSONS LEARNED
THE ROSETTA STONE @ THE BRITISH MUSEUM
YOUR ORGANIZATION AS A ‘LEAN STARTUP’
“Somebody has a theory about what’s going to work and what the benefit will be. We don’t measure it. We don’t actually see if it did what we thought it was going to do. And we keep doing it. And then it doesn’t work, so we do something else. And then we layer on program after program that doesn’t actually meet its objectives. And if we actually brought in the mind-set that said, ‘No, actually we’re going to figure out if we actually accomplish what we set out to accomplish; and if we don’t, we’re going to change it,’ that would be huge.”
-Eric Ries, Lean Startup
DATA MANAGEMENT LESSONS
Data Governance: must support business strategy and goals. An organization’s business strategy and goals inform both the enterprise data strategy and how data governance and data management activities need to be operationalized in the organization.
Must contribute to the organization by identifying and delivering on specific benefits
Formalized via Project Charter
Enterprise Data Architecture: Enterprise Data Model (EDM)
Data Flow Design
Maintain compliance throughout data lifecycle
HIPAA
GDPR
DPA (UK)
PIPEDA (Canada)
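One technique often used to limit the exposure of personal data as it moves through the lake is pseudonymization: replacing direct identifiers with a keyed hash before the data lands. The minimal Python sketch below assumes a hypothetical record layout (`patient_id`, `visit_type`, `visit_year`) and a placeholder key `PSEUDONYM_KEY`; it illustrates the idea only and does not by itself make a dataset compliant with any of the regulations listed above.

```python
import hashlib
import hmac

# Hypothetical placeholder key: a real deployment would fetch it from a secret
# store (for example Azure Key Vault) and never hard-code it.
PSEUDONYM_KEY = b"replace-with-a-secret-key"


def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace a direct identifier with a keyed SHA-256 hash (HMAC).

    The same input and key always give the same token, so records can still be
    joined; without the key, tokens cannot be recomputed from guessed identifiers.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical record arriving from a source system
record = {"patient_id": "MRN-12345", "visit_type": "outpatient", "visit_year": 2019}
safe_record = {**record, "patient_id": pseudonymize(record["patient_id"])}
print(safe_record)
```

A keyed hash is used instead of a plain SHA-256 digest because identifiers such as record numbers are easy to guess and hash.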
MACHINE LEARNING IN A NUTSHELL
Requires data scientists to teach the system how to learn
Suited to problems where good performance is difficult or infeasible with traditional programming techniques
The complete logic or formula to implement a solution is not known or does not currently exist
Significant volumes of data to compute on
Business Questions Answered
Which products are likely to be bought together? → Collaborative Filtering
How much? What will be the number of ...? → Regression
Who are my best customers? → Clustering
What will be the price of a stock in a month? → Gradient Boosted Tree
Is fraud occurring? → Decision Tree
Is that image a known intruder? → Support Vector Machine (a supervised learning method)
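To make the first row concrete, here is a minimal item-based collaborative filtering sketch in Python: each product is represented by the set of customers who bought it, and the products whose buyer sets are most similar (by cosine similarity) are suggested as likely to be bought together. The purchase history and the function names are hypothetical.

```python
from math import sqrt

# Hypothetical purchase history: customer -> products bought
purchases = {
    "alice": {"bread", "butter", "jam"},
    "bob": {"bread", "butter"},
    "carol": {"bread", "jam"},
    "dave": {"coffee", "milk"},
    "erin": {"bread", "coffee", "milk"},
}


def item_vectors(history):
    """Invert the history: product -> set of customers who bought it."""
    vectors = {}
    for customer, items in history.items():
        for item in items:
            vectors.setdefault(item, set()).add(customer)
    return vectors


def cosine(a, b):
    """Cosine similarity of two binary vectors stored as sets of customers."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def bought_together(product, history, top_n=2):
    """Rank the other products by how similar their buyer sets are to `product`'s."""
    vectors = item_vectors(history)
    target = vectors[product]
    scores = [
        (other, cosine(target, buyers))
        for other, buyers in vectors.items()
        if other != product
    ]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))  # best score first, ties by name
    return scores[:top_n]


print(bought_together("butter", purchases))  # bread (about 0.71), then jam (0.5)
```

At lake scale the same idea is usually run with a distributed engine such as Spark rather than in-memory dictionaries.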
AI VS. ML VS. DL: EXAMPLE OF RECOGNIZING A PICTURE
Artificial Intelligence
Requires one or more programmers to write all the code needed for a computer to recognize a picture of an object (e.g. a cat).
Machine Learning
Requires data scientists to teach the system how to learn what a cat looks like by feeding images and correcting its analysis until the system becomes accurate.
DEEP LEARNING
Divides the task of recognizing an object into layers: the 1st layer of the algorithm learns to recognize one cat body part, the 2nd layer learns another cat body part, and the final layer connects the previous layers.
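The machine learning description above (feed examples, correct the system's mistakes, repeat until it is accurate) is supervised learning in its simplest form. The sketch below shows the classic perceptron learning rule on toy data. Real image recognition would learn from pixels; here each "image" is assumed to be reduced to three hypothetical hand-made features.

```python
# Toy "images" reduced to hypothetical features:
# [has_whiskers, has_pointy_ears, barks] -> 1 = cat, 0 = not a cat
examples = [
    ([1, 1, 0], 1),
    ([1, 0, 0], 1),
    ([0, 1, 1], 0),
    ([0, 0, 1], 0),
]


def predict(weights, bias, features):
    """Weighted sum of the features followed by a threshold."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0


def train(data, epochs=20, lr=1.0):
    """Perceptron rule: nudge the weights only when a prediction is wrong."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in data:
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + lr * error * x for w, x in zip(weights, features)]
                bias += lr * error
        if mistakes == 0:  # every example classified correctly: stop correcting
            break
    return weights, bias


weights, bias = train(examples)
print([predict(weights, bias, f) for f, _ in examples])  # [1, 1, 0, 0]
```

A deep learning model stacks many such units in layers, so the early layers can learn features like "whiskers" from raw pixels instead of having them supplied by hand.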
MACHINE LEARNING ALGORITHMS
NOSQL/MS-SQL MIGRATION OPTIONS
NO-SQL
SPARK
COUCHDB
HADOOP
COSMOS DB
RDBMS/SQL
AZURE SQL
MS-SQL SERVER
ORACLE
SAP
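Moving from an RDBMS to a document store such as Cosmos DB usually means reshaping normalized rows into self-contained documents. The minimal Python sketch below embeds each customer's orders inside one JSON document; the table layout, the sample names and the `orderId`/`total` fields are hypothetical. The only service-specific assumption is that each Cosmos DB item carries a string `id` property.

```python
import json

# Hypothetical rows exported from two related SQL tables
customers = [(1, "Contoso"), (2, "Fabrikam")]               # (customer_id, name)
orders = [(100, 1, 25.0), (101, 1, 40.0), (102, 2, 15.5)]   # (order_id, customer_id, total)


def to_documents(customers, orders):
    """Denormalize: embed each customer's orders inside a single document."""
    docs = {
        cid: {"id": str(cid), "name": name, "orders": []}
        for cid, name in customers
    }
    for order_id, cid, total in orders:
        docs[cid]["orders"].append({"orderId": order_id, "total": total})
    return list(docs.values())


for doc in to_documents(customers, orders):
    print(json.dumps(doc))
```

Embedding works well when orders are always read together with their customer; data that grows without bound or is shared across entities is often kept in separate documents instead.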
WHAT IS A DATA LAKE?
A data lake is an organic store of data, kept without regard for the data’s perceived value or structure, unlike a data warehouse. It can hold data that is:
Unstructured
Semi-structured
Structured
A Data Warehouse is a highly structured store of data.
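The practical difference is often described as schema-on-write versus schema-on-read: a warehouse validates structure when data is loaded, while a lake stores records as they arrive and applies a schema only when someone reads them. The minimal Python sketch below illustrates schema-on-read with a hypothetical sensor feed and a hypothetical `reader_schema`.

```python
import json

# Raw zone: records stored exactly as they arrived (hypothetical sensor feed)
raw_zone = [
    '{"device": "pump-1", "temp_c": 71.5}',
    '{"device": "pump-2", "temp_c": "69.0", "vibration": 0.02}',
    '{"device": "pump-3"}',
]

# Schema chosen by this particular reader; nothing enforced it at write time.
reader_schema = {"device": str, "temp_c": float}


def read_with_schema(lines, schema):
    """Schema-on-read: parse each raw record, then project and cast it only now."""
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({
            name: cast(record[name]) if record.get(name) is not None else None
            for name, cast in schema.items()
        })
    return rows


for row in read_with_schema(raw_zone, reader_schema):
    print(row)
```

The extra `vibration` field and the missing `temp_c` value are tolerated rather than rejected at load time; a warehouse load would typically have to decide how to handle them before the data could be stored.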
Data lake market segments by type:
Data Discovery (Insight)
Data Integration and Management
Data Lake Analytics
Data Visualization
WHAT MAKES A DATA LAKE SO GREAT?
Massive Scale: Petabyte scale, data accessible everywhere, growth on demand
Granular, Multi-layered Security: Granular security & protection against accidental data loss
Optimized for Maximum Performance: Extremely fast job execution
Integration Friendly: Supports multiple methods of data ingress, processing, egress, and visualization
Cost Effectiveness: Cloud economic model with the ability to intelligently manage costs
RICH DATA MANAGEMENT & GOVERNANCE (Standards Compliant & Available Everywhere)
A “NO COMPROMISES” DATA LAKE
A Secure, performant, massively scalable Data Lake Storage that brings the cost & scale of object storage together with the performance and analytics feature set of data lake storage
Secure
Manageable
Fast
Scalable
Cost Effective
Integration Ready
AZURE DATA LAKE STORAGE GEN 2: ADLS GEN 2
SECURE: Support for fine-grained access control lists (ACLs), protecting data at file and folder level; multi-layered protection via at-rest storage service encryption and Azure Active Directory integration
MANAGEABLE: Automated lifecycle policy management; object-level tiering
FAST: Atomic file operations mean jobs complete faster
SCALABLE: No limits on data store size; global footprint (54 regions), including government clouds
COST EFFECTIVE: Object store pricing levels; file system operations minimize the transactions required for job completion
INTEGRATION READY: Optimized for Spark & Hadoop analytic engines; tightly integrated with Azure end-to-end analytics solutions
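Automated lifecycle policy management and object-level tiering are configured through a storage account's lifecycle management policy, which is a JSON document. The Python sketch below builds one such policy as a dictionary and prints it; the rule name, the container `datalake`, the `raw/` prefix and the 30/90/365-day thresholds are hypothetical and only illustrative.

```python
import json

# Hypothetical rule: move raw-zone files to cooler (cheaper) tiers as they age,
# then delete them. Container name, prefix and day counts are placeholders.
policy = {
    "rules": [
        {
            "enabled": True,
            "name": "tier-raw-zone",
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["datalake/raw/"],  # "<container>/<path prefix>"
                },
                "actions": {
                    "baseBlob": {
                        "tierToCool": {"daysAfterModificationGreaterThan": 30},
                        "tierToArchive": {"daysAfterModificationGreaterThan": 90},
                        "delete": {"daysAfterModificationGreaterThan": 365},
                    }
                },
            },
        }
    ]
}

print(json.dumps(policy, indent=2))
```

The printed JSON could then be applied through the Azure portal or, for example, with the Azure CLI command `az storage account management-policy create` and its `--policy` argument.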
GEN 1 STORAGE DIFFERENCES
Blob Storage
Large Partner Ecosystem
Global Scale: All 57 Regions
Durability Options
Tiered – Hot/Cool/Archive
Cost Efficient
Data Lake Store
Built for Hadoop
Hierarchical Namespace
ACL, AAD, & RBAC
Performance Tuned for Big Data
Very High Scale Capacity & Throughput
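One reason the hierarchical namespace, carried over into Gen 2, makes jobs faster and cheaper is directory renames, which engines such as Spark and Hadoop commonly use to commit job output. In a flat object store a "directory" is only a name prefix, so renaming it touches every object beneath it; with a hierarchical namespace it is a single atomic metadata operation. The Python sketch below is a deliberately simplified model of that difference, not of how either service is implemented; the paths are hypothetical.

```python
# Hypothetical lake contents; only the names matter for this comparison.
paths = [
    "raw/2020/01/a.csv",
    "raw/2020/01/b.csv",
    "raw/2020/02/c.csv",
    "curated/sales.parquet",
]


def rename_flat(store, old_prefix, new_prefix):
    """Flat namespace: each object under the prefix is copied to its new name
    and the original is deleted."""
    operations = 0
    for name in [n for n in store if n.startswith(old_prefix)]:
        store[new_prefix + name[len(old_prefix):]] = store.pop(name)
        operations += 2  # one copy + one delete per object
    return operations


def build_tree(names):
    """Hierarchical namespace: directories are real nodes that hold their children."""
    root = {}
    for name in names:
        node = root
        for part in name.split("/"):
            node = node.setdefault(part, {})
    return root


def rename_hierarchical(parent, old_name, new_name):
    """Renaming a directory re-links one entry, however many files sit below it."""
    parent[new_name] = parent.pop(old_name)
    return 1  # a single metadata operation


flat_store = {name: b"" for name in paths}
tree = build_tree(paths)
print("flat:", rename_flat(flat_store, "raw/", "archive/"))          # flat: 6
print("hierarchical:", rename_hierarchical(tree, "raw", "archive"))  # hierarchical: 1
```

With three files the gap is small; with millions of files per output folder, the flat approach turns one logical step into millions of transactions.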
DATA LAKE DESIGN
Cloud/On-premises, Hybrid Cloud, Multi-Cloud (AZURE)
Storage (AZURE SQL DATA BLOB Storage)
Processing (AZURE DATA LAKE)
Data Management (AZURE DATA STORE)
Advanced Analytics, Enterprise Reporting, Apps (Power BI)
USE CASE SCENARIO
BUSINESS CONSIDERATIONS
SPONSOR/MANAGEMENT SUPPORT
AUGMENT DEFINED BUSINESS INSIGHTS
TIME TO MARKET FOR KEY INSIGHTS (aka AGILITY)
BUDGET CONSIDERATIONS
TECHNICAL SKILLS CONSIDERATIONS
MINIMAL DEPENDENCE ON IT FOR DRASTIC CHANGES
RIGIDITY OF SINGLE DATA MODEL
ABILITY TO HANDLE STREAMING DATA
SCALABILITY
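Handling streaming data usually means aggregating events over time windows rather than over a finished table. Azure Stream Analytics, for instance, provides tumbling windows: fixed-size, non-overlapping intervals. The minimal Python sketch below computes a per-window average for hypothetical `(timestamp_seconds, value)` sensor events; it works over a list for simplicity, whereas a streaming engine would process events incrementally as they arrive.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds):
    """Assign each (timestamp, value) event to a fixed, non-overlapping window
    and return the average value per window start time."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in events:
        window_start = ts - (ts % window_seconds)
        sums[window_start] += value
        counts[window_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Hypothetical temperature readings: (seconds since start, degrees C)
events = [(0, 70.0), (12, 72.0), (31, 69.0), (59, 71.0), (61, 80.0)]
print(tumbling_window_avg(events, 30))  # {0: 71.0, 30: 70.0, 60: 80.0}
```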
MINIMUM TEAM FOR A BUSINESS ADLS PROJECT
Project Manager
Solution Architect
Data Engineer/Lead
Data Scientist
IAAS ADLS VS. PAAS: ADLS (GEN 2)
INGESTING DATA FROM VARIOUS SOURCES
MIGRATE FROM EXISTING ON-PREMISES DATA WAREHOUSE
MOBILE DATA
ERP DATA WAREHOUSE
APP DATA
SENSOR DATA
MASTER DATA
PROGRAMMATIC
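When several sources feed the lake, a consistent landing convention keeps raw data discoverable. The Python sketch below assumes a hypothetical `raw/<source>/<yyyy>/<mm>/<dd>/<file>` layout and a hypothetical set of source names loosely mirroring the list above.

```python
from datetime import datetime, timezone

# Hypothetical source names, loosely mirroring the ingestion sources listed above
SOURCES = {"warehouse", "mobile", "erp", "app", "sensor", "master"}


def landing_path(source: str, filename: str, ingested_at: datetime) -> str:
    """Return the raw-zone path for a file, partitioned by source and ingestion date."""
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    return f"raw/{source}/{ingested_at:%Y/%m/%d}/{filename}"


print(landing_path("sensor", "pump-1.json", datetime(2020, 3, 14, tzinfo=timezone.utc)))
# raw/sensor/2020/03/14/pump-1.json
```

Date-partitioned folders like these let later jobs read only the days they need, and rejecting unknown source names keeps stray folders out of the raw zone.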
MACHINE LEARNING SERVICES WITH LITTLE OR NO CODE
Run & Monitor Experiments
Register Models
Build Docker Images
Deploy Models
Create Pipeline
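The service performs these steps for you, but the idea behind registering models is simple: every registration under the same model name becomes a new, numbered version stored with its metrics, so a deployment can pin a specific version or pick the best one. The in-memory `ModelRegistry` class below is a hypothetical illustration of that idea, not the Azure Machine Learning SDK; the model name, paths and metric values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry: each model name keeps an ordered list of versions."""
    versions: dict = field(default_factory=dict)

    def register(self, name, artifact_path, metrics):
        """Store a new version of `name` and return its version number (1, 2, ...)."""
        history = self.versions.setdefault(name, [])
        entry = {"version": len(history) + 1, "path": artifact_path, "metrics": metrics}
        history.append(entry)
        return entry["version"]

    def latest(self, name):
        return self.versions[name][-1]

    def best(self, name, metric):
        """Highest-scoring version for a metric where bigger is better."""
        return max(self.versions[name], key=lambda entry: entry["metrics"][metric])


registry = ModelRegistry()
registry.register("fraud-tree", "outputs/model_v1.pkl", {"auc": 0.86})
registry.register("fraud-tree", "outputs/model_v2.pkl", {"auc": 0.81})
print(registry.latest("fraud-tree")["version"])       # 2
print(registry.best("fraud-tree", "auc")["version"])  # 1
```

Keeping every version, rather than overwriting, is what makes it possible to roll a deployment back when a newer model performs worse.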
DEMONSTRATION ON AZURE FOR ADLS GEN 2
LESSONS LEARNED
The Soft Skills
Get Buy-In from Technical Staff (IT)
Ensure security policies are understood and only approved VMs are used
Ensure Business/Technical Stakeholders are informed regularly.
The Technical Matters
ADDRESS DATA GOVERNANCE, INTEGRITY & SECURITY
ACCESS ONLY THE DATA YOU NEED: REGULATIONS MAY ADD COST (TRANSPORT & STORAGE)
ADD ALERTS & MONITORING FOR ANY & ALL VMS + SERVICES
AFTER FINISHING, SHUT DOWN RESOURCES
KEEP V-NETS, RESOURCE GROUPS, AUTHORIZED AD USERS & POLICIES UPDATED
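Accessing only the data you need and limiting access to authorized AD users can be enforced with the POSIX-style access control lists that ADLS Gen 2 supports. The Python sketch below only builds an ACL string; the object ID is a placeholder and the permission choices are hypothetical. A string in this format can be passed, for example, to the `acl` parameter of `DataLakeDirectoryClient.set_access_control` in the `azure-storage-file-datalake` package, or to `az storage fs access set --acl`.

```python
def acl_entry(scope, permissions, object_id=""):
    """Build one POSIX-style ACL entry such as 'user:<object-id>:r-x'.

    scope is 'user', 'group', 'mask' or 'other'; permissions is any subset of 'rwx'.
    An empty object_id addresses the owning user or group rather than a named principal.
    """
    bits = "".join(flag if flag in permissions else "-" for flag in "rwx")
    return f"{scope}:{object_id}:{bits}"


def least_privilege_acl(reader_object_id):
    """Owner keeps full control, one authorized principal may read/list, others get nothing."""
    return ",".join([
        acl_entry("user", "rwx"),
        acl_entry("group", "rx"),
        acl_entry("mask", "rx"),  # caps what named users and the owning group can get
        acl_entry("other", ""),
        acl_entry("user", "rx", reader_object_id),
    ])


# Placeholder (hypothetical) object ID; substitute the AAD object ID of the user or group.
print(least_privilege_acl("00000000-0000-0000-0000-000000000000"))
# user::rwx,group::r-x,mask::r-x,other::---,user:00000000-0000-0000-0000-000000000000:r-x
```

To reach a file, a principal also needs execute (`--x`) permission on every parent directory, so in practice the same object ID is usually granted `--x` on the folders above the data it may read.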
IMPORTANT URLS
Azure updates: https://azure.microsoft.com/en-us/updates/
Azure Blogs: https://azure.microsoft.com/en-us/blog/
Azure Data Lake Storage Gen2:
https://azure.microsoft.com/en-us/blog/under-the-hood-performance-scale-security-for-cloud-analytics-with-adls-gen2/
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-outputs#blob-storage-and-azure-data-lake-gen2
SPARK to SQL SERVER: https://docs.microsoft.com/en-us/sql/big-data-cluster/spark-mssql-connector?view=sql-server-ver15
AZURE V-NET: https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoints-overview
AZ 300 Ref Exam: https://www.microsoftpressstore.com/store/exam-ref-az-300-microsoft-azure-architect-technologies-9780135802540
THANK YOU FOR COMING!
Contact information:
P: 954-707-7545