clean your data swamp by migration off hadoop
TRANSCRIPT
![Page 1: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/1.jpg)
Clean Your Data Swamp By Migration off Hadoop
![Page 2: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/2.jpg)
Speaker
Ron Guerrero, Senior Solutions Architect
![Page 3: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/3.jpg)
Agenda
● Why modernize?
● Planning your migration off of Hadoop
● Top migration topics
![Page 4: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/4.jpg)
Why migrate off of Hadoop and onto Databricks?
![Page 5: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/5.jpg)
History of Hadoop
● Created 2005
● Open Source distributed processing and storage platform running on commodity hardware
● Originally consisted of HDFS and MapReduce, but now incorporates numerous open source projects (Hive, HBase, Spark)
● On-prem and on the cloud
![Page 6: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/6.jpg)
Today Hadoop is very hard
COMPLEX (Slow Innovation)
● Many tools: Need to understand multiple technologies.
● Real-time and batch ingestion to build AI models requires integrating many components.
FIXED (Cost Prohibitive)
● 24/7 clusters.
● Fixed capacity: CPU + RAM + Disk.
● Costly to upgrade.
MAINTENANCE INTENSIVE (Low Productivity)
● The Hadoop ecosystem is complex, hard to manage, and prone to failures.
![Page 7: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/7.jpg)
Enterprises Need a Modern Data Analytics Architecture
CRITICAL REQUIREMENTS
Cost-effective scale and performance in the cloud
Easy to manage and highly reliable for diverse data
Predictive and real-time insights to drive innovation
![Page 8: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/8.jpg)
Structured Semi-structured Unstructured Streaming
Lakehouse Platform
Data Engineering BI & SQL Analytics
Real-time Data Applications
Data Science & Machine Learning
Data Management & Governance
Open Data Lake
SIMPLE OPEN COLLABORATIVE
![Page 9: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/9.jpg)
Planning your migration off of Hadoop and onto Databricks
![Page 10: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/10.jpg)
Migration Planning
● Internal Questions
● Assessment
● Technical Planning
● Enablement and Evaluation
● Migration Execution
![Page 11: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/11.jpg)
Migration Planning
Internal Questions
● Why?
● Who?
● Desired start and end dates
● Internal stakeholders
● Cloud strategy
![Page 12: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/12.jpg)
Migration Planning
Assessment
● Environment inventory
  ○ Compute, data, tooling
● Use case prioritization
● Workload analysis
● Existing TCO
● Projected TCO
● Migration timelines
![Page 13: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/13.jpg)
Migration Planning
Technical Planning
● Target state architecture
● Data migration
● Workload migration
  ○ Lift and shift, transformative, hybrid
● Data governance approach
● Automated deployment
● Monitoring and Operations
![Page 14: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/14.jpg)
Migration Planning
Enablement and Evaluation
● Workshops, technical deep dives
● Training
● Proof of technology / MVP
  ○ Validate assumptions and designs
![Page 15: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/15.jpg)
Migration Planning
Migration Execution
● Environment Deployment
● Iterate over use cases
  ○ Data Migration
  ○ Workload Migration
  ○ Dual Production Deployment - Old and New
  ○ Validation
  ○ Cut-over and Decommission of Hadoop
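The validation step during dual production can be sketched as an order-independent comparison of what the old and new pipelines produce. This is a minimal sketch under assumptions: the function name `table_fingerprint` and the sample rows are hypothetical, and in practice the rows would come from the Hadoop job output and the Databricks job output rather than from in-memory lists.

```python
import hashlib
from typing import Iterable, Tuple


def table_fingerprint(rows: Iterable[tuple]) -> Tuple[int, str]:
    """Return (row_count, digest) for a set of rows, independent of row order."""
    count = 0
    combined = 0
    for row in rows:
        # Hash each row on its own, then add the hashes modulo 2**256 so the
        # result does not depend on the order in which rows arrive.
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, f"{combined:064x}"


# Hypothetical outputs of the same job on the old (Hadoop) and new (Databricks) platform.
hadoop_rows = [(1, "alice", 10.5), (2, "bob", 7.0)]
databricks_rows = [(2, "bob", 7.0), (1, "alice", 10.5)]

outputs_match = table_fingerprint(hadoop_rows) == table_fingerprint(databricks_rows)
print("outputs match:", outputs_match)
```

Adding the per-row hashes (rather than XOR-ing them) keeps duplicate rows visible in the fingerprint, which matters when a migrated job accidentally writes a batch twice.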
![Page 16: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/16.jpg)
Top Migration Topics
![Page 17: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/17.jpg)
Key Areas of Migration
1. Administration
2. Data Migration
3. Data Processing
4. Security & Governance
5. SQL and BI Layer
![Page 18: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/18.jpg)
Administration
![Page 19: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/19.jpg)
Hadoop Ecosystem to Databricks Concepts
Hadoop
[Diagram: Node 1, Node 2 ... Node N. Each node shows HDFS on local disks (disk 1 ... disk N) and 2x12c = 24c compute carved among YARN (MR mappers, Spark Workers (Executors), Spark Driver), Impala, and HBase. Shared services shown: Hive Metastore, Hive Server, Impala (Load Balancer), HBase API, JDBC/ODBC, and Sentry (table metadata + HDFS ACLs) OR Ranger (policy-based access control).]
Node makeup
▪ Local disks
▪ Cores/Memory carved to services
▪ Submitted jobs compete for resources
▪ Services constrained to accommodate resources
Metadata and Security
▪ Sentry table metadata permissions combined with syncing HDFS ACLs, OR
▪ Apache Ranger, policy-based access control
Endpoints
▪ Direct access to HDFS / copied dataset
▪ Hive (on MR or Spark) accepts incoming connections
▪ Impala for interactive queries
▪ HBase APIs as required
![Page 20: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/20.jpg)
Hadoop Ecosystem to Databricks Concepts
[Diagram: the Hadoop layout from the previous slide (HDFS, YARN, Impala, HBase on Node 1 ... Node N; Hive Metastore, Hive Server, Impala (Load Balancer), HBase API, JDBC/ODBC; Sentry/Ranger table metadata + HDFS ACLs) shown next to its Databricks counterparts: Hive Metastore (managed); SQL Endpoint / High Concurrency Cluster for SQL Analytics over JDBC/ODBC; Databricks Clusters (one Spark Driver plus Spark Workers (Executors), each with Delta Engine) for Spark ETL (Batch/Streaming), SQL Analytics, and ML Runtime; Ephemeral Clusters for All-purpose or Jobs; Table ACLs; Object Storage with Object Storage ACLs; CosmosDB/DynamoDB/Keyspaces.]
![Page 21: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/21.jpg)
Hadoop Ecosystem to Databricks Concepts
Databricks
[Diagram: Hive Metastore (managed); SQL Endpoint / High Concurrency Cluster for SQL Analytics over JDBC/ODBC; Databricks Clusters (one Spark Driver plus Spark Workers (Executors), each with Delta Engine) for Spark ETL (Batch/Streaming), SQL Analytics, and ML Runtime; Ephemeral Clusters or long running for All-purpose or Jobs; Table ACLs; Object Storage with Object Storage ACLs; CosmosDB/DynamoDB/Keyspaces.]
Node makeup
▪ Each node (VM) maps to a single Spark Driver/Worker
▪ Cluster of nodes completely isolated from other jobs/compute
▪ De-coupled compute and storage
Metadata and Security
▪ Managed Hive metastore (other options available)
▪ Table ACLs (Databricks) and Object Storage permissions
Endpoints
▪ SQL endpoint for both advanced analytics and simple SQL analytics
▪ Code access to data - Notebooks
▪ HBase → maps to Azure CosmosDB, AWS DynamoDB/Keyspaces (non-Databricks solution)
![Page 22: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/22.jpg)
Demo - Administration
![Page 23: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/23.jpg)
Data Migration
![Page 24: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/24.jpg)
Data Migration
- On-premise block storage.
- Fixed disk capacity.
- Health checks to validate data integrity.
- As data volumes grow, must add more nodes to cluster and rebalance data.
MIGRATE
- Fully managed cloud object storage.
- Unlimited capacity.
- No maintenance, no health checks, no rebalancing.
- 99.99% availability, 99.9999999% durability.
- Use native cloud services to migrate data.
- Leverage partner solutions:
![Page 25: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/25.jpg)
Data Migration
Build a Data Lake in cloud storage with Delta Lake
● Open source and uses Parquet file format.
● Performance: Data indexing → Faster queries.
● Reliability: ACID Transactions → Guaranteed data integrity.
● Scalability: Handle petabyte-scale tables with billions of partitions and files with ease.
● Enhanced Spark SQL: UPDATE, MERGE, and DELETE commands.
● Unify Batch and Stream processing → No more Lambda architecture.
● Schema Enforcement: Specify schema on write.
● Schema Evolution: Automatically change schemas on the fly.
● Audit History: Full audit trail of the changes.
● Time Travel: Restore data from past versions.
● 100% Compatible with Apache Spark API.
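Three of the Delta Lake features listed above (MERGE, Time Travel, Audit History) map to SQL statements. The sketch below only builds the statement text so it can be read without a cluster; on a Spark session with Delta Lake each string would be passed to `spark.sql(...)`. The helper names and the table names `sales.orders` and `staging.orders_updates` are hypothetical.

```python
def merge_sql(target: str, source: str, key: str) -> str:
    """Build a Delta Lake MERGE (upsert) statement keyed on one column."""
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source} AS s "
        f"ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def time_travel_sql(table: str, version: int) -> str:
    """Build a query that reads an earlier version of a Delta table."""
    return f"SELECT * FROM {table} VERSION AS OF {version}"


def history_sql(table: str) -> str:
    """Build the statement that lists the change history of a Delta table."""
    return f"DESCRIBE HISTORY {table}"


# Hypothetical table names, for illustration only.
for statement in (
    merge_sql("sales.orders", "staging.orders_updates", "order_id"),
    time_travel_sql("sales.orders", 3),
    history_sql("sales.orders"),
):
    print(statement)
```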
![Page 26: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/26.jpg)
Start with Dual ingestion
● Add a feed to cloud storage
● Enable new use cases with new data
● Introduces options for backup
![Page 27: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/27.jpg)
How to migrate data
● Leverage existing Data Delivery tools to point to cloud storage
● Introduce simplified flows to land data into cloud storage
![Page 28: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/28.jpg)
How to migrate data
● Push the data
  ○ DistCp
  ○ 3rd Party Tooling
  ○ In-house frameworks
  ○ Cloud Native - Snowmobile, Azure Data Box, Google Transfer Appliance
  ○ Typically easier to approve (security)
● Pull the data
  ○ Spark Streaming
  ○ Spark Batch
    ■ File Ingest
    ■ JDBC
  ○ 3rd Party Tooling
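For the push approach, a DistCp run from the Hadoop cluster to cloud object storage could be assembled as below. This is a sketch under assumptions: the NameNode address, the bucket name, and the helper `distcp_command` are hypothetical, and the `s3a://` target presumes the cluster already has the S3A connector and credentials configured.

```python
import shlex
from typing import List


def distcp_command(source: str, target: str, max_maps: int = 20) -> List[str]:
    """Build the argument list for an incremental DistCp copy."""
    return [
        "hadoop", "distcp",
        "-update",            # copy only files that are missing or differ at the target
        "-m", str(max_maps),  # cap the number of simultaneous map tasks
        source,
        target,
    ]


# Hypothetical source and target locations.
cmd = distcp_command(
    "hdfs://namenode:8020/warehouse/sales",
    "s3a://example-bucket/warehouse/sales",
)
print(shlex.join(cmd))
```

Returning a list rather than a single string lets the same arguments be handed to `subprocess.run` on an edge node without shell quoting issues.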
![Page 29: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/29.jpg)
How to migrate data - Pull approach
● Set up connectivity to On Premises
  ○ AWS Direct Connect
  ○ Azure ExpressRoute / VPN Gateway
  ○ This may be needed for some use cases
● Kerberized Hadoop Environments
  ○ Databricks cluster initialization scripts
    ■ Kerberos client setup
    ■ krb5.conf, keytab
    ■ kinit()
● Shared External Metastore
  ○ Databricks and Hadoop can share a metastore
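Once connectivity is in place, a Spark batch pull over JDBC might look like the sketch below. It assumes an existing SparkSession (such as the `spark` object in a Databricks notebook), a JDBC driver for the source installed on the cluster, and a Delta target path; the host name, table, column, and credential values are placeholders, and in practice the password would come from a secret store rather than a literal.

```python
from typing import Dict


def jdbc_read_options(url: str, table: str, user: str, password: str,
                      partition_column: str, lower_bound: int,
                      upper_bound: int, num_partitions: int) -> Dict[str, str]:
    """Collect Spark JDBC data source options for a parallel table read."""
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        # These four options make Spark split the read into parallel range queries.
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    }


def pull_table(spark, options: Dict[str, str], target_path: str) -> None:
    """Read one table over JDBC and append it to a Delta table in cloud storage."""
    df = spark.read.format("jdbc").options(**options).load()
    df.write.format("delta").mode("append").save(target_path)


# Placeholder connection details, for illustration only.
options = jdbc_read_options(
    url="jdbc:postgresql://onprem-db.example.com:5432/sales",
    table="public.orders",
    user="migration_user",
    password="<from-secret-store>",
    partition_column="order_id",
    lower_bound=1,
    upper_bound=1_000_000,
    num_partitions=8,
)
# On a cluster: pull_table(spark, options, "s3a://example-bucket/lake/orders")
```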
![Page 30: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/30.jpg)
Demo - Databricks Pull
![Page 31: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/31.jpg)
Data Processing
![Page 32: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/32.jpg)
Technology Mapping
![Page 33: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/33.jpg)
Migrating Spark Jobs
● Spark versions
● RDD to Dataframes
● Changes to submission
● Hard-coded references to the Hadoop environment
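Hard-coded `hdfs://` paths are one of the easier references to find and rewrite mechanically. The following is a minimal sketch, assuming a one-to-one mapping from HDFS paths to a single cloud storage root; the function names, NameNode address, and bucket are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Matches an hdfs:// URI up to the next whitespace or quote character.
HDFS_URI = re.compile(r"hdfs://[^\s'\"]+")


def to_cloud_uri(hdfs_uri: str, cloud_root: str) -> str:
    """Map hdfs://<namenode>/<path> to <cloud_root>/<path>."""
    path = urlsplit(hdfs_uri).path
    return cloud_root.rstrip("/") + path


def rewrite_source(code: str, cloud_root: str) -> str:
    """Replace every hard-coded hdfs:// URI in a piece of job source code."""
    return HDFS_URI.sub(lambda m: to_cloud_uri(m.group(0), cloud_root), code)


job_line = 'df = spark.read.parquet("hdfs://namenode:8020/data/events/2021")'
print(rewrite_source(job_line, "s3a://example-bucket/lake"))
```

A scan like this only catches literal URIs; paths assembled from configuration values or relative paths resolved against `fs.defaultFS` still need a manual review.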
![Page 34: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/34.jpg)
Converting non-Spark workloads
● MapReduce
● Sqoop
● Flume
● NiFi Considerations
![Page 35: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/35.jpg)
Migrating HiveQL
● Hive queries have high compatibility
● Minor changes in DDL
● SerDes and UDFs
![Page 36: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/36.jpg)
Migration Workflow Orchestration
● Create equivalent workflows in Airflow, Azure Data Factory, or another orchestrator
● Databricks REST APIs allow integration with any scheduler
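As an illustration of scheduler integration, the sketch below builds the HTTP request an external scheduler task could send to trigger an existing Databricks job through the Jobs REST API `run-now` endpoint. The workspace URL, token placeholder, and job ID are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def run_now_request(workspace_url: str, token: str, job_id: int) -> urllib.request.Request:
    """Build (but do not send) a request that triggers an existing Databricks job."""
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.1/jobs/run-now",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical workspace, token, and job ID.
request = run_now_request("https://example.cloud.databricks.com", "<personal-access-token>", 42)
print(request.get_method(), request.full_url)
# A scheduler task would send it with urllib.request.urlopen(request).
```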
![Page 37: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/37.jpg)
Automated Tooling
● MLens
  ○ PySpark
  ○ HiveQL
  ○ Oozie to Airflow, Azure Data Factory (roadmap)
![Page 38: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/38.jpg)
Security and Governance
![Page 39: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/39.jpg)
Security and Governance
Authentication
- Single Sign On (SSO) with SAML 2.0 supported corporate directory.
Authorization
- Access Control Lists (ACLs) for Databricks RBAC.
- Table ACLs
- Dynamic Views for Column/Row permissions
- Leverage cloud native security: IAM Federation and AAD passthrough.
- Integration with Ranger and Immuta for more advanced RBAC and ABAC.
Metadata Management
- Integration with 3rd party services.
AWS Glue
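A dynamic view for column-level permissions can be sketched as a view that masks a column unless the querying user belongs to a privileged group, using the Databricks SQL function `is_member`. The helper below only builds the statement text; the view, table, column, and group names are hypothetical.

```python
from typing import List


def masked_view_sql(view: str, table: str, masked_column: str,
                    other_columns: List[str], privileged_group: str) -> str:
    """Build a dynamic view that masks one column for users outside a group."""
    select_list = ", ".join(other_columns + [
        f"CASE WHEN is_member('{privileged_group}') THEN {masked_column} "
        f"ELSE 'REDACTED' END AS {masked_column}"
    ])
    return f"CREATE OR REPLACE VIEW {view} AS SELECT {select_list} FROM {table}"


# Hypothetical object and group names.
view_sql = masked_view_sql(
    view="sales.customers_v",
    table="sales.customers",
    masked_column="email",
    other_columns=["customer_id", "region"],
    privileged_group="pii_readers",
)
print(view_sql)
```

Users would then be granted SELECT on the view rather than on the underlying table, so the masking logic applies to every query path.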
![Page 40: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/40.jpg)
Privacera
![Page 41: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/41.jpg)
Migrating Security Policies from Hadoop to Databricks
Enabling enterprises to responsibly use their data in the cloud
Powered by Apache Ranger
![Page 42: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/42.jpg)
HADOOP ECOSYSTEM
● 100s and 1000s of tables in Apache Hive
● 100s of policies in Apache Ranger
● Variety of policies: Resource Based, Tag Based, Masking, Row Level Filters, etc.
● Policies for Users and Groups from AD/LDAP
![Page 43: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/43.jpg)
PRIVACERA AND DATABRICKS
Hive MetaStore MetaStore
Dataset
Schema
Policies
![Page 44: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/44.jpg)
SEAMLESS MIGRATION
INSTANTLY TRANSFER
YEARS OF EFFORT
INSTANTLY IMPLEMENT THE SAME
POLICIES IN DATABRICKS AS ON-PREM
![Page 45: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/45.jpg)
Privacera Value Add - Enhancing Databricks Authorization
● Richer, deeper, and more robust Access Control
● Row/Column level access control in SQL
● Dynamic and Static data de-identification
● File level access control for Dataframes, object level access
● Read/Write operations supported

| Object Store (S3/ADLS) | Privacera + Databricks |
|---|---|
| S3 - Bucket Level | Y |
| S3 - Object Level | Y |
| ADLS | Y |

| Spark SQL and R | Privacera + Databricks |
|---|---|
| Table | Y |
| Column | Y |
| Column Masking | Y |
| Row Level Filtering | Y |
| Tag Based Policies | Y |
| Attribute Based Policies | Y |
| Centralized Auditing | Y |
![Page 46: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/46.jpg)
[Diagram: Privacera with Databricks architecture. Components shown: Databricks SQL/Python Cluster (Spark Driver with Ranger Plugin, Spark Executors) handling Spark SQL and/or Spark Read/Write; Business User; Admin User; Privacera Cloud with Ranger Policy Manager, Privacera Portal, Privacera Audit Server, DB, Solr, Apache Kafka, Privacera Discovery (Databricks Cluster), Privacera Anomaly Detection and Alerting, and Privacera Approval Workflow; AD/LDAP; 3rd Party Catalog; Splunk; CloudWatch; SIEM.]
![Page 47: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/47.jpg)
![Page 48: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/48.jpg)
SQL and BI
![Page 49: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/49.jpg)
What about the SQL Community?
Hadoop
● HUE
  ○ Data browsing
  ○ SQL Editor
  ○ Visualizations
● Interactive SQL
  ○ Impala
  ○ Hive LLAP
Databricks
● SQL Analytics Workspace
  ○ Data Browser
  ○ SQL Editor
  ○ Visualizations
● Interactive SQL
  ○ Spark optimizations - Adaptive Query Execution
  ○ Advanced Caching
  ○ Project Photon
  ○ Scaling cluster of clusters
![Page 50: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/50.jpg)
SQL & BI Layer
Optimized SQL and BI
Performance
- Fast queries with Delta Engine on Delta Lake.
- Support for high-concurrency with auto-scaling clusters.
- Optimized JDBC/ODBC drivers.
BI Integrations
- Compatible with any BI client and tool that supports Spark.
Tuned
- Optimized and tuned for BI and SQL out of the box.
![Page 51: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/51.jpg)
Vision
Give SQL users a home in Databricks
Provide SQL workbench, light dashboarding, and alerting capabilities.
Great BI experience on the data lake
Enable companies to effectively leverage the data lake from any BI tool without having to move the data around.
Easy to use & price-performant
Minimal setup & configuration. Data lake price performance.
![Page 52: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/52.jpg)
SQL-native user interface for analysts
▪ Familiar SQL Editor
  ▪ Auto Complete
  ▪ Built in visualizations
  ▪ Data Browser
▪ Automatic Alerts
  ▪ Trigger based upon values
  ▪ Email or Slack integration
▪ Dashboards
  ▪ Simply convert queries to dashboards
  ▪ Share with Access Control
![Page 53: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/53.jpg)
Built-in connectors for existing BI tools
Other BI & SQL clients that support
▪ Supports your favorite tool
  ▪ Connectors for top BI & SQL clients
  ▪ Simple connection setup
  ▪ Optimized performance
▪ OAuth & Single Sign On
  ▪ Quick and easy authentication experience. No need to deal with access tokens.
▪ Power BI available now
▪ Others coming soon
![Page 54: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/54.jpg)
Performance
Delta Metadata Performance
Improved read performance for cold queries on Delta tables. Provides interactive metadata performance regardless of # of Delta tables in a query or table sizes.
New ODBC / JDBC Drivers
Wire protocol re-engineered to provide lower latencies & higher data transfer speeds:
▪ Lower latency / less overhead (~¼ sec) with reduced round trips per request
▪ Higher transfer rate (up to 50%) using Apache Arrow
▪ Optimized metadata performance for ODBC/JDBC APIs (up to 10x for metadata retrieval operations)
Photon - Delta Engine [Preview]
New MPP engine built from scratch in C++. Vectorized to exploit data level parallelism and instruction-level parallelism. Optimized for modern structured and semi-structured workloads.
![Page 55: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/55.jpg)
Summary
![Page 56: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/56.jpg)
It all starts with a plan
● Databricks and our partner community can help you
  ○ Assess
  ○ Plan
  ○ Validate
  ○ Execute
![Page 57: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/57.jpg)
Considerations for your migration to Databricks
● Administration
● Data Migration
● Data Processing
● Security & Governance
● SQL and BI Layer
![Page 58: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/58.jpg)
Next Steps
![Page 59: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/59.jpg)
Next Steps
● You will receive a follow-up email from our teams
● Let us help you with your Hadoop Migration Journey
![Page 60: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/60.jpg)
Follow up materials - Useful links
![Page 61: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/61.jpg)
Databricks Reference Architecture
![Page 62: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/62.jpg)
Databricks Azure Reference Architecture
![Page 63: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/63.jpg)
Databricks AWS Reference Architecture
![Page 64: Clean Your Data Swamp By Migration off Hadoop](https://reader030.vdocument.in/reader030/viewer/2022012606/619a9ddbdd1665532a66b1c0/html5/thumbnails/64.jpg)
Demo