Introduction to Microsoft SQL Server Big Data Clusters
Buck Woody, Applied Data Scientist, Microsoft Data Platform Team
June 2019
Data Growth
Computing and storage advances expand data collection abilities
Computing and storage technologies allow more data collection points
They also allow longer historical data storage, and over time that stored data becomes part of the historical lineage
Walmart is a classic example of data proliferation and leverage
Use Cases
Every industry benefits from big data; retail and finance lead the way

| Industry sector | Primary use cases |
|---|---|
| Retail | Demand prediction; in-store analytics; supply chain optimization; customer retention; cost/revenue analytics; HR analytics; inventory control |
| Finance | Cyberattack prevention; fraud detection; customer segmentation; market analysis; risk analysis; blockchain; customer retention |
| Healthcare | Fiscal control analytics; disease prevention prediction and classification; clinical trials optimization; patient load analysis; episode analytics |
| Public sector | Revenue prediction; education effectiveness analysis; transportation analysis and prediction; energy demand and supply prediction and control; defense readiness predictions and threat analysis |
| Manufacturing | Predictive maintenance (PdM); anomaly detection; pattern analysis |
| Agriculture | Food safety analysis; crop forecasting; market forecasting; pipeline optimization |
Scale-Out Processing
Scaled Processing and Scaled Storage
The foundations of scale
Hadoop, Spark
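Hadoop and Spark both build on the same idea: split the data into partitions, process each partition independently, then combine the partial results. The sketch below illustrates that split, map, shuffle, and reduce pattern in plain Python on a single machine, using threads to stand in for workers. The function names and sample partitions are hypothetical, and this is not Hadoop or Spark code.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_partition(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one partition."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle step: group emitted values by key across all partitions."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_groups(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: combine the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(partitions: List[List[str]]) -> Dict[str, int]:
    # Each partition is mapped independently, as separate workers would do.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_partition, partitions))
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    partitions = [
        ["big data clusters", "scale out storage"],
        ["scale out processing", "big data"],
    ]
    print(word_count(partitions))
```

Because the map step never looks outside its own partition, adding partitions (and workers) scales the work out; only the shuffle step moves data between them.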
Virtualization
Hardware Abstraction
Building on hardware, you can create a complete “PC” on top of a hypervisor layer, which abstracts the hardware. You still own the operating system and everything above it
This allows for scale by ring-fencing OS-level dependencies
Containers
Abstracting the OS, allowing complete portability
Containers go one level further than the hypervisor, abstracting the OS and focusing on binaries and applications
Storage and networking still need to be considered
Scale is achieved through multiple containers
[Diagram: a Kubernetes cluster of Nodes; each Node runs kubelet and kube-proxy and hosts Pods.]
• Container
• Pod
• Node
• Cluster
• Volume
• Routing
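The Kubernetes hierarchy (containers run inside Pods, Pods are placed on Nodes, and Nodes form a Cluster) can be modeled as a small data structure. This is a conceptual sketch only, not the Kubernetes API: the class names, the CPU-only resource model, and the "place each Pod on the Node with the most free CPU" rule are simplifying assumptions. The real Kubernetes scheduler weighs many more factors.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Container:
    name: str
    image: str  # hypothetical image name; no image is actually pulled


@dataclass
class Pod:
    name: str
    containers: List[Container]
    cpu_request: float  # cores requested by the whole Pod (simplified model)


@dataclass
class Node:
    name: str
    cpu_capacity: float
    pods: List[Pod] = field(default_factory=list)

    def free_cpu(self) -> float:
        return self.cpu_capacity - sum(p.cpu_request for p in self.pods)


@dataclass
class Cluster:
    nodes: List[Node]

    def schedule(self, pod: Pod) -> Node:
        """Place the Pod on the Node with the most free CPU (hypothetical rule)."""
        candidates = [n for n in self.nodes if n.free_cpu() >= pod.cpu_request]
        if not candidates:
            raise RuntimeError(f"no node can fit pod {pod.name}")
        target = max(candidates, key=lambda n: n.free_cpu())
        target.pods.append(pod)
        return target


if __name__ == "__main__":
    cluster = Cluster(nodes=[Node("node-1", 4.0), Node("node-2", 2.0)])
    for i in range(3):
        pod = Pod(f"sql-{i}", [Container("sql", "hypothetical/sql-image")], 1.5)
        print(pod.name, "->", cluster.schedule(pod).name)
```

Scaling out in this model means adding Pods (more containers) or Nodes (more capacity); the cluster decides placement, which is why the application does not need to know which machine it runs on.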
[Diagram: a Master coordinating scaled-out application tiers: three Web Tier instances, two Business Logic instances, and five Data Tier instances.]
[Diagram: SQL Server on Linux, Windows, and containers, deployed on premises, in a public or private cloud, or as a hybrid.]
SQL Server 2019 Big Data Cluster – Complete Architecture
LOB Apps
Line-of-business applications call the SQL Server master instance. Relational, multi-type, graph, and ML features are supported, with no code changes.
[Diagram: Big Data Cluster architecture, organized into a Control Plane, a Compute Plane, and a Data Plane. Components shown: the SQL Server master instance; the SQL Cluster Administration Portal; Knox Gateway; Livy; Hive; Grafana and Kibana dashboards; Compute Pools of SQL Server instances; Storage Pools combining SQL Server, Spark, and HDFS; SQL Data Pools of SQL Server instances; an App Pool hosting jobs (SSIS), web apps, and ML Server; and PolyBase connectors.]
SQL Server 2019 Big Data Cluster – Data Virtualization
[Diagram: HDFS, NoSQL, and other data sources reached through the Compute Pool.]
Multiple Data Sources
Data virtualization: scale-out calls go through the SQL Server master instance using external tables, then through the Compute Pool, which uses PolyBase connectors at each source.
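The data virtualization flow is a scatter-gather pattern: the master fans a query out to compute workers, each worker reads one external source through its connector, and the master combines the partial results. The sketch below models that flow in plain Python. The source names, the in-memory "connector" functions, and the filter query are hypothetical stand-ins; the code does not use PolyBase, external tables, or T-SQL.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
# Hypothetical "connector": a function that returns the rows held by one source.
Connector = Callable[[], List[Row]]


def query_source(connector: Connector, min_amount: float) -> List[Row]:
    """Work done by one compute worker: read a source and filter at the source."""
    return [row for row in connector() if row["amount"] >= min_amount]


def virtualized_query(sources: Dict[str, Connector], min_amount: float) -> List[Row]:
    """Master: scatter the query to one worker per source, then gather results."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {
            name: pool.submit(query_source, conn, min_amount)
            for name, conn in sources.items()
        }
        results: List[Row] = []
        for name, future in futures.items():
            # Tag each row with its origin so the combined result stays traceable.
            results.extend({**row, "source": name} for row in future.result())
    return results


if __name__ == "__main__":
    sources = {
        "rdbms": lambda: [{"id": 1, "amount": 120.0}, {"id": 2, "amount": 40.0}],
        "nosql": lambda: [{"id": 3, "amount": 75.0}],
        "hdfs": lambda: [{"id": 4, "amount": 300.0}],
    }
    for row in virtualized_query(sources, min_amount=50.0):
        print(row)
```

Filtering inside each worker, before results travel back to the master, keeps data movement small; the data stays where it lives and only matching rows are combined.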
[Diagram: a PolyBase external table scales out through PolyBase connectors to RDBMS and NoSQL sources.]
SQL Server 2019 Big Data Cluster – Data Mart
[Diagram: HDFS and NoSQL sources, the Compute Pool, and the Data Pool.]
Data Persistence Using Multiple Data Sources
Data virtualization: scale-out calls go through the SQL Server master instance using external tables, then through the Compute Pool, which uses PolyBase connectors at the source. Results are stored in the shards of the Data Pool.
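Storing results in the shards of the Data Pool means each row lands on one of several SQL Server instances, and a later query reads every shard and combines the partial answers. The sketch below illustrates round-robin distribution, one simple way to spread rows evenly. As we understand it, data pool tables can be created with round-robin or replicated distribution, but check the official documentation. The shard count and rows are hypothetical, and the shards here are plain Python lists rather than SQL Server instances.

```python
from itertools import cycle
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def distribute_round_robin(rows: Iterable[Row], shard_count: int) -> List[List[Row]]:
    """Assign rows to shards in turn: 0, 1, ..., shard_count - 1, 0, 1, ..."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards: List[List[Row]] = [[] for _ in range(shard_count)]
    for shard, row in zip(cycle(shards), rows):
        shard.append(row)
    return shards


def query_all_shards(shards: List[List[Row]], key: str) -> float:
    """Compute a partial sum on each shard, then combine the partial sums."""
    partial_sums = [sum(row[key] for row in shard) for shard in shards]
    return sum(partial_sums)


if __name__ == "__main__":
    rows = [{"id": i, "amount": float(i)} for i in range(1, 8)]
    shards = distribute_round_robin(rows, shard_count=3)
    print([len(s) for s in shards])
    print(query_all_shards(shards, "amount"))
```

Round-robin keeps shard sizes within one row of each other, so each instance does a similar share of the work; the trade-off is that rows with related keys are not co-located.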
[Diagram: RDBMS, Cosmos DB, and HDFS sources connect through PolyBase connectors and the Compute Pool to the sharded SQL Server Data Pool.]
Example: SQL Server Big Data Cluster – Data Mart
SQL Server 2019 Big Data Cluster – Data Lake, Machine Learning and Spark
[Diagram: Storage Pool (HDFS), Compute Pool, App Pool, Data Pool, and NoSQL sources.]
• Multiple Data Sources (Data Virtualization): scale-out calls go through the SQL Server master instance using external tables, through the Compute Pool to the Data Pool
• Scaled Data Analysis (Data Mart): scale-out calls go through the SQL Server master instance using external tables into the Data Pool; direct calls reach a data lake (HDFS) through the Storage Pool
• Data Science (Data Engineering): pipelines for models on big data use notebooks and other tools to submit work to Spark, ingesting and processing data in the Storage Pool
• AI Enablement: prediction and classification scoring for AI apps runs in the App Pool
Example: Spark Query Notebook
SQL Server 2019 Big Data Cluster – Tools, Management and Monitoring
Example: SQL Server Big Data Cluster – Management and Monitoring
Takeaways
• A SQL Server 2019 Big Data Cluster bundles SQL Server, HDFS, and the Spark compute engine into one package for big data processing, machine learning, and AI
• Spark is a distributed compute engine that provides a unified framework for end-to-end big data processing pipelines, including machine learning and AI
• You can use SQL Server 2019 to create a secure, hybrid machine learning architecture that covers preparing data, training a machine learning model, operationalizing the model, and using it for scoring
• Go Do > Practice with installing, configuring, and operating SQL Server 2019
• Go Do > Download this deck and practice a demo on Big Data Clusters on SQL Server
• Go Do > Follow a complete workshop
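The machine learning flow in the takeaways (prepare data, train a model, operationalize it, then use it for scoring) can be sketched end to end in a few lines. This is an illustration only: it fits a one-variable linear regression by ordinary least squares using the standard library, and "operationalizing" is modeled as serializing the model to JSON so a separate scoring service (for example, something hosted in the App Pool) could load it. None of these functions are Big Data Cluster APIs, and the sample data is illustrative.

```python
import json
from statistics import mean
from typing import Dict, List, Tuple


def prepare(raw: List[Tuple[str, str]]) -> List[Tuple[float, float]]:
    """Data preparation: parse text pairs and drop rows that are not numeric."""
    clean: List[Tuple[float, float]] = []
    for x_text, y_text in raw:
        try:
            clean.append((float(x_text), float(y_text)))
        except ValueError:
            continue  # skip malformed rows
    return clean


def train(data: List[Tuple[float, float]]) -> Dict[str, float]:
    """Training: fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in data) / sxx
    return {"slope": slope, "intercept": y_bar - slope * x_bar}


def operationalize(model: Dict[str, float]) -> str:
    """Operationalize: serialize the model so a scoring service can load it."""
    return json.dumps(model)


def score(model_json: str, x: float) -> float:
    """Scoring: load the serialized model and predict for one input."""
    model = json.loads(model_json)
    return model["slope"] * x + model["intercept"]


if __name__ == "__main__":
    raw = [("1", "3"), ("2", "5"), ("bad", "0"), ("3", "7"), ("4", "9")]
    deployed = operationalize(train(prepare(raw)))
    print(score(deployed, 5.0))
```

Separating training from scoring through a serialized artifact is the design point: the model can be retrained on big data in one place and served from another without changing the calling application.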
Resources
Official documentation: aka.ms/bdc
In-depth training: aka.ms/sqlworkshops