Getting It Right Exactly Once: Principles for Streaming Architectures
TRANSCRIPT
![Page 1: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/1.jpg)
Getting It Right Exactly Once: Principles for Streaming Architectures
Darryl Smith, Chief Data Platform Architect and Distinguished Engineer, Dell Technologies
September 2016 | Strata+Hadoop World, NY
![Page 2: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/2.jpg)
2
Getting Started
I’m Darryl Smith
• Chief Data Platform Architect and Distinguished Engineer, Dell Technologies
Agenda
• Real-Time And The Need For Streaming
• Adding Real-Time And Streaming To The Data Lake
• Results, Plans, Lessons Learned
• Demonstration
![Page 3: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/3.jpg)
3
Trickle, Flood, or Torrent…
Streaming is about continuous data motion,
more than speed or volume
![Page 4: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/4.jpg)
4
The Conversation Around Streaming
Website and Mobile Application Logs
Internet of Things Sensors
![Page 5: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/5.jpg)
5
The Enterprise Reality
Batch > Real-Time > Streaming
Enterprise Opportunities
Immediate Business Advantage
Website and Mobile Application Logs
Internet of Things Sensors
![Page 6: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/6.jpg)
6
The Enterprise Streaming Play
Moving from batch to real-time streams avoids surges, normalizes compute,
and drives value
![Page 7: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/7.jpg)
7
Real time and the need for streaming
![Page 8: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/8.jpg)
8
Drive DellEMC towards a Predictive Enterprise via
intelligent data driving agility, increasing revenue and
productivity resulting in a competitive advantage
Analytics Vision
![Page 9: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/9.jpg)
9
Need to use new data for competitive advantage
• Volume, Variety and Velocity
• Leverage near real time and streaming data sets to optimize predictions
• Make faster, better decisions
• Cost-effectively scale to improve query and load performance
• Put the data in the hands of the business
Becoming An Analytical Enterprise
DRIVE COMPETITIVE ADVANTAGE
COST-EFFECTIVELY SCALE
DATA ACCESS BY BUSINESS
NEAR REAL-TIME ANALYTICS
![Page 10: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/10.jpg)
10
Problem Statement: Teams do not have access to maintenance renewal quotes in the timeframes or with the degree of quality they need for Tech Refresh and Renewal sales.
Desired Outcome: Implement a cost-effective, real-time solution that improves productivity and gives confidence to produce desired outcomes efficiently.
Scoping The Business Objectives
![Page 11: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/11.jpg)
11
Business Drivers
| CURRENT REALITY | VISION FOR THE FUTURE |
| --- | --- |
| HIGH TOUCH TACTICAL EXECUTION | LOW TOUCH SELF SERVICE |
| DATE DRIVEN PROCESSES | BUSINESS VALUE DRIVEN PROCESSES |
| INEFFICIENCIES & LOST PRODUCTIVITY | INCREASED PRODUCTIVITY |
| SILOED DATA / LIMITED VIEWS | SINGLE VIEW OF DATA / DATA SCORING |
| VARIABLE DATA QUALITY | DATA QUALITY & CONFIDENCE |

TO REALIZE THIS VISION: IMPLEMENT CALM SOLUTION PHASES AND OPTIMIZE BUSINESS PROCESSES
![Page 12: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/12.jpg)
12
The Need for “CALM”: Customer Asset Lifecycle Management
For enterprise sales
Who need accurate and timely customer information
CALM is a real-time application
Providing up to the moment customer 360° dashboards
Install Base
Pricing
Device Config
Contacts
Contracts
Analytics Contracts
Component Data
Offers
Scorecard
![Page 13: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/13.jpg)
13
Data Lake Architecture
[Architecture diagram. Recoverable labels:]
• Sources: Structured (CRM, ERP, PLM); Unstructured (Network, Web, Sensor, Supplier, Social Media, Market)
• Ingestion: Real-Time, Micro-Batch, Batch
• Data Governance: Apache Ranger, Attivio, Collibra
• Applications
• Analytics Toolbox
• Execution: Gemfire, Cassandra, PostgreSQL, MemSQL
• Process: Greenplum DB, Spring XD, Pivotal HD
• Hadoop: HDFS on Isilon, Hadoop on ScaleIO
• Data Platform: VCE Vblock/VxRack | XtremIO | Data Domain
• VMware vCloud Suite
![Page 14: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/14.jpg)
14
Data Ingestion
• Small to Big Data (high-throughput)
• Structured and unstructured Data from any Source
• Streams and Batches
• Secure, multi-tenant, configurable Framework
Real-Time Analytics
• Tap into streams for in-memory Analytics
• Real Time Data insights and decisions
Services
• Data Ingestion to Data Lake
• Data Lake APIs
• Data Alerting
Business Data Lake Offerings
Unstructured
Structured
![Page 15: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/15.jpg)
15
Adding Real Time and Streaming to the Data Lake
![Page 16: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/16.jpg)
16
Seeking A Fast Database
A complement to the business data lake
O P C M
![Page 17: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/17.jpg)
HammerDB Platform Benchmarks
HammerDB workload testing followed the standard practices of EMC’s Oracle and SQL Server DBA teams. The workload is defined as a mix of 5 transactions:
• New order: receive a new order from a customer: 45%
• Payment: update the customer balance to record a payment: 43%
• Delivery: deliver orders asynchronously: 4%
• Order status: retrieve the status of customer’s most recent order: 4%
• Stock level: return the status of the warehouse’s inventory: 4%
Testing scenario:
• Database creation and initial data loading: 100 warehouses, 8 vUsers.
• Timed testing: 20 minutes per testing session.
• Number of virtual users scaled from 1 to 44 across testing sessions.
No changes were made to the system or database configuration while running the tests.
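The five-transaction mix above can be sketched as a weighted random draw. This is only an illustration of what the percentages mean, not how HammerDB itself schedules work; the names `WORKLOAD_MIX` and `sample_transactions` are hypothetical.

```python
import random
from collections import Counter

# Transaction mix from the slide, as a percentage of all transactions.
WORKLOAD_MIX = {
    "new_order": 45,
    "payment": 43,
    "delivery": 4,
    "order_status": 4,
    "stock_level": 4,
}


def sample_transactions(n, seed=0):
    """Draw n transaction types according to WORKLOAD_MIX (illustrative only)."""
    rng = random.Random(seed)
    names = list(WORKLOAD_MIX)
    weights = list(WORKLOAD_MIX.values())
    return rng.choices(names, weights=weights, k=n)


if __name__ == "__main__":
    counts = Counter(sample_transactions(100_000))
    for name, count in counts.most_common():
        # 100,000 draws, so count / 1000 is the observed percentage.
        print(f"{name:13s} {count / 1000:.1f}%")
```

With enough draws, the observed shares should land close to the 45/43/4/4/4 split on the slide.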
![Page 18: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/18.jpg)
HammerDB Workload Testing
Each test ran on 16 vCPU x 32 GB RAM:
• Red Hat 6.4, Oracle 11g R2
• Windows Core 2012 R2, SQL Server 2012 Ent Ed.
• Red Hat 6.4, PostgreSQL 9.3.3
![Page 19: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/19.jpg)
HammerDB Workload - Results
Results
![Page 20: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/20.jpg)
PostgreSQL vs In-Memory DB
We picked the top 5 queries run by different business functions. Presented here are the 3 queries whose response times did not meet the SLA.

| Query | PostgreSQL | MemSQL |
| --- | --- | --- |
| Opportunity (5K) | 5 seconds | 200 ms |
| Sales Order (170K) | 1-1.5 minutes | 6 seconds |
| Territory (60K) | 60 seconds | 5 seconds |
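To put the response times in the comparison above on a common scale, the sketch below converts them to milliseconds and computes the speedup per query. The slides do not state these ratios; they are derived here purely from the listed times, and `RESULTS_MS` and `speedup_range` are hypothetical names.

```python
# Response times from the slide, in milliseconds, as (low, high) ranges.
RESULTS_MS = {
    "Opportunity (5K)": ((5_000, 5_000), (200, 200)),
    "Sales Order (170K)": ((60_000, 90_000), (6_000, 6_000)),
    "Territory (60K)": ((60_000, 60_000), (5_000, 5_000)),
}


def speedup_range(postgres, memsql):
    """Return the (min, max) speedup of MemSQL over PostgreSQL."""
    return postgres[0] / memsql[1], postgres[1] / memsql[0]


if __name__ == "__main__":
    for query, (pg, mem) in RESULTS_MS.items():
        low, high = speedup_range(pg, mem)
        label = f"{low:.0f}x" if low == high else f"{low:.0f}x to {high:.0f}x"
        print(f"{query}: {label}")
```

On the figures shown, that works out to 25x for Opportunity, 10x to 15x for Sales Order, and 12x for Territory.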
![Page 21: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/21.jpg)
21
Business Data Lake – Ingestion to Fulfillment
[Diagram. Recoverable labels:]
• Raw Data
• Ingest Manager: Spring XD, Spark, Sqoop
• Hadoop: Raw Data
• Greenplum Database: Processed Data, Summary Data, Analytical Data
• Real-Time Tap
• Execution Tier: Cassandra, Gemfire, MemSQL, PostgreSQL
• Data Governor
• Predictive/Prescriptive Analytics
• Consumers
![Page 22: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/22.jpg)
22
Here Are The Data Flows We Built
Low Velocity
Batch
Real-Time
![Page 23: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/23.jpg)
23
Data Flow Patterns – Low Velocity
[Diagram. Recoverable labels:]
• Raw Data, One-Time
• Ingestion
• Analytical [BATCH]: Greenplum Database, Pivotal HD
• Presentation [SPEED/SERVING]: PostgreSQL, MemSQL, Cassandra, Gemfire
• Data Service, JDBC
• Application
![Page 24: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/24.jpg)
24
Data Flow Patterns – Batch
[Diagram. Recoverable labels:]
• Batch
• Ingestion
• Analytical [BATCH]: Greenplum Database, Pivotal HD
• Presentation [SPEED/SERVING]: PostgreSQL, MemSQL, Cassandra, Gemfire
• Data Service, JDBC
• Application
![Page 25: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/25.jpg)
25
Data Flow Patterns – Real Time
[Diagram. Recoverable labels:]
• Real-time, Initial Load
• Ingestion
• Analytical [BATCH]: Greenplum Database, Pivotal HD
• Presentation [SPEED/SERVING]: PostgreSQL, MemSQL, Cassandra, Gemfire
• Data Service, JDBC
• Application
![Page 26: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/26.jpg)
26
Nothing Closer To Real Time Than Streaming
Let’s look at the leading edge
Apache Kafka Messaging Semantics
• At most once
• At least once
• Exactly once
![Page 27: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/27.jpg)
27
At most once
000
?01 02 03 04
![Page 28: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/28.jpg)
28
At least once
01 02 03 04
000
?
![Page 29: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/29.jpg)
29
Exactly Once
000
01 02 03 04
01
![Page 30: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/30.jpg)
30
Understanding Streaming Semantics
| At most once | At least once | Exactly once |
| --- | --- | --- |
| Message pulled once | Message pulled one or more times; processed each time | Message pulled one or more times; processed once |
| May or may not be received | Receipt guaranteed | Receipt guaranteed |
| No duplicates | Likely duplicates | No duplicates |
| Possible missing data | No missing data | No missing data |
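The three delivery semantics compared above can be illustrated with a small simulation. This is a minimal sketch, not Kafka client code: the `deliver` function, the `fail_rate` parameter, and the idea of modeling a crash as a random event between "process" and "commit offset" are all assumptions made for illustration. Exactly-once is modeled here with an idempotent apply (deduplication by message id), which is one of several ways to achieve it.

```python
import random


def deliver(messages, mode, fail_rate=0.2, seed=42):
    """Simulate a consumer that may crash between processing and committing."""
    rng = random.Random(seed)
    sink = []      # messages the consumer has actually processed
    seen = set()   # ids already applied (used only by exactly_once)
    offset = 0     # next message to read; survives simulated crashes
    while offset < len(messages):
        msg = messages[offset]
        crashed = rng.random() < fail_rate
        if mode == "at_most_once":
            offset += 1            # commit the offset first
            if crashed:
                continue           # crash before processing: message is lost
            sink.append(msg)
        elif mode == "at_least_once":
            sink.append(msg)       # process first
            if crashed:
                continue           # crash before commit: message is re-read
            offset += 1
        elif mode == "exactly_once":
            if msg not in seen:    # idempotent apply: skip redelivered ids
                seen.add(msg)
                sink.append(msg)
            if crashed:
                continue           # re-read, but the apply above is a no-op
            offset += 1
        else:
            raise ValueError(f"unknown mode: {mode}")
    return sink


if __name__ == "__main__":
    msgs = list(range(1000))
    for mode in ("at_most_once", "at_least_once", "exactly_once"):
        out = deliver(msgs, mode)
        print(mode, "processed:", len(out), "distinct:", len(set(out)))
```

Under these assumptions, at-most-once processes fewer messages than were sent, at-least-once processes every message but some more than once, and exactly-once processes each message a single time.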
![Page 31: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/31.jpg)
31
Rendering In Real Time Picking the right business intelligence layer
• Tableau
• Custom Application (CF, D3, Docker)
• Additional Third Party Solutions
![Page 32: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/32.jpg)
32
Results, Plans, Lessons Learned
![Page 33: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/33.jpg)
33
Business Benefits
DATA QUERYING: Down from 4 hours per quarter to less than 1 minute per year
SIMPLIFIED PROVISIONING: Reduced number of tables/reports required
DATA GOVERNANCE: Provides one version of the truth
TIME TO MARKET: Reduced number of tables/reports required
TOOL AGNOSTIC: Business logic in the DB, not the tool, provides increased flexibility
![Page 34: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/34.jpg)
34
Use Case: Customer Account Profile
Streamlined analytics environment to gain a holistic customer view
Service Request
Contracts
Installed Base
Bookings
Billings
EMC DATA LAKE
BDL SERVICES
DATA WORKSPACES
DATA INGESTION
Prof Services
23 BUSINESS MANAGED WORKSPACES
![Page 35: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/35.jpg)
35
Customer Asset Lifecycle Management
Platform Roadmap
Phase 1: Foundational Capabilities/Discovery
Phase 2: Scale Platform / Automate
Future Phases: Global Standard Tool Integrations, Advanced Analytics
[Roadmap diagram. Recoverable labels:]
• Timeline: Aug 2015, Oct 2015, 2016, TBD
• BAaaS/Tableau
• Scalable Platform
• Integrated Platform
• GBS Renewals
• Inside Sales
• Additional Business Groups
• BDL Platform
• Enablement, Collaboration, Acceleration
• In-Memory Capabilities (POC)
• We are here
![Page 36: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/36.jpg)
36
Data Services Roadmap
Security: Planned integration into custom BDL security API for managing Role Based Access Control (RBAC) to the underlying data
Business Data Lake Plans
![Page 37: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/37.jpg)
37
Lessons Learned – Key Takeaways
EDUCATE
• Educate the business
• Use examples of business impact
ASSESS
• Assess in-house big data skills
• Ensure plan to support the organization for 3-5 years
INFRASTRUCTURE
• Choose the best possible infrastructure
• Make sure your Big Data technology platform can evolve
JOURNEY
• Remember it is a journey
• Look for small wins as well as big wins
![Page 38: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/38.jpg)
38
Lessons Learned: Analytics and Data
Sourcing the right skills, working with a different philosophy, and some new tools will help you meet your analytical goals
TRANSFORM YOUR PEOPLE
• Data science in the organization, IT or both?
• Helping business units take initiative
CHANGE YOUR PROCESSES
• New philosophy to running analytics projects
• How and when to share data
ADAPT YOUR TECHNOLOGY
• Steadily refine toolsets based on needed analysis
• Identify to infrastructure layers
![Page 39: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/39.jpg)
39
Demonstration
![Page 40: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/40.jpg)
40
Demo Agenda
Showcase exactly-once semantics from Kafka
1: Data set of 200,000 transactions summing to zero
2: CREATE TABLE AND CREATE PIPELINE
3: Push to Kafka and confirm exactly-once
4: Validate Resiliency and confirm exactly-once
![Page 41: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/41.jpg)
Step 1: Data Source
Start with a data set of 200,000 transactions representing money/goods that sum to zero
![Page 42: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/42.jpg)
200,000 transactions
• Transaction number
• Increase / Decrease
• Amount
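A data set with the three fields above can be generated as follows. This is a hedged sketch: the slides do not show how the demo data was built, so pairing each increase with an equal decrease, storing decreases as negative amounts, and the name `generate_transactions` are all assumptions chosen so that the total is zero by construction.

```python
import random


def generate_transactions(n=200_000, seed=7, max_amount=1_000):
    """Return n (transaction_number, direction, amount) rows summing to zero."""
    if n % 2:
        raise ValueError("n must be even so increases and decreases pair up")
    rng = random.Random(seed)
    rows = []
    for _ in range(n // 2):
        amount = rng.randint(1, max_amount)
        rows.append(("increase", amount))    # money/goods in
        rows.append(("decrease", -amount))   # matching money/goods out
    rng.shuffle(rows)                        # interleave so order carries no hint
    # Number the transactions 1..n after shuffling.
    return [(i + 1, direction, amount)
            for i, (direction, amount) in enumerate(rows)]


if __name__ == "__main__":
    data = generate_transactions()
    print(len(data), sum(amount for _, _, amount in data))
```

Because the amounts cancel exactly, any lost or duplicated row downstream would be expected to show up as a wrong count, and usually a nonzero sum as well.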
![Page 43: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/43.jpg)
Step 2: CREATE TABLE AND CREATE PIPELINE
create a table and pipeline in MemSQL that subscribes to that Kafka topic
![Page 44: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/44.jpg)
CREATE TABLE
CREATE PIPELINE
![Page 45: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/45.jpg)
Step 3: Push to Kafka
Push that data set to Kafka
Validate exactly-once delivery by querying MemSQL
• show tables;
• show pipelines;
• select sum(amount) from transactions;
  Should be 0 in the demo
• select count(*) from transactions;
  Should be 200,000 in the demo
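The two validation queries can be tried locally with a stand-in database. The sketch below uses Python's built-in `sqlite3` in place of MemSQL and loads rows directly instead of through Kafka, so it only demonstrates the check itself; the table columns and the `validate` helper are assumptions, since the slides do not show the schema.

```python
import sqlite3


def validate(rows):
    """Load rows into an in-memory table and run the two validation queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions "
        "(id INTEGER PRIMARY KEY, direction TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    total = conn.execute("SELECT SUM(amount) FROM transactions").fetchone()[0]
    count = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
    conn.close()
    return total, count


if __name__ == "__main__":
    # Hypothetical sample data: odd ids add 10, even ids subtract 10.
    sample = [
        (i, "increase" if i % 2 else "decrease", 10 if i % 2 else -10)
        for i in range(1, 200_001)
    ]
    total, count = validate(sample)
    print("sum:", total, "count:", count)
```

The count catches missing or duplicated rows, and the zero sum gives a second, independent signal that nothing was applied twice or dropped.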
![Page 46: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/46.jpg)
46
![Page 47: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/47.jpg)
Step 4: Resiliency
Induce failures to show resiliency during exactly-once workflows
a. randomly_fail_batches.py
b. restart Kafka and show error count
c. continue and validate exactly-once semantics
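The resiliency step can be sketched as a loader that fails batches at random and retries them. This is not the demo's `randomly_fail_batches.py`, whose contents the slides do not show. It assumes one common way to get exactly-once results: the batch's rows and the consumer offset are committed in the same database transaction, so a failed batch rolls back both and a retry cannot double-apply. `sqlite3` again stands in for MemSQL, and `load_with_failures` and the `pipeline_offset` table are hypothetical names.

```python
import random
import sqlite3


def load_with_failures(rows, batch_size=1_000, fail_rate=0.3, seed=1):
    """Load rows in batches, injecting random failures; return (errors, count, sum)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (id INTEGER, amount INTEGER)")
    conn.execute("CREATE TABLE pipeline_offset (next_row INTEGER)")
    conn.execute("INSERT INTO pipeline_offset VALUES (0)")
    conn.commit()
    errors = 0
    while True:
        # Resume from the last committed offset, as after a restart.
        start = conn.execute("SELECT next_row FROM pipeline_offset").fetchone()[0]
        if start >= len(rows):
            break
        batch = rows[start:start + batch_size]
        try:
            with conn:  # one transaction: commit on success, roll back on error
                conn.executemany("INSERT INTO transactions VALUES (?, ?)", batch)
                if rng.random() < fail_rate:
                    raise RuntimeError("injected batch failure")
                conn.execute(
                    "UPDATE pipeline_offset SET next_row = ?",
                    (start + len(batch),),
                )
        except RuntimeError:
            errors += 1  # rows from this attempt were rolled back; retry
    total, count = conn.execute(
        "SELECT COALESCE(SUM(amount), 0), COUNT(*) FROM transactions"
    ).fetchone()
    conn.close()
    return errors, count, total


if __name__ == "__main__":
    data = [(i, 10 if i % 2 else -10) for i in range(1, 20_001)]
    errors, count, total = load_with_failures(data)
    print("errors:", errors, "total transactions:", count, "sum:", total)
```

Regardless of how many batches fail, the final count should equal the number of input rows and the sum should stay at zero, which mirrors the Errors, Total Transactions, and Sum readouts in the demo.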
![Page 48: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/48.jpg)
48
![Page 49: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/49.jpg)
Errors
Total Transactions
Sum
![Page 50: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/50.jpg)
The mission is clear:
We’re moving from batch to real-time
with streaming
![Page 51: Getting It Right Exactly Once: Principles for Streaming Architectures](https://reader035.vdocument.in/reader035/viewer/2022062412/587c0f551a28ab03768b6425/html5/thumbnails/51.jpg)
Thank You
Darryl Smith
Chief Data Platform Architect and Distinguished Engineer
Dell Technologies