![Page 1: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/1.jpg)
Big Data Simplified "Is all about abˈstrakSH(ə)n"
HEMAL GANDHI, DIRECTOR OF DATA ENGINEERING
![Page 2: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/2.jpg)
Background
![Page 3: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/3.jpg)
Analyze Current State
• Challenges
• Facts
New Platform Design
• Define Goals
• Feature List
• Implementation Approach
Compare
• Feature List
• Trade Offs
• Cost Structure
Decision
• Fix vs. Build?
![Page 4: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/4.jpg)
Analyze Current State
![Page 5: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/5.jpg)
Platform is very complex
![Page 6: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/6.jpg)
Struggling to keep up with business needs
![Page 7: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/7.jpg)
Huge backlog
![Page 8: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/8.jpg)
Code base is increasing rapidly
![Page 9: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/9.jpg)
We are slow to respond to market needs
![Page 10: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/10.jpg)
Outdated technology stack
![Page 11: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/11.jpg)
Missing best practices
![Page 12: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/12.jpg)
High cost of data
• Storage
• Finding Insights
• Integration
• Maintenance
![Page 13: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/13.jpg)
Lack of understanding of the business impact of data
• Strategic Value
• Data Identity
• Time Value
• Dependencies
![Page 14: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/14.jpg)
Agile – mini waterfall
![Page 15: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/15.jpg)
Process and Organization
High Investment Costs
Adoption Issues
Complex Framework
![Page 16: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/16.jpg)
Lots of Challenges
![Page 17: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/17.jpg)
NOT a scalable platform
Can impact revenue negatively!!!
![Page 18: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/18.jpg)
New Platform Design
![Page 19: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/19.jpg)
Keep it simple
![Page 20: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/20.jpg)
Keep up with business needs
![Page 21: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/21.jpg)
Move fast
![Page 22: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/22.jpg)
Keep technology stack current over time
![Page 23: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/23.jpg)
Low cost of data
• Storage
• Finding Insights
• Integration
• Maintenance
![Page 24: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/24.jpg)
Understand business impact of data
• Strategic Value
• Data Identity
• Time Value
• Dependencies
![Page 25: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/25.jpg)
Measure data
![Page 26: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/26.jpg)
Be Agile – Do Less
![Page 27: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/27.jpg)
Improve data ROI
![Page 28: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/28.jpg)
Compare
![Page 29: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/29.jpg)
Investment needs: Current Platform vs. New Platform
• Current Platform: High
• New Platform: High
![Page 30: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/30.jpg)
Scalability: Current Platform vs. New Platform
• Current Platform: Not Scalable
• New Platform: Initially Scalable
![Page 31: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/31.jpg)
Maintenance cost: Current Platform vs. New Platform
• Current Platform: High
• New Platform: Initially low, grows over time
![Page 32: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/32.jpg)
Technology: Current Platform vs. New Platform
• Current Platform: Outdated
• New Platform: Big Data tools provide technology, not solutions to design problems
![Page 33: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/33.jpg)
Technology choices
![Page 34: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/34.jpg)
Decision: Fix vs. Build?
![Page 35: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/35.jpg)
![Page 36: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/36.jpg)
Next Steps
![Page 37: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/37.jpg)
Goal
Build a feature-based, scalable big data platform in 6 months with limited resources while supporting the legacy system.
![Page 38: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/38.jpg)
Design Patterns
![Page 39: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/39.jpg)
Take Platform Approach
Project Requirements
Data Platform Features
Reusable Components
![Page 40: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/40.jpg)
Technology Abstraction
• Business Logic
• Declarative Configuration
• Pick Technology at Runtime
• Execution Engine
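To illustrate the idea on this slide, here is a minimal sketch of business logic declared as configuration, with the execution engine looked up by name at runtime. Everything in it is an assumption made for illustration: the `FLOW` structure, the `ENGINES` registry, and the `local_engine` function are hypothetical, and the in-process engine only stands in for a real tool such as Pig or Spark.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Dict[str, object]
Engine = Callable[[Step, List[Row]], List[Row]]


def local_engine(step: Step, rows: List[Row]) -> List[Row]:
    """Hypothetical in-process engine; a stand-in for a real cluster tool."""
    if step["op"] == "filter":
        field, value = step["field"], step["equals"]
        return [r for r in rows if r.get(field) == value]
    if step["op"] == "project":
        return [{k: r[k] for k in step["fields"]} for r in rows]
    raise ValueError(f"unsupported op: {step['op']}")


# Registry of available engines. Adding a new technology means adding an
# entry here, not rewriting the business logic below.
ENGINES: Dict[str, Engine] = {"local": local_engine}


def run_flow(config: Dict[str, object], rows: List[Row]) -> List[Row]:
    """Pick the engine named in the config and apply each declared step."""
    engine = ENGINES[config["engine"]]
    for step in config["steps"]:
        rows = engine(step, rows)
    return rows


# Business logic expressed as data (hypothetical schema).
FLOW = {
    "engine": "local",
    "steps": [
        {"op": "filter", "field": "status", "equals": "shipped"},
        {"op": "project", "fields": ["order_id"]},
    ],
}

ORDERS = [
    {"order_id": 1, "status": "shipped"},
    {"order_id": 2, "status": "pending"},
]

result = run_flow(FLOW, ORDERS)
```

Switching technology in this sketch is a one-word change to `FLOW["engine"]`, provided the named engine is registered; the declared steps stay the same.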
![Page 41: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/41.jpg)
Data Access & Ingestion Abstraction
• Data Producers
• Data Ingestion Framework
• Data Storage
• Data Access API
• Data Consumers
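One way to picture this slide is a storage interface that producers and consumers never touch directly. The sketch below is illustrative only: the class names `Storage`, `InMemoryStorage`, `IngestionFramework`, and `DataAccessAPI` are hypothetical, and the in-memory backend stands in for a real store such as Hive or HBase.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List

Record = Dict[str, object]


class Storage(ABC):
    """Hypothetical storage contract hidden behind both abstractions."""

    @abstractmethod
    def append(self, dataset: str, records: Iterable[Record]) -> None: ...

    @abstractmethod
    def scan(self, dataset: str) -> List[Record]: ...


class InMemoryStorage(Storage):
    """Stand-in backend; a real one would wrap an actual data store."""

    def __init__(self) -> None:
        self._data: Dict[str, List[Record]] = {}

    def append(self, dataset: str, records: Iterable[Record]) -> None:
        self._data.setdefault(dataset, []).extend(records)

    def scan(self, dataset: str) -> List[Record]:
        return list(self._data.get(dataset, []))


class IngestionFramework:
    """Entry point for data producers."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage

    def ingest(self, dataset: str, records: Iterable[Record]) -> int:
        batch = list(records)
        self._storage.append(dataset, batch)
        return len(batch)


class DataAccessAPI:
    """Entry point for data consumers."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage

    def query(self, dataset: str, **equals: object) -> List[Record]:
        # Return records whose fields match every keyword argument.
        return [
            r for r in self._storage.scan(dataset)
            if all(r.get(k) == v for k, v in equals.items())
        ]
```

Because both entry points depend only on `Storage`, the backend could be replaced without changing producer or consumer code.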
![Page 42: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/42.jpg)
Data Integration Jobs: Stream Data to Storage Layer
• Data Integration Jobs
• Stream
• Data Storage
![Page 43: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/43.jpg)
Hot/Cold Data Management
• Hot Data: Configuration
• Cold Data: Configuration
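As a rough sketch of configuration-driven hot/cold management, the snippet below assigns a partition to a tier based on its age. The policy format (`TIER_CONFIG`, `hot_days`), the dataset names, and the age-based rule are all assumptions for illustration; the slides do not say how tiers are actually decided.

```python
from datetime import date

# Hypothetical policy: number of days a partition stays in the hot tier.
TIER_CONFIG = {
    "default": {"hot_days": 30},
    "clickstream": {"hot_days": 7},
}


def tier_for(dataset: str, partition_date: date, today: date) -> str:
    """Return "hot" or "cold" for a partition, using only the config."""
    policy = TIER_CONFIG.get(dataset, TIER_CONFIG["default"])
    age_days = (today - partition_date).days
    return "hot" if age_days <= policy["hot_days"] else "cold"
```

Changing retention for a dataset is then an edit to `TIER_CONFIG`, with no change to the archiving code.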
![Page 44: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/44.jpg)
abˈstrakSH(ə)n
![Page 45: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/45.jpg)
High Level Architecture
![Page 46: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/46.jpg)
Architecture
Data Platform (Dredge)
• Collection: Apache Flume, Sqoop
• Flow: Kafka, Spark
• Processing: Pig, Spark, MapReduce
• Storage: Hive, HBase, Vertica
• Delivery: Looker, Tableau, Visualization (d3.js), Email/FTP
Surrounding layers
• Data Quality Service (Data Lineage & Profiling)
• Security
• Scheduling & Cluster Monitoring
• Data Access Abstraction
• Applications & Visualization Tools
![Page 47: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/47.jpg)
WHAT IS DREDGE
A declarative abstraction layer for integrating big data tools, enabling a loosely coupled big data platform.
![Page 48: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/48.jpg)
Dredge Logical View
• Configuration Abstraction
• Source End Points
• Source Readers
• Tasks
• Hadoop Cluster
• Target Writer (Streams/Direct)
• Target End Points
• Events Management
• Log Streaming
• Dredge Repository – HBase
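The logical view can be read as "source reader, then tasks, then target writer", all chosen by configuration. The sketch below shows that shape only. It is not Dredge's actual API: the registries `SOURCE_READERS`, `TASKS`, and `TARGET_WRITERS`, the flow schema, and the `run` function are hypothetical, and in-memory text and lists stand in for real end points.

```python
import csv
import io
import json


def read_csv(endpoint):
    """Read rows from CSV text held in the endpoint description."""
    return list(csv.DictReader(io.StringIO(endpoint["text"])))


def read_json_lines(endpoint):
    """Read one JSON object per non-empty line."""
    return [json.loads(line) for line in endpoint["text"].splitlines() if line.strip()]


def write_list(endpoint, rows):
    """Append rows to an in-memory list that stands in for a real target."""
    endpoint["sink"].extend(rows)


# Hypothetical registries keyed by the names used in the flow config.
SOURCE_READERS = {"csv": read_csv, "jsonl": read_json_lines}
TARGET_WRITERS = {"list": write_list}
TASKS = {
    "drop_empty_sku": lambda rows: [r for r in rows if r["sku"]],
    "uppercase_sku": lambda rows: [{**r, "sku": r["sku"].upper()} for r in rows],
}


def run(flow):
    """Execute a declared flow: read, apply tasks in order, write."""
    rows = SOURCE_READERS[flow["source"]["type"]](flow["source"])
    for name in flow["tasks"]:
        rows = TASKS[name](rows)
    TARGET_WRITERS[flow["target"]["type"]](flow["target"], rows)
    return len(rows)


sink = []
flow = {
    "source": {"type": "csv", "text": "sku,qty\nab1,2\n,5\n"},
    "tasks": ["drop_empty_sku", "uppercase_sku"],
    "target": {"type": "list", "sink": sink},
}
written = run(flow)
```

In this sketch a new source or target type is one more registry entry, and existing flows are untouched.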
![Page 49: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/49.jpg)
Dredge Architecture
Dredge UI
• Declarative configuration
• Logical Flows
• Data Lineage
• Runtime Logs
• Admin
Dredge Data Services
• Aggregator
• UDFs
• Combiners, Routers..
• Plugin (Java/Shell, Pig, SQL)
• Rank, Sorter, Set Operations
• Filters/Patterns Analysis
• Abstraction builder (Kafka, Flume, Pig, Custom)
• Source Readers (Logs, RDBMS, unstructured data, Custom): Direct/Stream
• Target Writers (Hive, HBase, RDBMS, Custom): Direct/Stream
Dredge Runtime
• Temp Store - HDFS
• Event Management
• Temp Cache - HDFS
• Logger Stream
Dredge Repository – HBase
Lambda Architecture: HDFS, Hive, HBase, Pig, Flume, Kafka, Oozie
![Page 50: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/50.jpg)
DREDGE BENEFITS
• From 1000+ scripts to 50-100 scripts
• From 1000+ configuration files to <5 files
• Logical view of workflow, abstract physical implementation
• Quickly integrate new tools, declarative configuration implementation for big data tools
• Improved SLA, time to market, better cluster utilization, higher performance
• Simplified integration
• Minimal migration costs
• Low maintenance, configurable archiving of data
![Page 51: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/51.jpg)
Summarizing
![Page 52: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/52.jpg)
✓ Abstraction layer
✓ Technology
✓ Data access
✓ Data ingestion
✓ Dependencies…
✓ Reusable data components
✓ Event driven dependencies
✓ Plug & Play integration, loosely coupled (Cluster resources, Data)
It is all about abˈstrakSH(ə)n
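"Event driven dependencies" can be sketched as jobs that wait for named events instead of for a fixed schedule. The example below is a minimal illustration under assumptions: the `EventBus` class, the event names, and the rule that a job may emit one follow-up event are all hypothetical and not taken from Dredge.

```python
class EventBus:
    """Runs each registered job once, as soon as all its events have arrived."""

    def __init__(self):
        self._waiting = {}   # job name -> set of events still missing
        self._actions = {}   # job name -> callable returning an event or None
        self.ran = []        # job names in the order they were executed

    def register(self, job, needs, action):
        self._waiting[job] = set(needs)
        self._actions[job] = action

    def publish(self, event):
        queue = [event]
        while queue:
            current = queue.pop(0)
            # Mark the event as received and collect jobs that became ready.
            ready = []
            for job, missing in self._waiting.items():
                missing.discard(current)
                if not missing:
                    ready.append(job)
            # Run ready jobs; any event they emit is processed next.
            for job in ready:
                del self._waiting[job]
                self.ran.append(job)
                emitted = self._actions.pop(job)()
                if emitted is not None:
                    queue.append(emitted)


bus = EventBus()
bus.register("load_orders", ["orders.landed"], lambda: "orders.loaded")
bus.register("load_customers", ["customers.landed"], lambda: "customers.loaded")
bus.register("build_report", ["orders.loaded", "customers.loaded"], lambda: None)
```

Here `build_report` never names the jobs it depends on, only the events they produce, which is one way to keep components loosely coupled.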
![Page 53: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/53.jpg)
Big data requires a different mindset:
Innovate, iterate often and keep it simple.
![Page 54: Big Data Simplified"Is all about abˈstrakSH(ə)n"](https://reader034.vdocument.in/reader034/viewer/2022042716/55c5ee8fbb61ebd9158b4732/html5/thumbnails/54.jpg)
Thank you.
ENGINEERING.ONEKINGSLANE.COM