Agile Data Processing Pipelines - Software Engineering Radio
AGILE DATA PROCESSING PIPELINES
Ken Collier, PhD
Director, Agile Analytics
@theagilist #thoughtworks
Conventional Architectures
Pull-based Batch Loads
Enterprise Data Models
Complex ETL Logic
Poorly Suited to Non-Relational Data
Emergent design is difficult
DESIGN PRINCIPLES
Enable cheap/easy data ingestion
Enable inexpensive scaling
Enable emergent design
Enable easy recreation of information
Drive logic closer to the application
Enable near real time presentation
Support polyglot persistence
DATA CORE
Raw Factual Data
Historized Events
Retain Business Keys
Data Lineage
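The Data Core properties above (raw factual data, historized events, retained business keys, data lineage) can be illustrated with a minimal Python sketch, assuming an in-memory, append-only list stands in for real storage. `CoreEvent`, `DataCore`, and their methods are hypothetical names chosen for illustration, not part of any product or of the talk itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class CoreEvent:
    """One raw fact as received; frozen so its fields cannot be reassigned."""
    business_key: str        # natural key from the source system (e.g. a customer number)
    payload: Dict[str, Any]  # raw factual data, stored without transformation
    source: str              # lineage: which system produced the fact
    ingested_at: datetime    # lineage: when the data core received it


class DataCore:
    """Append-only store: facts are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[CoreEvent] = []

    def append(self, business_key: str, payload: Dict[str, Any], source: str) -> CoreEvent:
        # Copy the payload so later changes by the caller cannot rewrite history.
        event = CoreEvent(business_key, dict(payload), source, datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def history(self, business_key: str) -> List[CoreEvent]:
        """Every recorded fact for one business key, oldest first."""
        return [e for e in self._events if e.business_key == business_key]


core = DataCore()
core.append("CUST-42", {"city": "Denver"}, source="crm")
core.append("CUST-42", {"city": "Boulder"}, source="crm")
print([e.payload["city"] for e in core.history("CUST-42")])  # ['Denver', 'Boulder']
```

A change of address becomes a second fact rather than an overwrite, so the full history for `CUST-42` stays available to any downstream tier.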
DATA INGESTION
Event Driven
Message Queue
Trickle Feed
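One way to picture event-driven, trickle-feed ingestion is the minimal sketch below. It assumes Python's standard `queue.Queue` as an in-process stand-in for a real message queue and a plain list as the data core; `run_trickle_feed` is a hypothetical helper. The point it shows is that each event is stored as soon as it arrives rather than being pulled in a scheduled batch.

```python
import queue
import threading
from typing import Any, Dict, Iterable, List, Optional

Event = Dict[str, Any]


def run_trickle_feed(source_events: Iterable[Event]) -> List[Event]:
    """Push events through a queue one at a time and return what the consumer stored."""
    q: "queue.Queue[Optional[Event]]" = queue.Queue()  # stand-in for a message broker
    core: List[Event] = []                              # stand-in for the data core

    def consumer() -> None:
        # Drain continuously; each event is stored as soon as it arrives.
        while True:
            event = q.get()
            if event is None:  # sentinel: the producer has finished
                break
            core.append(event)

    worker = threading.Thread(target=consumer)
    worker.start()
    for event in source_events:
        q.put(event)  # producer side: emit each change as it happens
    q.put(None)
    worker.join()
    return core


stored = run_trickle_feed({"order_id": i, "status": "created"} for i in range(3))
print([e["order_id"] for e in stored])  # [0, 1, 2]
```

The producer never waits for a load window, and the consumer appends events unchanged, leaving interpretation to later tiers.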
INFORMATION PUBLISHING
Topical Queues
MDM (Master Data Management) Concerns
Data Governance
Post Processing
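Topical queues with a governance step applied on the way out could look roughly like this sketch. It assumes in-memory `deque`s per topic, a caller-supplied routing function, and a hypothetical masking rule (`MASKED_FIELDS`) standing in for whatever governance policy an organization actually enforces; `TopicPublisher` is an illustrative name only.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, List

Event = Dict[str, Any]
MASKED_FIELDS = {"email"}  # hypothetical governance rule for this sketch


class TopicPublisher:
    """Routes events leaving the data core into named topic queues."""

    def __init__(self, router: Callable[[Event], str]) -> None:
        self._router = router  # chooses a topic for each event
        self._topics: Dict[str, Deque[Event]] = defaultdict(deque)

    def publish(self, event: Event) -> str:
        topic = self._router(event)
        # Post-processing / governance step applied before the event is published.
        governed = {k: ("***" if k in MASKED_FIELDS else v) for k, v in event.items()}
        self._topics[topic].append(governed)
        return topic

    def drain(self, topic: str) -> List[Event]:
        """Hand every pending event on one topic to a subscriber, in arrival order."""
        pending = self._topics[topic]
        items = list(pending)
        pending.clear()
        return items


pub = TopicPublisher(router=lambda e: e["type"])
pub.publish({"type": "order", "id": 1, "email": "a@example.com"})
pub.publish({"type": "customer", "id": 7, "email": "b@example.com"})
print(pub.drain("order"))  # [{'type': 'order', 'id': 1, 'email': '***'}]
```

Subscribers only see the topics they care about, and governance is enforced once at the publishing boundary instead of in every consumer.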
INFORMATION TIER
Purpose-Built Data Subsets
Transformation
Post Processing
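Because the core keeps raw, historized events, a purpose-built subset can be treated as disposable and rebuilt by replaying them, which is one reading of "Enable easy recreation of information." The sketch assumes events are plain dicts carrying a `business_key` field and arrive oldest first; `build_subset` is a hypothetical helper that keeps the latest value of each requested field per key.

```python
from typing import Any, Dict, Iterable, List

Event = Dict[str, Any]


def build_subset(events: Iterable[Event], fields: List[str]) -> Dict[str, Dict[str, Any]]:
    """Replay historized events (oldest first) into a latest-value-per-business-key subset."""
    subset: Dict[str, Dict[str, Any]] = {}
    for event in events:
        row = subset.setdefault(event["business_key"], {})
        for name in fields:  # keep only the fields this consumer needs
            if name in event:
                row[name] = event[name]  # later events overwrite earlier values
    return subset


history = [
    {"business_key": "CUST-42", "city": "Denver", "tier": "gold"},
    {"business_key": "CUST-7", "city": "Austin"},
    {"business_key": "CUST-42", "city": "Boulder"},
]
print(build_subset(history, ["city"]))
# {'CUST-42': {'city': 'Boulder'}, 'CUST-7': {'city': 'Austin'}}
```

Dropping the subset and calling `build_subset` again over the same history reproduces it exactly, so the transformation can change as requirements emerge without touching the core.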
PRESENTATION TIER
Business Value
Applications
Data Services
Ad Hoc Querying
Write Back?
Transformation Logic
Data Post Processing
Near Real Time Feed
Emergent Design & Agile Delivery
Apache Kafka
Apache Storm
For questions or suggestions:
Ken Collier [email protected]
Follow @theagilist @thoughtworks
THANK YOU