![Page 1: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/1.jpg)
Real-time streaming in PostgreSQL
Kaushik Iyer
![Page 2](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/2.jpg)
Agenda
- Initial approach
- Streaming techniques
- Updated pipeline
- Observations
![Page 3](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/3.jpg)
Effective replication of data
- All the necessary data is in PostgreSQL.
- JDBC connector to pull the data.
- Perform ETL and push to Elasticsearch.
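A pull-based connector of this kind typically polls the database for rows changed since the last run. The sketch below illustrates that pattern only; it is not the pipeline from the talk. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, and the `users` table, the `updated_at` column and the function names are hypothetical.

```python
import sqlite3


def pull_changes(conn, last_seen):
    """Return rows modified after `last_seen`, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM users "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # The watermark advances to the newest change seen in this poll.
    new_watermark = rows[-1][2] if rows else last_seen
    return rows, new_watermark


def demo():
    # In-memory database standing in for the source PostgreSQL instance.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)", [(1, "a", 100), (2, "b", 200)]
    )
    first, mark = pull_changes(conn, 0)       # initial pull sees both rows
    conn.execute("UPDATE users SET name = 'b2', updated_at = 300 WHERE id = 2")
    second, mark = pull_changes(conn, mark)   # next poll sees only the update
    return first, second, mark
```

With this pattern a change only becomes visible at the next poll, so the polling interval puts a floor on the end-to-end lag.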
![Page 4](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/4.jpg)
Drawbacks
- Snapshot mechanism involved a lot of trial and error.
- A persistent lag of 5 minutes was present.
- Inconsistency in user experience.
- Non computable during bulk loads.
- We were storing relational data as-is in a document store.
![Page 5](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/5.jpg)
Event Sourcing
![Page 6](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/6.jpg)
Components of an Event Sourcing pipeline
![Page 7](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/7.jpg)
![Page 8](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/8.jpg)
Change Data Capture
![Page 9](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/9.jpg)
Postgres
![Page 10](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/10.jpg)
Components of a CDC pipeline
- Database
- Capture change
- Buffer
- Log store
![Page 11](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/11.jpg)
CDC vs Event Sourcing
- Change Data Capture is a form of derived event sourcing.
- We cannot add event generators throughout the platform.
- Change Data Capture is more flexible.
![Page 12](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/12.jpg)
Available CDC software
- LinkedIn Databus
- Stitch data
- Qlik data
- Oracle GoldenGate
- Netflix Delta
![Page 13](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/13.jpg)
Debezium
![Page 14](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/14.jpg)
Key features
- Detailed message structure, with a plethora of metadata.
- Requires no change to the schema of tables.
- Robust snapshot mechanisms.
- Built-in filters and masking options.
- Monitoring through JMX.
- Embedded variant also available.
![Page 15](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/15.jpg)
Debezium sample Event
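As an illustration of the message structure, here is a minimal sketch of a Debezium-style change event and a small helper that summarises it. The envelope fields (`before`, `after`, `source`, `op`, `ts_ms`) follow Debezium's documented change event format. The concrete values, the `users` table and the `describe` helper are made up for this example. Whether events arrive wrapped in a `payload` key depends on the converter settings, which are assumed here.

```python
import json

# Hypothetical event for an INSERT into public.users (values are made up).
SAMPLE_EVENT = json.loads("""
{
  "payload": {
    "before": null,
    "after": {"id": 1, "name": "alice"},
    "source": {
      "connector": "postgresql",
      "db": "appdb",
      "schema": "public",
      "table": "users",
      "lsn": 24023128
    },
    "op": "c",
    "ts_ms": 1587600000000
  }
}
""")

# Debezium operation codes and their meaning.
OPERATIONS = {"c": "create", "u": "update", "d": "delete", "r": "snapshot read"}


def describe(event):
    """Summarise which table changed, how, and the relevant row image."""
    payload = event["payload"]
    source = payload["source"]
    # Deletes carry no `after` image, so fall back to `before`.
    row = payload["after"] if payload["after"] is not None else payload["before"]
    return {
        "table": f'{source["schema"]}.{source["table"]}',
        "operation": OPERATIONS[payload["op"]],
        "row": row,
    }
```

Calling `describe(SAMPLE_EVENT)` yields the table name `public.users`, the operation `create`, and the inserted row.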
![Page 16](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/16.jpg)
The CDC Pipeline
![Page 17](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/17.jpg)
PostgreSQL
- Logical replication
  - Publisher
  - Subscriber
- Log sequence numbers
- Replication slots
- Logical decoding
  - wal2json
  - pgoutput
Diagram labels: Publication, Slot, Decoder, Sender
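To make log sequence numbers concrete: PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit position in the write-ahead log. The sketch below converts that text form to an integer and computes how many bytes a consumer is behind. The function names are hypothetical. In practice the two inputs could come from `pg_current_wal_lsn()` and the `confirmed_flush_lsn` column of `pg_replication_slots`.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '16/B374D848' into a 64-bit integer."""
    high, low = lsn.split("/")
    # High part is the upper 32 bits, low part the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(current_lsn: str, confirmed_lsn: str) -> int:
    """Bytes of WAL written by the server but not yet confirmed by the slot."""
    return parse_lsn(current_lsn) - parse_lsn(confirmed_lsn)
```

A slot whose lag keeps growing means the consumer is not keeping up, and the server must retain the WAL in between.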
![Page 18](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/18.jpg)
Transaction Log
- We are using Apache Kafka as our transaction log.
- Kafka Connect runs Debezium.
- Kafka REST manages the lifecycle of the connector.
![Page 19](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/19.jpg)
Sample Debezium configuration
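The configuration on the slide is only available as an image, so the following is a hedged sketch of what a Debezium PostgreSQL connector registration can look like, not the configuration used in the talk. All host names, credentials, slot and table names are placeholders. Property names vary between Debezium releases (for example, older versions use `database.server.name` where newer ones use `topic.prefix`), so check them against the version in use. The request targets the Kafka Connect REST endpoint `POST /connectors`; the URL and the `build_connector_request` helper are assumptions.

```python
import json
import urllib.request

# Placeholder values throughout; adjust to the actual environment.
EXAMPLE_CONFIG = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "secret",
    "database.dbname": "appdb",
    "topic.prefix": "appdb",              # older releases: database.server.name
    "plugin.name": "pgoutput",            # logical decoding output plugin
    "slot.name": "debezium_slot",
    "publication.name": "debezium_pub",
    "table.include.list": "public.users",
}


def build_connector_request(connect_url, name, config):
    """Build (but do not send) the request that registers a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_connector_request(
    "http://localhost:8083", "users-connector", EXAMPLE_CONFIG
)
# urllib.request.urlopen(request) would submit it to a running Connect worker.
```

Building the request separately from sending it keeps the example runnable without a Kafka Connect worker.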
![Page 20](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/20.jpg)
PostgreSQL
Kafka
Kafka REST
Kafka Connect
Debezium
Elasticsearch
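The last hop of this pipeline, from change events in Kafka to documents in Elasticsearch, can be sketched as a pure transformation into the newline-delimited format of the Elasticsearch bulk API. This is an assumption about how such a sink might work, not the talk's implementation. The index name, the `id` primary key and both function names are hypothetical, and events are assumed to use the Debezium envelope with a `payload` wrapper.

```python
import json


def to_bulk_lines(event, index):
    """Translate one change event into Elasticsearch bulk API lines."""
    payload = event["payload"]
    if payload["op"] == "d":
        # Deletes only carry the old row image; use its key as the document id.
        doc_id = payload["before"]["id"]
        return [json.dumps({"delete": {"_index": index, "_id": doc_id}})]
    # Creates, updates and snapshot reads all upsert the new row image.
    row = payload["after"]
    return [
        json.dumps({"index": {"_index": index, "_id": row["id"]}}),
        json.dumps(row),
    ]


def to_bulk_body(events, index):
    """Join the lines for several events; the bulk API needs a final newline."""
    lines = [line for event in events for line in to_bulk_lines(event, index)]
    return "\n".join(lines) + "\n"
```

Using the primary key as the document id makes replays idempotent: reprocessing an event overwrites the same document instead of creating a duplicate.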
![Page 21](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/21.jpg)
Results and observations
![Page 22](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/22.jpg)
Performance in bulk loads
![Page 23](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/23.jpg)
Performance of logical decoders
![Page 24](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/24.jpg)
Dependency on number of users
![Page 25](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/25.jpg)
Improvement
- The lag is now on the order of milliseconds.
- Effective decoupling.
- High fault tolerance.
- Setup details:
  - m4.xlarge container, ~$0.2/hour
![Page 26](https://reader034.vdocument.in/reader034/viewer/2022042302/5ecd686699ab5f0b095128bc/html5/thumbnails/26.jpg)
Thank you
Email: [email protected]
KaushikIyer16
Twitter: @kaushiiyer
LinkedIn: www.linkedin.com/in/kaushiiyer