Routing billions of events a day: How we do routing in Schibsted Carlos Manuel Duclos-Vergara, Staff Engineer


Post on 25-Jul-2020



Routing billions of events a day: How we do routing in Schibsted


Carlos Manuel Duclos-Vergara, Staff Engineer


About me


Agenda
• Schibsted
• A short story
• GDPR
• Pulse (our tracking solution)
  • Overview
  • Internals


Schibsted


Event generation


Event routing


Event dispatching


Event consumption


GDPR and data collection


Legal basis for data collection

1. Consent
2. Processing obligation
3. Legal obligation
4. Vital interest
5. Public interest
6. Legitimate interest

User rights

1. Data portability
2. Right to be forgotten


End to end event processing solution


Pulse ecosystem


Lifetime of an event


Side track: How much is 1 billion events
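As a rough back-of-the-envelope illustration (not taken from the slide itself), one billion events a day can be turned into an average per-second rate. The function name is made up for this sketch, and real traffic will peak well above the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate(events_per_day: int) -> float:
    """Average number of events per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


rate = average_rate(1_000_000_000)
print(f"{rate:,.0f} events per second on average")
```

The average hides daily peaks, so a pipeline sized only for this figure would likely fall behind during busy hours.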


Common pipeline


Batch pipeline


Streaming pipeline


Processing and routing internals


Routing lib


Processing: routing language

SinkName:
  eventType: event schema
  filter: inline || stored || null
  transform: stored || null
  SinkType:
    SinkDetails:

ProbeEvent-1:
  eventType: ProbeEvent
  kafka:
    topic: probe-topic
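A minimal sketch of how a rule like this might be evaluated, not the actual routing lib. It assumes that each event carries its type in an `eventType` field, that filters and transforms are plain callables, and that a sink is described by a type plus details. The names `ROUTES` and `route` are hypothetical.

```python
from typing import Any, Callable, Optional

Event = dict[str, Any]

# Hypothetical in-memory form of the routing rules shown above.
ROUTES: dict[str, dict[str, Any]] = {
    "ProbeEvent-1": {
        "eventType": "ProbeEvent",
        "filter": None,      # no filter: every ProbeEvent is kept
        "transform": None,   # no transform: the event is passed on as-is
        "sinkType": "kafka",
        "sinkDetails": {"topic": "probe-topic"},
    },
}


def route(event: Event, routes: dict[str, dict[str, Any]] = ROUTES) -> list[tuple]:
    """Return (sink name, sink type, sink details, payload) for each matching rule."""
    deliveries = []
    for sink_name, rule in routes.items():
        if event.get("eventType") != rule["eventType"]:
            continue  # rule is for another event type
        keep: Optional[Callable[[Event], bool]] = rule.get("filter")
        if keep is not None and not keep(event):
            continue  # filter rejected the event
        transform: Optional[Callable[[Event], Event]] = rule.get("transform")
        payload = transform(event) if transform is not None else event
        deliveries.append((sink_name, rule["sinkType"], rule["sinkDetails"], payload))
    return deliveries


print(route({"eventType": "ProbeEvent", "sequenceNumber": 1}))
```

One event can match several rules, so the function returns a list of deliveries rather than a single destination.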


Event formats: probe event

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "allOf": [
    {
      "$ref": "base-routable-event.json#"
    }
  ],
  "description": "Events sent by Data Platform Probe to measure latencies and missing events in the pipeline",
  "id": "http://schema.schibsted.com/events/backend-probe-event.json#",
  "properties": {
    "senderId": {
      "description": "Sender ID, in case several instances of Probe is running",
      "type": "integer"
    },
    "sequenceNumber": {
      "description": "Probe sequence number",
      "type": "integer"
    },
    "timeSent": {
      "$ref": "../common-definitions.json#/definitions/timestamp",
      "description": "UTC timestamp of when the event is generated by Probe"
    }
  },
  "title": "BackendProbeEvevnt",
  "type": "object"
}
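The schema says probe events exist to measure latencies and missing events. The sketch below shows one plausible way the `senderId` and `sequenceNumber` fields could be used to spot missing events at the end of the pipeline. It is an assumption about how such a check might work, not a description of the real Probe, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Any


def missing_by_sender(events: list[dict[str, Any]]) -> dict[int, list[int]]:
    """For each sender, list the sequence numbers absent between the
    lowest and highest sequence number that arrived."""
    seen: dict[int, set[int]] = defaultdict(set)
    for event in events:
        seen[event["senderId"]].add(event["sequenceNumber"])
    return {
        sender: [n for n in range(min(numbers), max(numbers) + 1) if n not in numbers]
        for sender, numbers in seen.items()
    }


received = [
    {"senderId": 1, "sequenceNumber": 1},
    {"senderId": 1, "sequenceNumber": 2},
    {"senderId": 1, "sequenceNumber": 4},
    {"senderId": 2, "sequenceNumber": 10},
    {"senderId": 2, "sequenceNumber": 11},
]
print(missing_by_sender(received))
```

Grouping by `senderId` matters because, as the schema notes, several Probe instances may run at once, each presumably with its own sequence.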


JSLT: The magic sauce of processing

JSON query and transformation language

GitHub repo: https://github.com/schibsted/jslt

License: Apache 2.0

{
  "time": round(parse-time(.published, "yyyy-MM-dd'T'HH:mm:ssX") * 1000),
  "device_manufacturer": .device.manufacturer,
  "device_model": .device.model,
  "language": .device.acceptLanguage,
  "os_name": .device.osType,
  "os_version": .device.osVersion,
  "platform": .device.platformType,
  "user_properties": {
    "is_logged_in" : boolean(.actor."spt:userId")
  }
}
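To make the effect of the JSLT template above concrete, here is a rough Python equivalent. It is only an approximation for illustration: the sample input is invented, Python's `bool` only roughly mirrors JSLT's `boolean`, and null-valued keys are dropped at the top level to imitate JSLT's default of omitting them.

```python
from datetime import datetime
from typing import Any


def transform(event: dict[str, Any]) -> dict[str, Any]:
    """Approximate the JSLT template: flatten device fields and derive two values."""
    device = event.get("device", {})
    actor = event.get("actor", {})
    # %z accepts "Z" as well as offsets such as "+01:00" (Python 3.7+).
    published = datetime.strptime(event["published"], "%Y-%m-%dT%H:%M:%S%z")
    result = {
        "time": round(published.timestamp() * 1000),  # epoch milliseconds
        "device_manufacturer": device.get("manufacturer"),
        "device_model": device.get("model"),
        "language": device.get("acceptLanguage"),
        "os_name": device.get("osType"),
        "os_version": device.get("osVersion"),
        "platform": device.get("platformType"),
        "user_properties": {"is_logged_in": bool(actor.get("spt:userId"))},
    }
    # Imitate JSLT omitting keys whose value is null.
    return {key: value for key, value in result.items() if value is not None}


sample = {  # invented input, for illustration only
    "published": "1970-01-01T00:00:01Z",
    "device": {"manufacturer": "Acme", "osType": "Android"},
    "actor": {"spt:userId": "user-123"},
}
print(transform(sample))
```

The JSLT version expresses the same mapping declaratively, which is what lets transforms be stored and referenced from routing rules.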


Routing: batch


Routing: streaming


Lessons learned (so far…)
• Schemas and versions
• Backfilling and recovery
• Logging and metrics
• Auditing


And finally


Extra


About Schibsted


Marketplaces


News Media


Some of our Next companies
