Post on 25-Jul-2020

Routing billions of events a day: How we do routing in Schibsted

Carlos Manuel Duclos-Vergara, Staff Engineer

About me

Agenda

• Schibsted
• A short story
• GDPR
• Pulse (our tracking solution)
  • Overview
  • Internals

Schibsted

Event generation

Event routing

Event dispatching

Event consumption

GDPR and data collection

Legal basis for data collection

1. Consent
2. Contractual obligation (performance of a contract)
3. Legal obligation
4. Vital interest
5. Public interest
6. Legitimate interest

User rights

1. Data portability
2. Right to be forgotten

End to end event processing solution

Pulse ecosystem

Lifetime of an event

Side track: How much is 1 billion events
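
As a rough back-of-the-envelope illustration of the scale, the sketch below converts one billion events a day into an average per-second rate and a daily data volume. It assumes traffic is spread evenly over 24 hours (real traffic is bursty, so peaks will be higher), and the 1 KiB average event size is an assumed figure for illustration, not a number from the talk.

```python
EVENTS_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate(events_per_day: int) -> float:
    """Average events per second, assuming an even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


def daily_volume_gib(events_per_day: int, avg_event_bytes: int) -> float:
    """Total daily payload in GiB for a given (assumed) average event size."""
    return events_per_day * avg_event_bytes / 1024**3


if __name__ == "__main__":
    print(f"{average_rate(EVENTS_PER_DAY):,.0f} events/s on average")
    # 1 KiB per event is an assumption made only for this example.
    print(f"{daily_volume_gib(EVENTS_PER_DAY, 1024):,.1f} GiB/day at 1 KiB/event")
```

Under these assumptions the average works out to roughly 11,600 events per second, and a little under one TiB of payload per day.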

Common pipeline

Batch pipeline

Streaming pipeline

Processing and routing internals

Routing lib

Processing: routing language

SinkName:
  eventType: event schema
  filter: inline || stored || null
  transform: stored || null
  SinkType:
    SinkDetails:

ProbeEvent-1:
  eventType: ProbeEvent
  kafka:
    topic: probe-topic
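
One way to read a routing entry like the one above is as a small rule: match on event type, optionally filter, optionally transform, then hand the result to a sink. The following is a minimal sketch of that idea, not the actual Schibsted routing lib. The `Route` class, the `route_event` function, and the use of a `"type"` field to carry the event type are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Event = Dict[str, Any]


@dataclass
class Route:
    """Hypothetical in-memory form of one routing entry."""
    sink_name: str
    event_type: str
    sink_type: str
    sink_details: Dict[str, Any]
    filter: Optional[Callable[[Event], bool]] = None      # None: accept everything
    transform: Optional[Callable[[Event], Event]] = None  # None: pass through as-is


def route_event(event: Event, routes: List[Route]) -> List[Tuple[str, str, Dict[str, Any], Event]]:
    """Return one (sink name, sink type, sink details, payload) per matching route."""
    deliveries = []
    for route in routes:
        # Assumption: the event type lives in a "type" field.
        if event.get("type") != route.event_type:
            continue
        if route.filter is not None and not route.filter(event):
            continue
        payload = route.transform(event) if route.transform else event
        deliveries.append((route.sink_name, route.sink_type, route.sink_details, payload))
    return deliveries


# The ProbeEvent-1 entry from the slide, expressed with the hypothetical Route class.
routes = [Route("ProbeEvent-1", "ProbeEvent", "kafka", {"topic": "probe-topic"})]

if __name__ == "__main__":
    for delivery in route_event({"type": "ProbeEvent", "sequenceNumber": 1}, routes):
        print(delivery)
```

Keeping the filter and transform optional mirrors the `null` alternatives in the routing language: an entry with neither simply forwards every event of the matching type.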

Event formats: probe event

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "allOf": [
    {
      "$ref": "base-routable-event.json#"
    }
  ],
  "description": "Events sent by Data Platform Probe to measure latencies and missing events in the pipeline",
  "id": "http://schema.schibsted.com/events/backend-probe-event.json#",
  "properties": {
    "senderId": {
      "description": "Sender ID, in case several instances of Probe are running",
      "type": "integer"
    },
    "sequenceNumber": {
      "description": "Probe sequence number",
      "type": "integer"
    },
    "timeSent": {
      "$ref": "../common-definitions.json#/definitions/timestamp",
      "description": "UTC timestamp of when the event is generated by Probe"
    }
  },
  "title": "BackendProbeEvent",
  "type": "object"
}
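
The schema description says probe events exist to measure latencies and missing events in the pipeline. The sketch below shows how a consumer could use `senderId`, `sequenceNumber`, and `timeSent` for that purpose. It is an illustration only: the function names are hypothetical, and it assumes `timeSent` is an ISO 8601 string with an explicit UTC offset, since the referenced timestamp definition is not shown in the slides.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def missing_sequence_numbers(events: Iterable[dict]) -> Dict[int, List[int]]:
    """Per sender, list sequence numbers absent between the lowest and highest seen."""
    seen = defaultdict(set)
    for event in events:
        seen[event["senderId"]].add(event["sequenceNumber"])
    missing = {}
    for sender, numbers in seen.items():
        expected = set(range(min(numbers), max(numbers) + 1))
        gaps = sorted(expected - numbers)
        if gaps:
            missing[sender] = gaps
    return missing


def latency_seconds(event: dict, received_at: datetime) -> float:
    """Pipeline latency: time between Probe sending the event and its arrival."""
    # Assumption: timeSent looks like "2020-07-25T12:00:00+00:00".
    sent = datetime.fromisoformat(event["timeSent"])
    return (received_at - sent).total_seconds()


if __name__ == "__main__":
    sample = [
        {"senderId": 1, "sequenceNumber": n, "timeSent": "2020-07-25T12:00:00+00:00"}
        for n in (1, 2, 4, 6)
    ]
    print(missing_sequence_numbers(sample))
    arrival = datetime(2020, 7, 25, 12, 0, 2, tzinfo=timezone.utc)
    print(latency_seconds(sample[0], arrival))
```

Tracking gaps per `senderId` matters because, as the schema notes, several Probe instances may run at once, each with its own sequence.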

JSLT: The magic sauce of processing

JSON query and transformation language

GitHub repo: https://github.com/schibsted/jslt

License: Apache 2.0

{
  "time": round(parse-time(.published, "yyyy-MM-dd'T'HH:mm:ssX") * 1000),
  "device_manufacturer": .device.manufacturer,
  "device_model": .device.model,
  "language": .device.acceptLanguage,
  "os_name": .device.osType,
  "os_version": .device.osVersion,
  "platform": .device.platformType,
  "user_properties": {
    "is_logged_in" : boolean(.actor."spt:userId")
  }
}
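
For readers unfamiliar with JSLT, the sketch below spells out what the transform above does using plain Python. It is an approximation for illustration, not how Pulse runs transforms: the function name `transform` is hypothetical, it assumes `published` is always present, and it drops keys whose value is missing on the understanding that JSLT omits object keys that evaluate to null.

```python
from datetime import datetime
from typing import Any, Dict


def transform(event: Dict[str, Any]) -> Dict[str, Any]:
    """Python approximation of the JSLT template: flatten device info, derive login flag."""
    device = event.get("device") or {}
    actor = event.get("actor") or {}

    # parse-time yields seconds since the epoch; the template scales to milliseconds.
    published = datetime.strptime(event["published"], "%Y-%m-%dT%H:%M:%S%z")

    out = {
        "time": round(published.timestamp() * 1000),
        "device_manufacturer": device.get("manufacturer"),
        "device_model": device.get("model"),
        "language": device.get("acceptLanguage"),
        "os_name": device.get("osType"),
        "os_version": device.get("osVersion"),
        "platform": device.get("platformType"),
        # Truthy user ID means logged in; missing or empty means not.
        "user_properties": {"is_logged_in": bool(actor.get("spt:userId"))},
    }
    # Assumption: mirror JSLT by leaving out keys with no value.
    return {key: value for key, value in out.items() if value is not None}


if __name__ == "__main__":
    sample = {
        "published": "2020-07-25T12:00:00Z",
        "device": {"osType": "iOS", "osVersion": "13.5"},
        "actor": {"spt:userId": "abc"},
    }
    print(transform(sample))
```

The comparison also shows why a dedicated language helps here: the JSLT version is a declarative template shaped like its output, whereas the Python version needs explicit null handling at each step.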

Routing: batch

Routing: streaming

Lessons learned (so far…)

• Schemas and versions
• Backfilling and recovery
• Logging and metrics
• Auditing

And finally

Extra

About Schibsted

Marketplaces

News Media

Some of our Next companies
