big data blueprint

Post on 12-Jul-2015

Category:

Data & Analytics

Big Data BluePrint

Architect for change

@daangerits #bdbp

Who am I?

@daangerits
daan@bigboards.io

Agenda

Concepts

Architecture

Examples

Concepts

TransCo

Meet TransCo - Parcel delivery service

Common interactions

A customer requesting a quote

A website visitor clicking on a link

Booking a financial transaction

A delivery truck pinging its GPS coördinates

TransCo

All of these have one thing in common:

Events

IT, Finance, Legal, Logistics, Sales, Communications, ...

Events

In the past, events manipulated our master data

Events

Today, events ARE our master data

Anatomy of an event

Timestamp

When did it happen?

Origin

Where did it come from?

Actor

Who did it?

Subject

Who was affected?

Facts

What changed?

Event

Anatomy of an event - example

2014-05-03 13:40:51

timestamp

CRM Application

origin

Daan Gerits

actor

Alfred Hitchcock

subject

street=”...”
vat=”...”

facts

Event
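The anatomy above maps naturally onto a small record type. A minimal sketch, assuming a Python dataclass; the class and field names simply mirror the slide labels and are not taken from any actual implementation. The elided fact values are kept as "...".

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One event: when, where from, who did it, who was affected, what changed."""
    timestamp: datetime   # when did it happen?
    origin: str           # where did it come from?
    actor: str            # who did it?
    subject: str          # who was affected?
    facts: dict = field(default_factory=dict)  # what changed?


# The example from the slide; fact values are elided there and stay elided here.
event = Event(
    timestamp=datetime(2014, 5, 3, 13, 40, 51),
    origin="CRM Application",
    actor="Daan Gerits",
    subject="Alfred Hitchcock",
    facts={"street": "...", "vat": "..."},
)
```

Freezing the dataclass is a design choice made here to match the idea that an event, once recorded, is not edited.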

Architecture

Overview

Ingest

Translator: Translate entities into events and facts.

Store: Store the raw events so we can replay whenever we want.

Linker: Resolve values to ids, especially subject, actor and origin.

Detonator: Explode a single fact to multiple rollup levels. Only explode if applicable.

View Generator: View generators can perform analytical tasks on the incoming events. The generated view can be stored in a storage system of choice.

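[Diagram of the components: S (Store), I (Ingest), T (Translator), L (Linker), D (Detonator), V (View Generator, twice)]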

Ingest

Get records in from other systems

- Event Bus/Broker

- Ingestion System like Flume / Sqoop / …

- ETL processes (not recommended)

- Backups

- Nagios / Statsd / Ganglia / ...

Translator

Convert records into events
- 1 record field = 1 fact
- record timestamp vs generated timestamp

Only store changed facts
- What changed?
- Compare with existing views
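The two Translator rules can be sketched as follows. This is an illustration under assumptions, not the deck's implementation: the record shape (a dict with an "id" key), the `translate` function, and the view keyed by (subject, fact) are all hypothetical.

```python
_MISSING = object()  # sentinel so a stored None still counts as a known value


def translate(record, origin, actor, current_view, generated_timestamp):
    """Turn one record into fact rows, one per field, skipping unchanged facts.

    current_view maps (subject, fact) -> last known value (hypothetical shape).
    """
    subject = record["id"]
    # Prefer the record's own timestamp; fall back to a generated one.
    timestamp = record.get("timestamp", generated_timestamp)
    rows = []
    for name, value in record.items():
        if name in ("id", "timestamp"):
            continue
        if current_view.get((subject, name), _MISSING) == value:
            continue  # unchanged fact, nothing to store
        rows.append({"origin": origin, "actor": actor, "subject": subject,
                     "timestamp": timestamp, "fact": name, "value": value})
    return rows


view = {("9332-DG", "name"): "Daan Gerits",
        ("9332-DG", "address"): "container 9"}
record = {"id": "9332-DG", "name": "Daan Gerits", "address": "container 24"}
rows = translate(record, "erp", "wvl", view, generated_timestamp="20141109")
```

Only the address row comes out, because the name matches what the existing view already holds.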

Store

Persist the events as they are

Raw Data
- Source of truth
- Recovery

Optimize Storage
- Parquet, Avro, Thrift, ...

Linker

Resolve event fields
- “Daan Gerits” == id 44543-45436-9928

Optimize for speed
- Use lookup tables
- Group data if needed
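A lookup-table Linker could look roughly like this. The `link` function and the in-memory dict are assumptions made for illustration; leaving unknown values unresolved is also a choice made here, not something the slides specify.

```python
# Lookup table: display value -> id. The single entry comes from the slide.
LOOKUP = {"Daan Gerits": "44543-45436-9928"}


def link(event, lookup):
    """Return a copy of the event with origin, actor and subject resolved to ids."""
    linked = dict(event)
    for field_name in ("origin", "actor", "subject"):
        value = event[field_name]
        # Values without a lookup entry are passed through unchanged.
        linked[field_name] = lookup.get(value, value)
    return linked


event = {"origin": "CRM Application", "actor": "Daan Gerits",
         "subject": "Alfred Hitchcock"}
linked = link(event, LOOKUP)
```

A plain dict gives constant-time lookups, which is the simplest reading of "use lookup tables".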

Detonator

Explode a fact to multiple rollup levels

Why?
- Real-time rollups
- Running analytics

When?
- if there is a hierarchy in actor or actee (the subject)
- if there is a hierarchy in timestamp

IN:
{ts: 2014-05-19, fact: …}

OUT:
{ts: 2014-05-19, fact: …}
{ts: 2014-05, fact: …}
{ts: 2014, fact: …}
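The IN/OUT example above, for the timestamp hierarchy, can be sketched like this. It assumes `ts` is a "YYYY-MM-DD" string; the `detonate` function and the fact name are hypothetical.

```python
def detonate(event):
    """Explode one event into day, month and year rollup levels."""
    year, month, day = event["ts"].split("-")
    levels = [f"{year}-{month}-{day}", f"{year}-{month}", year]
    # One copy of the event per rollup level, differing only in ts.
    return [{**event, "ts": level} for level in levels]


out = detonate({"ts": "2014-05-19", "fact": "hypothetical_fact"})
```

Each output row can then feed a running aggregate at its own level without a later rollup job.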

View Generator

Use facts to generate a view

A view is
- not a database view
- read-only
- an optimised data model for a single purpose
- disposable
- based on all facts (facts depth & width)

A view generator manipulates
- RDBMSs, graphs, search indexes, ...
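As a rough sketch of a view generator, the following folds fact rows into a read-only "current state per subject" view. The function name and row shape are assumptions; a real generator would write to an RDBMS, graph or search index instead of an in-memory mapping.

```python
from types import MappingProxyType


def generate_current_view(rows):
    """Fold fact rows (in event order) into a read-only subject -> facts mapping."""
    view = {}
    for row in rows:
        # A later row for the same subject and fact overrides the earlier one.
        view.setdefault(row["subject"], {})[row["fact"]] = row["value"]
    # MappingProxyType gives callers a read-only window on the result.
    return MappingProxyType({s: MappingProxyType(f) for s, f in view.items()})


rows = [
    {"subject": "9332-DG", "fact": "name", "value": "Daan Gerits"},
    {"subject": "9332-DG", "fact": "address", "value": "container 9"},
    {"subject": "9332-DG", "fact": "address", "value": "container 24"},
]
view = generate_current_view(rows)
```

Because the view is derived entirely from the facts, it can be thrown away and regenerated at any time.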

Rules of the game

Only add and remove are allowed

Events are re-playable

Removal may only be done by BDAs (Big Data Administrators)
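These rules can be illustrated with a tiny in-memory store. This is a sketch under assumptions: the `EventStore` class and the `role` argument are hypothetical stand-ins for whatever access control a real system would use.

```python
class EventStore:
    """Append-only fact store: rows are added, or removed per event ID."""

    def __init__(self):
        self._rows = []

    def add(self, event_id, origin, actor, subject, timestamp, fact, value=None):
        self._rows.append({"event_id": event_id, "origin": origin,
                           "actor": actor, "subject": subject,
                           "timestamp": timestamp, "fact": fact, "value": value})

    def remove(self, event_id, role):
        # Hypothetical role check standing in for "only BDAs may remove".
        if role != "bda":
            raise PermissionError("only a Big Data Administrator may remove events")
        self._rows = [r for r in self._rows if r["event_id"] != event_id]

    def replay(self):
        """Yield copies of all rows in insertion order; there is no update."""
        for row in self._rows:
            yield dict(row)


store = EventStore()
store.add(1, "crm", "fbaker", "9332-DG", "20140514", "name", "Daan Gerits")
store.add(63, "erp", "fbaker", "9332-DG", "20141201", "name")
store.remove(63, role="bda")
```

There is deliberately no update method: a change is expressed as a new row, and replay hands out copies so callers cannot alter stored rows.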

Example

Add Customer

IN:
processing system: CRM
user: “fbaker”
data: { id: “9332-DG”, name: “Daan Gerits”, address: “container 9” }

DATA:
event ID  origin  actor   subject  timestamp  fact     value
1         crm     fbaker  9332-DG  20140514   name     Daan Gerits
1         crm     fbaker  9332-DG  20140514   address  container 9

Update Customer

IN:
processing system: ERP
user: “wvl”
data: { id: “9332-DG”, address: “container 24” }

DATA:
event ID  origin  actor   subject  timestamp  fact     value
1         crm     fbaker  9332-DG  20140514   name     Daan Gerits
1         crm     fbaker  9332-DG  20140514   address  container 9
39        erp     wvl     9332-DG  20141109   address  container 24

Delete Customer

IN:
processing system: ERP
user: “fbaker”
data: { id: “9332-DG” }

DATA:
event ID  origin  actor   subject  timestamp  fact     value
1         crm     fbaker  9332-DG  20140514   name     Daan Gerits
1         crm     fbaker  9332-DG  20140514   address  container 9
39        erp     wvl     9332-DG  20141109   address  container 24
63        erp     fbaker  9332-DG  20141201   address
63        erp     fbaker  9332-DG  20141201   name

Aaaarrgghhh!!

IN:
processing system: ERP
user: “fbaker”
data: { id: “9332-DG” }

event ID  origin  actor   subject  timestamp  fact     value
1         crm     fbaker  9332-DG  20140514   name     Daan Gerits
1         crm     fbaker  9332-DG  20140514   address  container 9
39        erp     wvl     9332-DG  20141109   address  container 24
63        erp     fbaker  9332-DG  20141201   address
63        erp     fbaker  9332-DG  20141201   name
64        erp     wvl     9332-DG  20141109   address  container 24
64        crm     fbaker  9332-DG  20140514   name     Daan Gerits

Allows fact trending
- driver statistics for his whole career

Allows state regeneration
- the state of all facts on February 12, 2005

Is human-error-proof
- remove the facts with eventId #

Scales very well
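State regeneration and undoing a mistaken event can both be sketched as a replay over the fact rows from the customer example. Two assumptions are made here: a row with an empty value is read as "this fact was deleted", and timestamps are YYYYMMDD strings that compare chronologically. The `state_at` function is hypothetical.

```python
# (event ID, origin, actor, subject, timestamp, fact, value) from the example
ROWS = [
    (1, "crm", "fbaker", "9332-DG", "20140514", "name", "Daan Gerits"),
    (1, "crm", "fbaker", "9332-DG", "20140514", "address", "container 9"),
    (39, "erp", "wvl", "9332-DG", "20141109", "address", "container 24"),
    (63, "erp", "fbaker", "9332-DG", "20141201", "address", None),
    (63, "erp", "fbaker", "9332-DG", "20141201", "name", None),
]


def state_at(rows, cutoff, skip_event_ids=()):
    """Rebuild subject -> facts as of `cutoff`, ignoring the given event IDs."""
    state = {}
    ordered = sorted(rows, key=lambda r: (r[4], r[0]))  # by timestamp, event ID
    for event_id, _origin, _actor, subject, ts, fact, value in ordered:
        if ts > cutoff or event_id in skip_event_ids:
            continue
        facts = state.setdefault(subject, {})
        if value is None:
            facts.pop(fact, None)  # empty value: the fact was deleted
        else:
            facts[fact] = value
    return state


before_update = state_at(ROWS, "20140601")
after_delete = state_at(ROWS, "20141231")
delete_undone = state_at(ROWS, "20141231", skip_event_ids={63})
```

Skipping event 63 during replay has the same effect as a BDA removing its facts: the customer reappears with the latest name and address.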

Conclusion

We don’t hire data scientists, architects, developers, UX designers or engineers.

We hire individuals.

Shameless Plug

Thank You!
