Real-time Analytics and Data Ingestion on TBs of Data using PostgreSQL
Utku Azman, Director – R&D

Post on 20-May-2020

Page 1: Real-time Analytics and Data Ingestion on TBs of Data using PostgreSQL (info.citusdata.com/rs/citusdata/images/Real_Time_Analytics_and_Da…)

Utku Azman, Director – R&D

Pages 2-5: The Problem

•  Processing vast amounts of data for insights
•  Providing human real-time interaction
•  Keeping up with high velocity data
•  Managing complexity & cost

Every 60 seconds:

Pages 6-9: Solution?

•  Fast Analytics
•  Real-time Data
•  Scalability / High Availability

Page 10: Solution - Approach 1

Integrating Multiple Database Technologies

•  Fast Analytics
•  Real-time Data
•  Scalability / High Availability

Page 11: Complex & Expensive

[Diagram of the multi-database approach. Labels: Analytics (Real-time), Analytics (Offline), Operations, Data, DWH, DWH-on-Hadoop, Pre-aggregates, Production SQL, Production NoSQL]

Page 12: Solution – Approach 2

Unified Analytics/Operations Database that scales

•  Fast Analytics
•  Real-time Data
•  Scalability / High Availability

Page 13: Solution – Approach 2

Unified Analytics/Operations Database that scales

•  Fast Analytics
•  Real-time Data
•  Scalability / High Availability

+ AND comes with a community-driven and open ecosystem

Page 14: Why PostgreSQL?

“By 2018, more than 70% of new in-house applications will be developed on an Open Source DBMS” (Gartner)

[Bar chart: PostgreSQL Rising – Which DB do you use? (Hacker News). Bars for 2010 and 2014, y-axis from 0% to 50%, for PostgreSQL, MySQL, MongoDB, SQL Server, Oracle, Cassandra]

Source: Gartner: State of Open Source RDBMS - 2015, Hacker News

Page 15: Who are we?

•  Citus Data, based in San Francisco since 2011
•  Built CitusDB – Scalable PostgreSQL
•  Open sourced columnar storage and sharding extensions for PostgreSQL

Page 16: Result: Real-time Big Data on PostgreSQL

•  Analyze billions of events
•  Apply hundreds of filters on the fly
•  Get responses in < seconds
•  Serve millions of end-users
•  Update millions of records in minutes

Page 17: Result: Real-time Big Data on PostgreSQL

With the simplicity of maintaining ONE database

Page 18: Our Approach

Fast Analytics

Real-time Data

Scalability / High Availability +

Page 19: Extending PostgreSQL

•  Always benefit from the latest advancements
•  Support all data types, extensions, tools
•  Leverage community and ecosystem

How?

•  Utilize hooks
•  Use Foreign Data Wrappers
•  Sync with every major release

[Diagram labels: PostgreSQL Internals; Data types; Community, features]

Page 20: Familiar, Extensible, Rich

[Architecture diagram, summarized by its labels]

Data Sources:
•  App servers: clickstream; events, transactions
•  Machine generated data
•  Other (traditional) data sources

CitusDB (Scalable PostgreSQL): Data Storage and Retrieval

Access: SQL (ODBC / JDBC); PG tools, connectors

Real-time Analytics (e.g. Tableau, custom)

Flexibility and familiarity of PostgreSQL:
•  Data types
•  Storage formats
•  Extensions
•  Connectors, tools, documentation, more

Page 21: Our Approach

Fast Analytics

Real-time Data

Scalability / High Availability +

Page 22: 1. Massive Parallelization (Analytics)

•  Massively parallelized queries
•  Multi-threaded processing
•  Push compute to data

[Diagram: the CitusDB master receives a PostgreSQL query on the Events table and issues PostgreSQL queries on the shards E1, E2 and E3.]

CitusDB worker 1: E1, E3’
CitusDB worker 2: E2, E1’
CitusDB worker N: E3, E2’, …

Page 23: Avoiding the pull-data-to-master approach (Analytics)

SELECT avg(price), max(price)
FROM   items
WHERE  quantity > 10

[Diagram: Machine #1, Machine #2, ... Machine N send row data to the Master (Pull Data). Callouts: I/O Bottleneck; Heavy compute on master]

Page 24: Instead, pushing compute to data (Analytics)

Query received by the Master:

SELECT avg(price), max(price)
FROM   items
WHERE  quantity > 10

Query pushed to Machine #1, Machine #2, ... Machine N (Push Compute):

SELECT sum(price), count(*), max(price)
FROM   items
WHERE  quantity > 10

Each machine returns only its sum, count and max. The Master combines them:

avg = Σ sum_i / Σ count_i
max = max({max_1 ... max_N})
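The push-compute idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CitusDB's implementation: the `Row` tuples, the `Partial` class and the function names `worker_aggregate` and `master_merge` are hypothetical, and the in-memory lists stand in for the `items` rows held by each machine.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical stand-in for one row of `items`: (price, quantity).
Row = Tuple[float, int]


@dataclass
class Partial:
    total: float              # sum(price) over this machine's matching rows
    count: int                # count(*) over this machine's matching rows
    maximum: Optional[float]  # max(price), or None if no row matched


def worker_aggregate(rows: Iterable[Row]) -> Partial:
    """What each machine computes locally for WHERE quantity > 10."""
    prices = [price for price, quantity in rows if quantity > 10]
    return Partial(sum(prices), len(prices), max(prices) if prices else None)


def master_merge(partials: List[Partial]) -> Tuple[Optional[float], Optional[float]]:
    """Combine the small per-machine results into (avg(price), max(price))."""
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    if count == 0:
        return None, None  # mirrors SQL, where aggregates over no rows are NULL
    maxima = [p.maximum for p in partials if p.maximum is not None]
    return total / count, max(maxima)


# Assumed sample data: three machines, each holding a slice of `items`.
machines = [
    [(10.0, 20), (30.0, 5)],  # Machine #1
    [(20.0, 11)],             # Machine #2
    [(60.0, 50), (1.0, 1)],   # Machine N
]
avg_price, max_price = master_merge([worker_aggregate(m) for m in machines])
```

Only three numbers per machine cross the network, regardless of how many rows each machine scans. Note that avg cannot be merged by averaging per-machine averages, which is why the pushed-down query asks for sum and count instead.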

Page 25: 2. Columnar (Analytics)

•  Columnar projections (read only relevant columns)
•  Skip indexes (skip over irrelevant rows)
•  PostgreSQL integration (statistics, native formats)
•  Compression (more data fits in memory)

With row storage (PostgreSQL):
•  Read 700 columns instead of 5
•  >39 GB of unnecessary I/O

Input Type   Estimated Input Rate   Cost to query performance
Memory       10 GB/s                3.9 seconds
SSD          600 MB/s               >60 seconds
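The cost figures on this slide follow from dividing the unnecessary I/O by the input rate. A small worked check, assuming exactly 39 GB of extra I/O (the slide says ">39 GB") and decimal units (600 MB/s = 0.6 GB/s); the function name `scan_seconds` is hypothetical:

```python
def scan_seconds(gigabytes: float, rate_gb_per_s: float) -> float:
    """Time to read `gigabytes` of data at a sustained input rate."""
    return gigabytes / rate_gb_per_s


UNNECESSARY_IO_GB = 39.0  # assumed lower bound taken from ">39 GB"
MEMORY_RATE_GB_S = 10.0   # 10 GB/s from the slide
SSD_RATE_GB_S = 0.6       # 600 MB/s from the slide, in decimal GB

memory_cost = scan_seconds(UNNECESSARY_IO_GB, MEMORY_RATE_GB_S)  # about 3.9 s
ssd_cost = scan_seconds(UNNECESSARY_IO_GB, SSD_RATE_GB_S)        # about 65 s

# Share of columns a columnar projection would actually need to read.
needed_fraction = 5 / 700
```

Reading 5 of 700 columns touches under 1% of the columns, which is the saving the columnar projection bullet refers to.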

Page 26: Compression with columnar store (Analytics)

[Bar chart: table sizes normalized to 1.0 for Regular, Columnar, and Columnar w/ compression. Callout: ~4x compression]

Page 27: Bottom line: Fast Analytics

[Chart comparing CitusDB – Scalable PostgreSQL (Columnar), Impala 2.0.0 and SparkSQL 1.1.0]

PostgreSQL can be faster than Impala, SparkSQL!

Page 28: Our Approach

Fast Analytics

Real-time Data

Scalability / High Availability +

Page 29: Scalability / High Availability

•  Replication
•  “Automagic” failure handling
•  Dynamic rebalancing/scaling

[Diagram: many small data shards spread over the worker nodes; the Master Node holds shard and shard placement metadata.]

Worker Node #1: shards 1, 3, 4, 6, 7, 9, …
Worker Node #2: shards 1, 2, 4, 5, 7, 8, …
Worker Node #N: shards 2, 3, 5, 6, 8, 9, …

Pages 30-32: “Automagically” Handle Failures (Scalability / HA)

SELECT avg(price), max(price)
FROM   items
WHERE  quantity > 10

[Diagram: Node #1 to Node #4, each holding fixed size blocks of data. Legend: Fixed size block of data; Data queried; Replicas for failing blocks]
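One way to picture this failure handling is a planner that, for every block a query needs, picks any healthy node holding a copy. The sketch below is an assumption-laden illustration, not CitusDB code: the `PLACEMENTS` table, the node names and the function `plan_query` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical placement metadata: block (shard) id -> nodes holding a copy.
PLACEMENTS: Dict[int, List[str]] = {
    1: ["node1", "node2"],
    2: ["node2", "node3"],
    3: ["node3", "node4"],
    4: ["node4", "node1"],
}


def plan_query(placements: Dict[int, List[str]], failed: Set[str]) -> Dict[int, str]:
    """Choose one healthy node per block, falling back to a replica if needed."""
    plan: Dict[int, str] = {}
    for shard, nodes in placements.items():
        healthy = [node for node in nodes if node not in failed]
        if not healthy:
            raise RuntimeError(f"no healthy placement left for block {shard}")
        plan[shard] = healthy[0]
    return plan


normal_plan = plan_query(PLACEMENTS, failed=set())
degraded_plan = plan_query(PLACEMENTS, failed={"node3"})
```

With two copies of every block, losing one node only reroutes the affected blocks to their replicas; the query fails only if every copy of some block is unavailable.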

Page 33: Dynamically Scale Out (Scalability / HA)

[Diagram: shards of 512 MB each; Node #4 joins the cluster.]

Node #1: shards 1, 3, 4, 6, 7, 9, …
Node #2: shards 1, 2, 4, 5, 7, 8, …
Node #3: shards 2, 3, 5, 6, 8, 9, …
Node #4
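Scaling out with many small shards amounts to moving a few shard placements onto the new node. The greedy loop below is a minimal sketch under assumptions: the slide does not describe the actual rebalancing algorithm, and the `rebalance` function and node names are hypothetical. The starting layout reuses the shard numbers shown in the diagram.

```python
from typing import Dict, List, Set, Tuple


def rebalance(nodes: Dict[str, Set[int]]) -> List[Tuple[int, str, str]]:
    """Move shard placements from the fullest to the emptiest node until
    node sizes differ by at most one. Returns (shard, source, target) moves."""
    moves: List[Tuple[int, str, str]] = []
    while True:
        source = max(nodes, key=lambda name: len(nodes[name]))
        target = min(nodes, key=lambda name: len(nodes[name]))
        if len(nodes[source]) - len(nodes[target]) <= 1:
            return moves
        # Never place two copies of the same shard on one node.
        movable = sorted(nodes[source] - nodes[target])
        if not movable:
            return moves
        shard = movable[0]
        nodes[source].remove(shard)
        nodes[target].add(shard)
        moves.append((shard, source, target))


cluster = {
    "node1": {1, 3, 4, 6, 7, 9},
    "node2": {1, 2, 4, 5, 7, 8},
    "node3": {2, 3, 5, 6, 8, 9},
    "node4": set(),  # newly added, empty node
}
moves = rebalance(cluster)
moved_megabytes = len(moves) * 512  # shards are 512 MB each on the slide
```

Because each shard is small, evening out the cluster requires copying only a handful of placements rather than reshuffling all the data.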

Page 34: Bottom line: Scalability… that works (Scalability / HA)

•  Mid-query recovery from failures
•  Hundreds of nodes, thousands of CPU cores
•  Dynamic rebalancing and scaling
•  Petabytes of space

Page 35: Our Approach

Fast Analytics

Real-time Data

Scalability / High Availability +

Page 36: Real-time INSERTS/UPDATES (Real-time Data)

Single-shard INSERT, replication factor: 2

Master: INSERT INTO customer_reviews ...

Worker Node #1: shards 1, 3, 4, 6, 7, 9, …
Worker Node #2: shards 1, 2, 4, 5, 7, 8, …
Worker Node #3: shards 2, 3, 5, 6, 8, 9, …
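A single-shard INSERT with replication factor 2 can be pictured as: map the row to one shard, then write it to both placements of that shard. The sketch below is illustrative only. Hash routing with `zlib.crc32`, the column name `customer_id`, the worker names and the `insert` helper are assumptions; the slide does not say how CitusDB chooses the shard.

```python
import zlib
from typing import Dict, List

# Shard placements as drawn on the slide: each shard lives on two workers.
PLACEMENTS: Dict[int, List[str]] = {
    1: ["worker1", "worker2"], 2: ["worker2", "worker3"], 3: ["worker1", "worker3"],
    4: ["worker1", "worker2"], 5: ["worker2", "worker3"], 6: ["worker1", "worker3"],
    7: ["worker1", "worker2"], 8: ["worker2", "worker3"], 9: ["worker1", "worker3"],
}

# node -> shard id -> rows stored in that shard placement
Storage = Dict[str, Dict[int, List[dict]]]


def shard_for(key: str, shard_count: int) -> int:
    """Assumed routing rule: hash the distribution key onto shards 1..shard_count."""
    return zlib.crc32(key.encode("utf-8")) % shard_count + 1


def insert(row: dict, key_column: str, storage: Storage) -> List[str]:
    """Write the row to every placement of its shard; return the nodes written."""
    shard = shard_for(str(row[key_column]), len(PLACEMENTS))
    written: List[str] = []
    for node in PLACEMENTS[shard]:
        storage.setdefault(node, {}).setdefault(shard, []).append(row)
        written.append(node)
    return written


storage: Storage = {}
review = {"customer_id": "C42", "rating": 5}  # hypothetical customer_reviews row
nodes_written = insert(review, "customer_id", storage)
```

Only the two workers holding that shard take part in the write, so inserts for different keys can proceed on different workers at the same time.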

Page 37: Bottom line: Unified Analytics/Operations (Real-time Data)

[Chart: Scalable PostgreSQL Cluster Performance, two panels plotted against number of nodes and measured simultaneously. Real-time Operations: transactions per sec (TPS). Real-time Analytics: query completion time (sec)]


Page 39: Putting it all together

We start with…

•  PG performance up by 40% with each new release between 7.4 and 9.3
•  High performance JSONB introduced in 9.4 (Dec-2014)
•  Feature parity with Oracle
•  100s of developers contributing
•  Same day productivity for developers, DBAs, analysts
•  Tools, extensions, libraries, forums

Page 40: Putting it all together

…and teach PostgreSQL new tricks

•  100x analytics performance with massive parallelization and columnar analytics
•  Scalability & high availability with dynamic horizontal scaling to 100s of nodes
•  Real-time insights on very large data with unified analytics/operations

Page 41: Example Application in Production

•  Cloudflare
   – CDN with >5% global internet traffic
   – >100 billion network events processed per day
   – Analytics dashboard serving 2,000,000+ end users
   – Real-time data ingest and sub-second queries across billions of rows for analytics

Page 42: Scaling PostgreSQL with CitusDB at Cloudflare

•  Trillions of events
•  Billions of 1-minute aggregations
•  Real-time data ingestion
•  25ms – 2sec query times

https://www.citusdata.com/blog

Page 43: Scaling PostgreSQL with CitusDB at Cloudflare

Demo: Real-time Analytics Dashboards

Page 44: What worked for Cloudflare

•  PostgreSQL compatibility
   – Trusted DB
   – Extension mechanisms, multi-structured data
   – Community, documentation
•  Performance
   – Parallelization across millions of shards
   – Fast responses to both customer facing & BI queries
•  PostgreSQL Extensions
   – Hstore: Keep sparse data efficiently
   – HLL: Fast unique count approximations
•  Dynamic Scaling
   – Grow cluster as needed
•  High Availability
   – Real-time recovery from failures

Page 45: Summary: CitusDB Applications

•  Cloudflare example:
   – Real-time analytics
   – Scalable & high availability PostgreSQL
•  Other uses in production:
   – More interactive dashboards (e.g. funnel analytics)
   – NoSQL use cases (JSONB, low latency writes)
   – Simplification of complex, multi-tiered DWH + Analytics

Page 46: Questions

•  Email
   – [email protected]
   – [email protected]
•  Visit
   – www.citusdata.com
   – www.citusdata.com/blog