ClickHouse 2018 - Percona
ClickHouse 2018
How to stop waiting for your queries to complete and start having fun
Alexander Zaitsev
Altinity
Who am I
M.Sc. in mathematics from Moscow State University
Software engineer since 1997
Developed distributed systems since 2002
Focused on high performance analytics since 2007
Director of Engineering at LifeStreet
Co-founder of Altinity – ClickHouse Service Provider
... and I am not Peter’s brother :)
What Is ClickHouse?
© http://mattturck.com/
ClickHouse DBMS is
Column Store
MPP
Realtime
SQL
Open Source
http://clickhouse.yandex
• Developed by Yandex for Yandex.Metrica
- Yandex (NASDAQ: YNDX) – “Russian Google” (50% market share in search, 50+ b2b and b2c products)
- Yandex.Metrica – the world's 2nd largest web analytics platform
• Open Source since June 2016 (Apache 2.0 license)
• 200+ companies using it in production today
• Several hundred more experimenting, doing POCs etc.
• Dozens of contributors to the source code
Why Yet Another DBMS?
SQL
Flexible
Open Source Analytical DBMS
Commercial Analytical DBMS
ClickHouse
Fast!
Flexible!
Free!
Fun!
How Fast?
1+ trillion rows table
:) select count(*) from dw.ad8_fact_event;
SELECT count(*)
FROM dw.ad8_fact_event
┌───────count()─┐
│ 1261705085657 │
└───────────────┘
1 rows in set. Elapsed: 3.552 sec. Processed 1.26 trillion rows, 1.26 TB (355.22 billion rows/s., 355.22 GB/s.)
Altinity Ltd. www.altinity.com
1+ trillion rows table
:) select sum(price_cpm) from dw.ad8_fact_event where access_day=today()-1 and event_key=-2;
SELECT sum(price_cpm)
FROM dw.ad8_fact_event
WHERE (access_day = (today() - 1)) AND (event_key = -2)
┌────sum(price_cpm)─┐
│ 87579.09035192338 │
└───────────────────┘
1 rows in set. Elapsed: 0.168 sec. Processed 161.89 million rows, 2.91 GB (961.83 million rows/s., 17.31 GB/s.)
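The throughput figures in the two outputs above are simply rows processed divided by elapsed time. A quick sanity check, written as a ClickHouse query; the row counts and timings are copied from the slides, and the column aliases are made up for illustration:

```sql
-- Rows processed divided by elapsed seconds, scaled for readability.
SELECT
    round(1261705085657 / 3.552 / 1e9, 1) AS full_scan_billion_rows_per_sec,
    round(161.89e6 / 0.168 / 1e6, 1)      AS filtered_million_rows_per_sec;
```

Both values land close to the reported 355.22 billion and 961.83 million rows/s. The small differences presumably come from the client printing rounded elapsed times and row counts.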
WikiStat data, 28B rows.
https://www.percona.com/blog/2017/03/17/column-store-database-benchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark/
“1.1 Billion Taxi Rides Benchmarks”
http://tech.marksblogg.com/benchmarks.html
Query times in seconds:
Query 1   Query 2   Query 3   Query 4   Setup
0.034     0.061     0.178     0.498     MapD & 2-node p2.8xlarge cluster
0.051     0.146     0.047     0.794     kdb+/q & 4 Intel Xeon Phi 7210 CPUs
-         2.415     3.599     4.962     ClickHouse at Altinity demo server
0.762     2.472     4.131     6.041     BrytlytDB 1.0 & 2-node p2.16xlarge cluster
1.034     3.058     5.354     12.748    ClickHouse, Intel Core i5 4670K
1.56      1.25      2.25      2.97      Redshift, 6-node ds2.8xlarge cluster
2         2         1         3         BigQuery
6.41      6.19      6.09      6.63      Amazon Athena
8.1       18.18     n/a       n/a       Elasticsearch (heavily tuned)
14.389    32.148    33.448    67.312    Vertica, Intel Core i5 4670K
22        25        27        65        Spark 2.3.0 & single i3.8xlarge w/ HDFS
35        39        64        81        Presto, 5-node m3.xlarge cluster w/ HDFS
152       175       235       368       PostgreSQL 9.5 & cstore_fdw
2016 LifeStreet benchmark (unpublished)
• 19 queries, 1200M rows table, 3-node clusters
Time Series benchmarks (first time today!)
https://github.com/timescale/tsbs
Benchmark suite to automate testing
Loads 103M rows, 10 metrics per row
Runs 15 queries, 1000 runs each in 8 parallel threads
Supports TimescaleDB, InfluxDB, Cassandra, MongoDB and ClickHouse (Altinity PR is submitted)
[Chart: load time (s) for ClickHouse, TimescaleDB and InfluxDB]
[Chart: “Light” queries, time in ms, for ClickHouse, TimescaleDB and InfluxDB]
[Chart: “Heavy” queries, time in sec, for ClickHouse, TimescaleDB and InfluxDB]
How flexible?
ClickHouse runs on
Bare metal (any Linux)
Amazon
Azure
VMware, VirtualBox
Docker, K8s
ClickHouse solves business problems in:
Mobile App and Web analytics
AdTech bidding analytics
Operational Logs analytics
DNS queries analysis
Stock correlation analytics
Telecom
Security audit
Fintech SaaS
Manufacturing process control
BlockChain transactions analysis
Size does not matter
Yandex: 500+ servers, 25B rec/day
LifeStreet: 60 servers, 75B rec/day
CloudFlare: 36 servers, 200B rec/day
Bloomberg: 102 servers, 1000B rec/day
Toutiao: 400 servers, moving to 1000 this month
How fun ☺
life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}
with (select groupArray(C) from C) as Ca
select id,
groupArray(S) Sa, groupArray(V) Va, groupArray(D) Da, groupArray(P) Pa,
arrayMap(c -> arrayFirstIndex(s -> s > c, Sa)-1, Ca) Ka,
arrayMap((c,k) -> Va[k] + (Va[k+1] - Va[k])/(Sa[k+1] -Sa[k])*(c-Sa[k]),Ca,Ka) Ta,
arrayMap(s -> arrayFirstIndex(c -> c>s, Ca)>0 ? arrayFirstIndex(c -> c>s, Ca)-1 : toInt32(length(Ca)), Sa) Ja,
arrayMap(i -> Ta[i], Ja) Ra,
arrayMap((v,r) -> v - r, Va, Ra) ARa,
arraySum((x,y,z) -> x*y*z, ARa, Da, Pa) result
from T group by id
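The query above leans on two array primitives: `arrayFirstIndex` to find the segment a point falls into, and `arrayMap` to interpolate linearly within that segment. Below is a minimal sketch of one such step on literal arrays. It assumes (an interpretation; the slide does not say so) that `Sa` holds sorted grid points and `Va` the values at those points. The literals are invented for illustration:

```sql
WITH
    [1.0, 2.0, 3.0, 4.0]     AS Sa,  -- hypothetical sorted grid
    [10.0, 20.0, 30.0, 40.0] AS Va,  -- hypothetical values on that grid
    2.5                      AS c    -- point to interpolate at
SELECT
    -- 1-based index of the last grid point that is <= c
    arrayFirstIndex(s -> s > c, Sa) - 1 AS k,
    -- linear interpolation between grid points k and k + 1
    Va[k] + (Va[k + 1] - Va[k]) / (Sa[k + 1] - Sa[k]) * (c - Sa[k]) AS t;
```

For these inputs `k` is 2 and `t` is 25. Note that `arrayFirstIndex` returns 0 when no element matches, which is why the original query guards that case explicitly when computing `Ja`.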
What’s new in 2018
• Table functions mysql/odbc/file/http
• clickhouse-copier
• Predicate pushdown for views/subselects
• LowCardinality datatype
• Decimal datatype
• JOIN enhancements
• ALTER TABLE UPDATE/DELETE
• WITH ROLLUP
… and tons of performance improvements and small features
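A few of these features in query form. This is only a sketch: the table `events` and its columns are hypothetical, and depending on the release some of these features may have to be enabled through settings.

```sql
-- Hypothetical table using the LowCardinality and Decimal types.
CREATE TABLE events
(
    event_date Date,
    user_id    UInt64,
    country    LowCardinality(String),
    price      Decimal(18, 4)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id);

-- Mutations: rewrite or remove the rows that match a condition.
ALTER TABLE events UPDATE country = 'unknown' WHERE country = '';
ALTER TABLE events DELETE WHERE user_id = 42;

-- WITH ROLLUP adds subtotal rows per event_date plus a grand total.
SELECT event_date, country, sum(price) AS revenue
FROM events
GROUP BY event_date, country WITH ROLLUP;
```

UPDATE and DELETE run as background mutations, so the statement may return before the data has actually been rewritten.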
More user friendly than ever!
• GDPR compliance – thanks to UPDATE/DELETE
• Easier BI integration – thanks to SQL compatibility changes and improvements in ODBC driver
• Easier cluster operation – thanks to clickhouse-copier, distributed DDL
• Easier integration with other systems, thanks to:
- Table functions
- Kafka storage engine
- Logs integration with Logstash, ClickTail
- clickhouse-mysql for migration from MySQL
Case Study: Ivinco jumps on to ClickHouse
Supports mature boardreader system
A lot of data collected from different sources
A lot of operational data (performance monitoring)
200TB in MySQL!
Operational problems
Hard to scale
Hard to build an HA solution
Performance issues:
• ‘Manual’ partitioning and sharding
• Dozens of indexes per table etc.
Organizational problems
No development resources to rewrite
Minimal changes to current system are allowed
Binary log replication from MySQL to ClickHouse
MySQL
clickhouse-mysql
Queries
Source Data
See details at: https://www.altinity.com/blog/2018/6/30/realtime-mysql-clickhouse-replication-in-practice
Results
Seamless integration of ClickHouse into the current system
No developers/coding involved, project is done with DevOps
Easy to test performance side by side
ClickHouse is 100 times faster
Now ready to re-write main system
More ways to integrate with MySQL
• mysql() table function
• MySQL table engine
• MySQL external dictionaries
• ProxySQL
mysql() table function
select * from mysql('host:port', 'database', 'table', 'user', 'password');
https://www.altinity.com/blog/2018/2/12/aggregate-mysql-data-at-high-speed-with-clickhouse
• Easiest and fastest way to get data from MySQL
• Load into a ClickHouse table and run queries much faster
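A minimal sketch of the "load, then query locally" pattern described on this slide. The host, credentials, database, table and column names are all placeholders and not taken from the talk:

```sql
-- Hypothetical local table that mirrors the MySQL one.
CREATE TABLE orders_local
(
    id         UInt32,
    created_at DateTime,
    amount     Float64
)
ENGINE = MergeTree
ORDER BY (created_at, id);

-- Pull rows from MySQL through the table function and store them locally.
INSERT INTO orders_local
SELECT id, created_at, amount
FROM mysql('mysql-host:3306', 'shop', 'orders', 'reader', 'secret');

-- Subsequent analytical queries run against the local copy.
SELECT toDate(created_at) AS day, sum(amount) AS total
FROM orders_local
GROUP BY day
ORDER BY day;
```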
MySQL table engine
CREATE TABLE …
Engine = MySQL('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause']);
• SELECTs and INSERTs!
• No caching, data is queried from remote server
https://clickhouse.yandex/docs/en/operations/table_engines/mysql/
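For illustration, a filled-in version of the statement above might look as follows. The connection details and the `orders` table are placeholders, and the declared columns are expected to correspond to columns of the remote table:

```sql
-- Hypothetical proxy table; rows stay in MySQL.
CREATE TABLE orders_mysql
(
    id     UInt32,
    amount Float64
)
ENGINE = MySQL('mysql-host:3306', 'shop', 'orders', 'reader', 'secret');

-- Reads are forwarded to MySQL at query time (nothing is cached).
SELECT count() FROM orders_mysql;

-- Inserts are written through to the remote MySQL table.
INSERT INTO orders_mysql (id, amount) VALUES (1, 9.99);
```

Unlike the `mysql()` table function, the definition is stored once and can then be queried like any other table.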
MySQL external dictionaries
• Makes data from a MySQL database accessible in ClickHouse queries
• Stored in memory
• Updated when the source data changes
SELECT dictGetString('dim_geo', 'country_name', geo_key) country_name,
sum(imps)
FROM T
GROUP BY country_name;
Accessing ClickHouse from MySQL
ClickTail
• Log ingesting based on honeycomb.io
• Understands Nginx Access Log, MySQL Slow Log, MySQL Audit Logs, MongoDB and Regex Custom Format
• Easily extensible with other formats
https://github.com/Altinity/clicktail
https://www.altinity.com/blog/2018/3/12/clicktail-introduction
https://www.percona.com/blog/2018/02/28/analyze-raw-mysql-query-logs-clickhouse/
https://www.percona.com/blog/2018/03/29/analyze-mysql-audit-logs-clickhouse-clicktail/
Kafka Engine
Kafka → ClickHouse [Engine = Kafka → MV → Engine = MergeTree]
https://clickhouse.yandex/docs/en/operations/table_engines/kafka/
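The pipeline on this slide (a Kafka engine table feeding a MergeTree table through a materialized view) can be sketched roughly as below. The broker address, topic, consumer group, and table and column names are hypothetical. The `SETTINGS` form follows the ClickHouse documentation, and older releases may expect positional engine arguments instead:

```sql
-- 1. Kafka engine table: a consumer, not a store.
CREATE TABLE events_queue
(
    ts      DateTime,
    user_id UInt64,
    message String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka-host:9092',
         kafka_topic_list  = 'events',
         kafka_group_name  = 'clickhouse_events',
         kafka_format      = 'JSONEachRow';

-- 2. MergeTree table that keeps the data.
CREATE TABLE events_store
(
    ts      DateTime,
    user_id UInt64,
    message String
)
ENGINE = MergeTree
ORDER BY (ts, user_id);

-- 3. Materialized view that moves rows from the queue into the store.
CREATE MATERIALIZED VIEW events_mv TO events_store AS
SELECT ts, user_id, message
FROM events_queue;
```

Queries then go to `events_store`; reading from the Kafka table directly would consume the messages.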
“Secret” Roadmap disclosed
ANSI SQL JOIN support:
• Multi-table joins – Q1/2019
• Merge joins – Q2/2019
Protobuf/Parquet formats – Q4/2018
Per column compression/encoding settings – Q4/2018
Dictionary DDLs – Q1/2019
Secondary indexes – Q2/2019
LDAP integration, security enhancements – Q2/2019
ClickHouse Today
Mature Analytic DBMS. Proven by many companies
2+ years in Open Source
Constantly improving – new cool features were added recently
Many community contributors
Emerging ecosystem
Support from Altinity
ClickHouse Today
Q&A
Contact me:
skype: alex.zaitsev
telegram: @alexanderzaitsev
Altinity