![Page 1: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/1.jpg)
Presto Summit Series: Pinterest
Presto at Pinterest
August 19, 2020
![Page 2: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/2.jpg)
Introduction
![Page 3: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/3.jpg)
Presto at Pinterest
Pucheng Yang, Yi He
![Page 4: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/4.jpg)
Presto at Pinterest
● Overview of Presto at Pinterest and the technical challenges
● Leveraging warning systems for users to write better queries
● Managing diverse workloads
● Future work
![Page 5: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/5.jpg)
Overview of Presto @ Pinterest
![Page 6: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/6.jpg)
Scale at Pinterest
● Business Scale
  ○ 400M+ MAUs
  ○ 200B+ Pins
  ○ 4B+ Boards
● Data Scale
  ○ 400+ PB @ S3
  ○ Peak of 80k Hadoop jobs per day
  ○ 10,000+ Hive/Hadoop nodes
  ○ > 500 Presto workers (dedicated nodes + k8s)
  ○ > 110,000 Hive tables
● Everything in the cloud (AWS)
![Page 7: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/7.jpg)
Evolution
● 2014: Qubole + Redshift (Qubole: MR, HMS, Hive, Spark)
● 2016 Q2: Qubole + Redshift + 🏠Hadoop
● 2016 Q3: Qubole + Redshift + 🏠Hadoop + 🏠Presto + HMS (RO)
● 2016 Q4: Qubole + Redshift + 🏠Hadoop + 🏠Presto + HMS (RO) + 🏠Spark
● 2018: 🏠Hadoop 🏠Presto (RO) + 🏠Spark + 🏠Hive + HMS (RW)
● 2020: 🏠Hadoop 🏠Presto (RW) + 🏠Spark + 🏠Hive + HMS (RW) + 🏠Spark SQL 🏠Flink
Annotations on the timeline: Better Support (repeated), Happy Security Team
![Page 8: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/8.jpg)
Presto at Pinterest
![Page 9: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/9.jpg)
Presto at Pinterest
![Page 10: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/10.jpg)
Presto at Pinterest
![Page 11: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/11.jpg)
Clusters Overview
● PrestoSQL 320 with some backports/customized changes
● Connectors: Hive (major), MySQL, Druid, Thrift
● Adhoc workers running on k8s pods

| Use case | # of clusters | Cluster size |
| --- | --- | --- |
| adhoc | 2 | 200 |
| scheduled | 1 | 165 |
| PII and others | 2 | 30 ~ 100 |

Node specification (listed once on the slide): Coordinator 64 core, 488G; Worker 43 ~ 48 core, 340 ~ 384G
![Page 12: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/12.jpg)
Presto Controller
● An in-house service critical to our Presto deployments, monitoring, self-healing, etc.
● The controller provides the following major functions:
  1. Health check
  2. Slow worker detection
  3. Heavy query detection
  4. Rolling restarts of Presto clusters
  5. Scaling clusters
![Page 13: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/13.jpg)
Presto Gateway
● A service that sits between clients and Presto clusters
● Essentially a smart HTTP proxy server
● Stateless, easy to scale
● Makes clients agnostic of specific Presto clusters and enables the following:
  1. Query routing based on rules/health/load/resource groups
  2. Resource usage visibility for users/orgs
  3. Overall health visibility for the Presto clusters
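The routing idea can be sketched roughly as follows. This is an illustration only, not the gateway's actual logic: the `Cluster` fields, the `pick_cluster` function, the cluster names, and the rule "healthy cluster for the use case with the fewest running queries" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    # All fields are hypothetical; the slides do not describe the gateway's data model.
    name: str
    use_case: str          # e.g. "adhoc" or "scheduled"
    healthy: bool
    running_queries: int


def pick_cluster(clusters: List[Cluster], use_case: str) -> Optional[Cluster]:
    """Return the healthy cluster for `use_case` with the fewest running queries."""
    candidates = [c for c in clusters if c.healthy and c.use_case == use_case]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.running_queries)


clusters = [
    Cluster("adhoc-1", "adhoc", healthy=True, running_queries=40),
    Cluster("adhoc-2", "adhoc", healthy=True, running_queries=12),
    Cluster("scheduled-1", "scheduled", healthy=False, running_queries=5),
]
chosen = pick_cluster(clusters, "adhoc")
print(chosen.name if chosen else "no healthy cluster")
```

Because the function keeps no state of its own, any number of gateway instances could apply the same rule, which is in line with the "stateless, easy to scale" point on the slide.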
![Page 14: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/14.jpg)
Presto Gateway
![Page 15: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/15.jpg)
Challenges: Deeply Nested Large Thrift Schemas
● Prime reason for the coordinator getting stuck or crashing
● Example: a popular, commonly used large Thrift schema has over 12 million primitives and a depth of 41 levels. Serialized to a string, this schema takes over 282 MB.
● Close to 500 Hive tables have over 100K primitives in their schemas.
● The coordinator fetches the table schema from the Hive Metastore and then serializes that schema in each task request it sends to workers.
  ○ Keeps the Hive Metastore service from being bombarded with requests from workers.
  ○ Adversely affects coordinator memory and network when schemas are very large.
![Page 16: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/16.jpg)
Challenges: Deeply Nested Large Thrift Schemas
● Our issue with large, deeply nested schemas is limited to tables using Thrift schemas.
● A Thrift schema Java archive (jar) file is created, put on the classpath of the coordinator and each worker of a Presto cluster, and loaded at service start time.
● Schemas are removed from task requests entirely: only a Thrift class name is passed as part of the request.
● Workers use the Thrift schema jar to construct the table schema.
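A minimal sketch of the "pass a class name, not the schema" idea is shown below. It is not Presto code: the dictionary `LOCAL_SCHEMA_REGISTRY` stands in for the schema jar on each node's classpath, and the class name `com.example.thrift.Event`, the request fields, and both function names are hypothetical.

```python
import json

# Hypothetical stand-in for the schema jar available on every node:
# maps a Thrift class name to its (here tiny) table schema.
LOCAL_SCHEMA_REGISTRY = {
    "com.example.thrift.Event": {
        "id": "i64",
        "user": {"name": "string", "tags": ["string"]},
    },
}


def build_task_request(split_id: int, thrift_class: str) -> str:
    """Coordinator side: ship only the class name, never the schema itself."""
    return json.dumps({"split": split_id, "thrift_class": thrift_class})


def resolve_schema(task_request: str) -> dict:
    """Worker side: rebuild the table schema from the locally loaded registry."""
    request = json.loads(task_request)
    return LOCAL_SCHEMA_REGISTRY[request["thrift_class"]]


request = build_task_request(7, "com.example.thrift.Event")
schema = resolve_schema(request)
print(len(request), "bytes in request; schema fields:", sorted(schema))
```

The request size stays constant no matter how large the schema grows, which is the property that relieves coordinator memory and network pressure.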
![Page 17: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/17.jpg)
Challenges: Inconsistent serdes/schemas jar versions between coordinator and workers
● A Presto process always loads the latest serdes/schemas jars from S3 when it restarts.
● The jars loaded by the coordinator and the workers can get out of sync when one process restarts but the other does not.
● Solution: version the jars and include the version in the node info, broadcast the coordinator's version to all workers, and restart any worker whose jar version does not match so that it pulls the right jars.
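The version check described above could look roughly like this. It is a sketch under assumptions: the version strings, the worker names, and the `workers_to_restart` helper are invented for illustration, and the real mechanism (node info plus broadcast) is only summarized on the slide.

```python
from typing import Dict, List


def workers_to_restart(coordinator_version: str,
                       worker_versions: Dict[str, str]) -> List[str]:
    """Return the workers whose jar version differs from the coordinator's."""
    return sorted(
        worker for worker, version in worker_versions.items()
        if version != coordinator_version
    )


# Hypothetical node info: worker name -> jar version it loaded at start time.
worker_versions = {"worker-a": "v42", "worker-b": "v41", "worker-c": "v42"}
print(workers_to_restart("v42", worker_versions))
```

Only mismatched workers are selected, so a coordinator restart does not force a restart of workers that already hold the same jars.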
![Page 18: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/18.jpg)
Warning systems for better query authoring and diagnosing
(Dr. Presto Project)
![Page 19: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/19.jpg)
Presto Warnings
● Users are sometimes unaware that they are writing inefficient queries; we use the warnings system to deliver our recommendations.
Use cases and example warnings:
● Query Authoring
  ○ Replace “count(distinct x)” with approx_distinct
  ○ Large result set / wide output columns
  ○ Missing partition predicate
● Query Diagnosing
  ○ High CPU consumption, wrong resource group config
  ○ Scanning huge non-columnar tables
  ○ Wrong join order/type
  ○ Performance analysis, etc.
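Two of the authoring warnings can be illustrated with a small text-based check. This is a toy heuristic, not how the warnings are produced inside Presto (which has access to the parsed query and table metadata); the function name, the warning wording, and the example table `events` with partition column `dt` are assumptions.

```python
import re
from typing import List


def authoring_warnings(sql: str, partition_column: str) -> List[str]:
    """Return warnings for a query using simple text heuristics (not a SQL parser)."""
    warnings = []
    if re.search(r"count\s*\(\s*distinct\b", sql, re.IGNORECASE):
        warnings.append("Consider approx_distinct(x) instead of count(distinct x)")
    # Look for the partition column anywhere after the first WHERE keyword.
    parts = re.split(r"\bwhere\b", sql, maxsplit=1, flags=re.IGNORECASE)
    column = rf"\b{re.escape(partition_column)}\b"
    if len(parts) < 2 or not re.search(column, parts[1], re.IGNORECASE):
        warnings.append(f"Missing partition predicate on {partition_column}")
    return warnings


slow = "SELECT count(distinct user_id) FROM events"
better = "SELECT approx_distinct(user_id) FROM events WHERE dt = '2020-08-19'"
print(authoring_warnings(slow, "dt"))
print(authoring_warnings(better, "dt"))
```

The first query triggers both warnings; the second triggers none.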
![Page 20: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/20.jpg)
Presto Warnings
![Page 21: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/21.jpg)
Managing diverse workloads
![Page 22: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/22.jpg)
What are we facing?
● Traffic varies between business and non-business hours
  ○ Y-axis: # of concurrently running queries
  ○ X-axis: datetime (range in a day)
  ○ Red line: scheduled queries; green line: adhoc queries
![Page 23: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/23.jpg)
What are we facing?
● Previously, resource limits were per user per client
  ○ # of queries allowed is 3
  ○ No CPU quota
● Various query types, from resource-intensive to lightweight
● Due to the nature of adhoc usage, compute demand changes very fast
![Page 24: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/24.jpg)
How to solve it
Query traffic variance
● Route traffic between the adhoc and scheduled clusters, i.e. route scheduled queries to the adhoc cluster during off-peak hours
● Publish the query traffic pattern to users via warnings so they can reschedule jobs if possible
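A time-based routing rule of this kind might be sketched as below. The peak window of 09:00 to 18:00, the cluster labels, and the choice to send every off-peak scheduled query to the adhoc cluster are assumptions for illustration; the slides do not give the actual window or policy.

```python
from datetime import time

# Assumed business hours; the actual peak window is not stated in the slides.
PEAK_START = time(9, 0)
PEAK_END = time(18, 0)


def target_cluster(query_type: str, now: time) -> str:
    """Pick a cluster label for a query submitted at local time `now`."""
    off_peak = not (PEAK_START <= now < PEAK_END)
    if query_type == "scheduled" and off_peak:
        return "adhoc"  # borrow idle adhoc capacity outside business hours
    return query_type


print(target_cluster("scheduled", time(2, 0)))   # off-peak scheduled query
print(target_cluster("scheduled", time(10, 0)))  # peak-hour scheduled query
```

A production rule would likely also consider cluster load and health rather than the clock alone.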
![Page 25: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/25.jpg)
How to solve it
Improve resource usage & developer velocity
Org-based resource group
● Have a single resource group for each whole organization (varying from 20 ~ 100 users), based on LDAP.
● Provide visibility into which user/query is taking the majority of the resources in the group.
● Each resource group has sub-groups allocated different resources: fast_lane, normal and expensive. Users choose which one to use by setting a session property.
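The mapping from a query to an org-level group and sub-group can be sketched as follows. Only the sub-group names `fast_lane`, `normal` and `expensive` come from the slide; the `USER_ORG` dictionary (a stand-in for an LDAP lookup), the session property name `resource_lane`, the default of `normal`, and the `<org>.<lane>` naming are hypothetical.

```python
from typing import Dict

LANES = ("fast_lane", "normal", "expensive")

# Hypothetical stand-in for an LDAP lookup of a user's organization.
USER_ORG: Dict[str, str] = {"alice": "ads", "bob": "growth"}


def resource_group_for(user: str, session: Dict[str, str]) -> str:
    """Return "<org>.<lane>" for a query; the lane defaults to "normal"."""
    org = USER_ORG[user]
    lane = session.get("resource_lane", "normal")  # property name is hypothetical
    if lane not in LANES:
        raise ValueError(f"unknown lane: {lane}")
    return f"{org}.{lane}"


print(resource_group_for("alice", {}))
print(resource_group_for("bob", {"resource_lane": "fast_lane"}))
```

Grouping by organization rather than by user lets a team share one quota, while the lane choice keeps short queries from queuing behind expensive ones.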
![Page 26: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/26.jpg)
How to solve it
![Page 27: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/27.jpg)
Future Work
● Spot instances
● More Presto warnings
● Fine-grained access control
● Better query failure diagnostics
● Presto for ETL workloads
![Page 28: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/28.jpg)
© Copyright, All Rights Reserved, Pinterest Inc. 2017
Presto at Pinterest Blog: https://medium.com/pinterest-engineering/presto-at-pinterest-a8bda7515e52
Presto at Pinterest NYC Presto Summit (2019): https://www.youtube.com/watch?v=AY7VtreK8IQ
![Page 29: Presto Summit Series: Pinterest · Scale at Pinterest Business Scale 400M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes](https://reader033.vdocument.in/reader033/viewer/2022051900/5fef7600b5903e6d6709808e/html5/thumbnails/29.jpg)
THANK YOU!
Register: www.starburstdata.com/events