Hello, Enterprise! Meet Presto
Teradata Contributions to Presto
10/6/15
Christina Wallin
Who are we?
• Teradata Center for Hadoop
• Formerly Hadapt, the first SQL-on-Hadoop company (founded in 2010)
• Offices in Boston and Warsaw, some remote employees in CA and CT
• Around 20 employees working on Presto
• Contributors to the open source project Presto!
What is Presto?
• 100% open source distributed ANSI SQL engine for Big Data
  – Modern architecture and implementation
  – Proven scalability and performance
  – Optimized for low latency, interactive querying
• Cross platform query capability, not only SQL on Hadoop
• Distributed under the Apache license, now supported by Teradata
• Used by a community of well known, well respected technology companies
Presto Architecture
[Diagram: a Client connects to the Coordinator (Parser/analyzer, Planner, Scheduler), which distributes work to multiple Workers.]
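As a rough illustration of the client-to-coordinator hop in the diagram, the sketch below builds the HTTP request a client would send to submit a query. It assumes Presto's REST client protocol (the SQL text is POSTed to `/v1/statement` with an `X-Presto-User` header). The coordinator address, user name, and query are placeholders, and nothing is actually sent over the network.

```python
import urllib.request


def build_statement_request(coordinator_url, sql, user):
    """Build (but do not send) the request that submits `sql` to a coordinator."""
    return urllib.request.Request(
        url=coordinator_url.rstrip("/") + "/v1/statement",
        data=sql.encode("utf-8"),         # the request body is the SQL text itself
        headers={"X-Presto-User": user},  # identifies the user submitting the query
        method="POST",
    )


# Placeholder coordinator address and user, for illustration only.
request = build_statement_request(
    "http://coordinator.example.com:8080", "SELECT 1", "demo"
)
print(request.get_method(), request.full_url)
```

In the real protocol the coordinator answers with JSON, and the client keeps following the `nextUri` field in each response until it is no longer present, collecting result rows along the way.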
Presto Pluggable Data Sources Capabilities
[Diagram: Presto sits on Hadoop and pushes work down to pluggable data sources: push-down to Hadoop systems (HDFS, Kafka), push-down to other databases, and push-down to NoSQL databases.]
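One consequence of pluggable data sources is that a single query can address tables from several of them, because each source is mounted as a catalog and tables are named `catalog.schema.table`. The sketch below only assembles such a query as text. The catalog, schema, table, and column names are all hypothetical.

```python
def qualified(catalog, schema, table):
    """Return a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


# Hypothetical catalogs, schemas, and tables, for illustration only.
clicks = qualified("hive", "web", "clicks")
customers = qualified("mysql", "crm", "customers")

# One statement joining a Hadoop-backed table with a relational one.
federated_sql = f"""
SELECT c.name, count(*) AS click_count
FROM {clicks} AS k
JOIN {customers} AS c ON k.customer_id = c.id
GROUP BY c.name
""".strip()

print(federated_sql)
```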
Teradata Contributions to Presto
Phase 1 (June 8, 2015): Implement
• Installer
• Documentation
• Monitoring & Support Tools
Phase 2 (Q4 2015): Integrate
• Management Tool Integration
• YARN Integration
Phase 3 (2016): Proliferate
• ODBC Driver
• JDBC Driver
• BI Certification
• Security
• Connectors
Commercial Support
Expanding ANSI SQL Coverage
Easy Installation and Administration
presto-admin: a tool to manage and install Presto
• presto-admin can:
  – Install and uninstall Presto
  – Deploy configuration files across the cluster
  – Start/stop/restart Presto servers
  – Show you the status of the cluster
  – Add and remove connectors
  – Upgrade Presto to a different version
  – Collect logs, query info, system info for support
• Additionally, we added an RPM for Presto
• https://github.com/prestodb/presto-admin
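To give a feel for how these capabilities map onto the command line, here is a small sketch that assembles presto-admin invocations without running them. The subcommand names reflect my understanding of the presto-admin documentation and should be treated as assumptions to verify against your installed version. The RPM path is a placeholder and the helper function is hypothetical.

```python
import shlex

# Subcommand names are assumptions based on the presto-admin documentation;
# verify them against the installed version before relying on them.
PRESTO_ADMIN_TASKS = {
    "install": ["server", "install"],        # expects the path to the Presto RPM
    "uninstall": ["server", "uninstall"],
    "start": ["server", "start"],
    "stop": ["server", "stop"],
    "restart": ["server", "restart"],
    "status": ["server", "status"],
    "deploy_config": ["configuration", "deploy"],
    "add_connector": ["connector", "add"],
    "remove_connector": ["connector", "remove"],  # expects the connector name
    "collect_logs": ["collect", "logs"],
}


def presto_admin_command(task, *args):
    """Return the argv list for a presto-admin task without executing it."""
    return ["presto-admin", *PRESTO_ADMIN_TASKS[task], *args]


# Print a typical bring-up sequence; the RPM path is a placeholder.
for argv in (
    presto_admin_command("install", "/tmp/presto-server.rpm"),
    presto_admin_command("start"),
    presto_admin_command("status"),
):
    print(shlex.join(argv))
```

Building argument lists rather than shell strings keeps the sketch safe to pass to `subprocess.run` later without quoting problems.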
Hadoop Ecosystem Integration
Ambari Integration (Work In Progress)
• http://github.com/prestodb/ambari-presto-service
Resource Allocation with YARN
• Slated for Q4 2015
• Allow Presto to run its services within YARN containers so that YARN knows about memory/CPU allocated to Presto.
  – Using Apache Slider
  – The allocation is fixed and upfront
  – Supports HDP and CDH Hadoop Versions
• YARN CGroups Integration
• http://github.com/prestodb/presto-yarn
Enterprise Database Features
Unleashing Presto on Business Intelligence Tools
• Improved ODBC driver -- Q4 2015
• Improved JDBC driver -- Q1 2016
• Certification against Tableau, Qlik, etc. -- mid 2016
Expanded ANSI SQL Support
• Current Contributions
  – DECIMAL type (WIP)
  – Additional smaller things: new functions, bug fixes, TIMESTAMP support for Parquet
• Future goal: Support TPC-H and TPC-DS unmodified!
  – Additional subquery and join support
  – EXISTS, EXCEPT, INTERSECT
  – Various other odds and ends
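For readers less familiar with the constructs named on this slide, the sketch below shows what EXISTS, INTERSECT, and EXCEPT compute. It runs against Python's built-in SQLite rather than Presto, purely so that it is self-contained. The tables and rows are invented for illustration, and it says nothing about how Presto implements these operators.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (1), (1), (3), (4);
""")


def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# EXISTS: customers that have at least one order (a correlated subquery).
with_orders = ids("""
    SELECT id FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
""")

# INTERSECT: ids present in both tables, with duplicates removed.
in_both = ids("SELECT id FROM customers INTERSECT SELECT customer_id FROM orders")

# EXCEPT: customer ids that never appear in orders.
without_orders = ids("SELECT id FROM customers EXCEPT SELECT customer_id FROM orders")

print(with_orders, in_both, without_orders)  # prints: [1, 3] [1, 3] [2]
```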
Demo of presto-admin!
How can I give Presto a try?
• https://github.com/facebook/presto
• https://github.com/prestodb/presto-admin
• Certified distro: http://www.teradata.com/presto/
  – You can also download VM images with Presto pre-installed
Questions?