Boston Hadoop Meetup: Presto for the Enterprise
Post on 14-Aug-2015
2
• History of Teradata Center for Hadoop
– Formerly Hadapt, founded in July 2010 by Borgman, Bajda-Pawlikowski, and Abadi
– Pioneered SQL-on-Hadoop market
– Based on work done by the database research group in the Yale Computer Science Department
– Hybrid of Hadoop scalability and DBMS performance
• Today
– Acquired by Teradata in July 2014 and renamed Teradata Center for Hadoop
– 30 developers with deep Hadoop and database expertise
– Headquarters in Boston, MA
– Contributors to the open source project Presto
Who are we? - Teradata Center for Hadoop!
3
• What is Presto?
• What is Teradata doing?
• Can I see a Demo?
• How can I contribute?
Talk Agenda
4
• 100% open source distributed ANSI SQL engine for Big Data
– Modern code base
– Proven scalability
– Optimized for low-latency, interactive querying
• Cross-platform query capability, not only SQL-on-Hadoop
• Distributed under the Apache license, now supported by Teradata
• Used by a community of well-known, well-respected technology companies
What is Presto?
5
History of Presto
• Fall 2008: Facebook open sources Hive
• Fall 2012: 4 developers start Presto development
• Spring 2013: Presto rolled out within Facebook
• Fall 2013: Facebook open sources Presto
• Fall 2014: 88 releases, 41 contributors, 3943 commits
• Spring 2015: 98 releases, 65 contributors, 4587 commits
• Teradata joins Presto community & offers support
Timeline image courtesy of Facebook
6
Presto Architecture
[Architecture diagram]
• Client
• Coordinator: Parser/analyzer, Planner, Scheduler
• Workers (three shown)
• Pluggable interfaces: Metadata API, Data location API, Data stream API
Source: https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
7
Presto Extensibility – connectors
[Architecture diagram]
• Coordinator: Parser/analyzer, Planner, Scheduler
• Worker
• Metadata API: Hive, Cassandra, Kafka, MySQL, …
• Data location API: Hive, Cassandra, Kafka, MySQL, …
• Data stream API: Hive, Cassandra, Kafka, MySQL, …
Source: https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
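To make the connector idea concrete, here is a minimal Python sketch of one way a storage system could plug in behind three interfaces like the ones named on the slide. It is not Presto's actual connector interface (Presto is written in Java). The class names, method names, and sample tables are all hypothetical and chosen only for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple

Row = Tuple  # a row is a plain tuple in this sketch


class Connector(ABC):
    """Hypothetical connector exposing three pluggable APIs (illustrative names)."""

    @abstractmethod
    def columns(self, table: str) -> List[str]:
        """Metadata API: describe a table's columns."""

    @abstractmethod
    def splits(self, table: str) -> List[int]:
        """Data location API: list the chunks (splits) that make up a table."""

    @abstractmethod
    def rows(self, table: str, split: int) -> Iterator[Row]:
        """Data stream API: stream the rows of one split."""


class InMemoryConnector(Connector):
    """Toy connector backed by a dict; stands in for a real storage system."""

    def __init__(self, tables: Dict[str, Tuple[List[str], List[List[Row]]]]):
        # table name -> (column names, list of splits, each split a list of rows)
        self._tables = tables

    def columns(self, table):
        return self._tables[table][0]

    def splits(self, table):
        return list(range(len(self._tables[table][1])))

    def rows(self, table, split):
        return iter(self._tables[table][1][split])


def scan(catalogs: Dict[str, Connector], catalog: str, table: str) -> List[Row]:
    """Look up the connector, enumerate its splits, and read each one."""
    connector = catalogs[catalog]
    out = []
    # This sketch reads splits sequentially in one process;
    # a distributed engine would hand them to separate workers.
    for split in connector.splits(table):
        out.extend(connector.rows(table, split))
    return out


# Two hypothetical catalogs with made-up sample data.
catalogs = {
    "hive": InMemoryConnector({"orders": (["id", "user_id"], [[(1, 10)], [(2, 20)]])}),
    "mysql": InMemoryConnector({"users": (["user_id", "name"], [[(10, "ann"), (20, "bob")]])}),
}

# Join rows from two different sources through the same interface.
names = dict(scan(catalogs, "mysql", "users"))
joined = [(order_id, names[user_id]) for order_id, user_id in scan(catalogs, "hive", "orders")]
```

Because the scan logic only talks to the abstract interface, adding another source in this sketch means writing one more `Connector` subclass and leaving the engine side untouched.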
8
• Data stays in memory during execution and is pipelined across nodes MPP-style
• Vectorized columnar processing
• Presto is written in highly tuned Java
– Efficient in-memory data structures
– Very careful coding of inner loops
– Bytecode generation
• Optimized ORC reader
Presto = Performance
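The first two bullets on this slide can be illustrated with a small Python sketch. It is a single-process toy, not Presto code. Operators are chained as generators so that no intermediate result is fully materialized (pipelining), and each operator works on a batch of column values rather than one row at a time (columnar processing). The operator names and sample data are assumptions made for the example.

```python
from typing import Dict, Iterator, List

Batch = Dict[str, List[float]]  # one column-oriented batch: column name -> values


def scan(prices: List[float], quantities: List[float], batch_size: int) -> Iterator[Batch]:
    """Source operator: yield column batches instead of single rows."""
    for start in range(0, len(prices), batch_size):
        yield {
            "price": prices[start:start + batch_size],
            "qty": quantities[start:start + batch_size],
        }


def filter_qty(batches: Iterator[Batch], min_qty: float) -> Iterator[Batch]:
    """Filter operator: compute the kept positions once per batch, apply to every column."""
    for batch in batches:
        keep = [i for i, q in enumerate(batch["qty"]) if q >= min_qty]
        yield {name: [col[i] for i in keep] for name, col in batch.items()}


def total_revenue(batches: Iterator[Batch]) -> float:
    """Aggregate operator: consume batches as they arrive, holding only a running sum."""
    total = 0.0
    for batch in batches:
        total += sum(p * q for p, q in zip(batch["price"], batch["qty"]))
    return total


prices = [2.0, 3.0, 5.0, 1.0, 4.0]
quantities = [1, 4, 2, 6, 3]

# Batches flow scan -> filter -> aggregate one at a time; nothing is buffered in full.
revenue = total_revenue(filter_qty(scan(prices, quantities, batch_size=2), min_qty=2))
```

In this sketch only one batch is in flight at any moment, which is the property that lets a pipelined engine start returning results before the whole input has been read.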
9
• Facebook
– Multiple production clusters (100s of nodes total), including 300 PB Hadoop data warehouse
– 1000s of internal daily active users
– Millions of queries each month
– Multiple PBs scanned every day
– Trillions of rows a day
• Netflix
– Over 200-node production cluster on EC2
– Over 15 PB in S3 (Parquet format)
– Over 300 users and 2.5K queries daily
Presto in Production
10
• 100% open source contributions to Presto to increase adoption in the enterprise
• A multi-year roadmap commitment to phased enhancements of the open source code
• The first-ever commercial support offering for Presto
What is Teradata Doing?
Teradata Certified Presto www.teradata.com/presto
11
• Hadoop Distro Agnostic
• Modern Code Base
– Presto is well-designed open source software with proper database architecture
• Strong Like-Minded Community
• Push down processing across multiple data platforms
• Leverage Teradata expertise to make SQL for Hadoop viable
Why is Teradata Contributing to Presto?
13
• Phase 1: Implement (June 8, 2015)
– Installer
– Documentation
– Monitoring & Support Tools
• Phase 2: Integrate (Q4 2015)
– Management Tool Integration
– YARN Integration
• Phase 3: Proliferate (2016)
– ODBC / JDBC Drivers
– BI Certification
– Security
– Connectors
• Commercial Support
• Expanding ANSI SQL Coverage
Teradata Contributions to Presto
14
• Ease of install and management via Presto-Admin tool
– www.github.com/prestodb/presto-admin
– Packaging Presto as an RPM
• Testing Framework for Presto
– www.github.com/prestodb/tempto
– Added a large number of tests
• Improvements to JDBC driver
– To be open sourced on www.github.com/prestodb soon!
• Various SQL improvements
Teradata’s Contributions
15
• YARN Integration
• Ambari Integration
• ODBC & JDBC Drivers that actually work
• Security – Authentication & Authorization
• Continued SQL Improvements
• BI tool certifications – e.g. Tableau
• More Connectors – e.g. HBase
• Open source our Docker-based dev environment
• Open our Continuous Integration platform to the community
Teradata’s Contribution Product Roadmap
16
www.github.com/facebook/presto
www.github.com/prestodb
Certified Distro: www.teradata.com/presto
Website: www.prestodb.io
Presto User’s Group: www.groups.google.com/group/presto-users
Facebook Page: www.facebook.com/prestodb
Twitter: #prestodb
How can I contribute?
17
Available for Download
– Presto 101t Server, CLI, JDBC
– Presto-Admin 0.1
– Documentation
– HDP w/ Presto VM Sandbox
– CDH w/ Presto VM Sandbox
www.teradata.com/presto
Presto 101t certified by Teradata