Apache Zeppelin + Livy: Bringing Multi-Tenancy to Interactive Data Analysis
TRANSCRIPT
1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Zeppelin + Livy: Bringing Multi-Tenancy to Interactive Data Analysis
Rohit Choudhary & Jeff Zhang
June 28, 2016
What’s Apache Zeppelin?
A web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.
Interactive Analysis 1.0 (Spark-shell)
Interactive Analysis 2.0 (Zeppelin)
Spark Interpreter
Interactive Analysis 3.0 (Zeppelin + Livy)
Livy Interpreter
Open Source Activity
Quick Stats: Zeppelin
• Zeppelin graduated in May 2016 and is now an Apache top-level project (TLP); it had been incubated by the Apache Software Foundation since December 2014
• 9 committers, 120+ contributors, and a growing list
• 1000+ JIRAs filed
• 900 PRs via the community
• Zeppelin just got a new friend: “R”
Recent Updates
• Multi-tenancy with Livy
• Generic JDBC interpreter
  – Hive, Phoenix, Redshift
  – Postgres, MySQL
  – Several others
• Notebook authentication and authorization
• UI automation through Selenium
• Security for other interpreters (on its way)
Usage Patterns & Feedback
• Cluster monitoring, memory analysis
• Telecom data usage, concert attendees' travel patterns
Upcoming GA with HDP 2.5 and Ambari 2.4.0 (ETA: end of July)
Architecture & Usage
Zeppelin Architecture
Current interpreter support: HDFS, PySpark, SparkR, Spark, Hive, Phoenix, SQL, Shell, …
Zeppelin Features
Collate/Load Data: Collate and load data from existing data sources, or load it from external CSVs, e.g. Eureka, SmartSense.
Visualize: A robust visualization mechanism to visualize data and enable insights.
Collaborate: Notebook-based collaboration and notebook export; tagging of notebook-generated data is soon to be added.
Popular Usage Scenarios
Customized Dashboards: Customized dashboards for big data clusters.
Security Analytics: Understanding the nature of data coming in from multiple sources and analyzing its effects.
Bio-sciences: Medical research companies are interested in using Zeppelin for their research.
Bringing Multi-tenancy to Zeppelin
Multi-Tenancy: Motivation
Objectives
• Supporting workloads of multiple customers
• Supporting multiple LOBs (lines of business) on a single data system
• Supporting fine-grained audits
Requirements
• Inability to provision capacity for multiple user groups
• Inability to audit user actions, as all jobs are run via the ‘zeppelin’ proxy user
• Inability to share state/data with other users
Zeppelin Livy Interaction
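[Diagram: Security Across Zeppelin-Livy-Spark. Components shown: LDAP, Zeppelin with Shiro authentication, a Spark group interpreter, the Livy APIs, Livy, Spark and YARN. Zeppelin talks to Livy over SPNEGO (Kerberos), and Livy talks to Spark/YARN using Kerberos.]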
Deep dive on Livy
What is Livy
Livy is an open source REST interface for interacting with Spark from anywhere.
[Diagram: a Livy client talks to the Livy server over HTTP; the Livy server connects (HTTP/RPC) to a Spark interactive session and a Spark batch session, each with its own SparkContext.]
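The diagram distinguishes interactive sessions from batch sessions, and only the interactive calls are shown later in the deck. As a minimal sketch, assuming Livy's `POST /batches` endpoint and its `file`, `className` and `args` request fields, the Python below builds a batch submission without sending it. The JAR path, class name and argument in the usage are hypothetical placeholders.

```python
import json
import urllib.request


def batch_request(base_url, file, class_name=None, args=None):
    """Build (but do not send) a POST /batches request for a Spark application."""
    body = {"file": file}  # application JAR or Python file, reachable by the cluster
    if class_name is not None:
        body["className"] = class_name  # main class for JVM applications
    if args:
        body["args"] = list(args)  # command-line arguments for the application
    return urllib.request.Request(
        base_url.rstrip("/") + "/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical JAR and class; pass the request to urllib.request.urlopen() to submit it.
    req = batch_request("http://localhost:8998", "/path/to/app.jar", "com.example.App", ["10"])
    print(req.get_method(), req.full_url, req.data.decode())
```

Unlike an interactive session, a batch runs one application to completion, so there is no statement endpoint to call afterwards; you only track the batch's state.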
Why we need Livy with Zeppelin
Reduce the load on the client machine
Make job submission and monitoring easy
Customize job scheduling
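Monitoring a session through Livy usually means polling its state. The sketch below assumes Livy's `GET /sessions/{id}/state` endpoint and its session states (`starting`, `idle`, `busy`, `dead`, and so on; check the exact set against your Livy version). `wait_until_idle` takes the state-reading function as a parameter so it can be exercised without a server; both helper names are hypothetical.

```python
import json
import time
import urllib.request

# States after which a session will not become usable again (assumed set).
TERMINAL_STATES = {"error", "dead", "killed", "success"}


def http_state_getter(base_url, session_id):
    """Return a function that reads GET /sessions/{id}/state from a Livy server."""
    url = f"{base_url.rstrip('/')}/sessions/{session_id}/state"

    def get_state():
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["state"]

    return get_state


def wait_until_idle(get_state, timeout_s=60.0, interval_s=1.0, sleep=time.sleep):
    """Poll get_state() until the session is idle; raise if it ends or times out."""
    waited = 0.0
    while True:
        state = get_state()
        if state == "idle":
            return state
        if state in TERMINAL_STATES:
            raise RuntimeError(f"session ended in state {state!r}")
        if waited >= timeout_s:
            raise TimeoutError(f"session still {state!r} after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

Against a live server this would be called as `wait_until_idle(http_state_getter("http://localhost:8998", 0))`; the injectable `sleep` only exists to keep tests fast.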
Interactive Session – Create Session
Request:
curl -X POST --data '{"kind": "spark"}' -H "Content-Type: application/json" localhost:8998/sessions
Response:
{"state":"starting","proxyUser":null,"id":1,"kind":"spark","log":[]}
[Diagram: Livy Client → Livy Server → Spark Interactive Session (SparkContext), numbered steps 1 to 4.]
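For readers scripting against Livy from Python rather than curl, here is a minimal standard-library sketch of the same session-creation call. It assumes a Livy server at whatever URL you pass in; the function names are illustrative and not part of any Livy client library.

```python
import json
import urllib.request


def create_session_request(base_url, kind="spark"):
    """Python equivalent of the curl call: POST {"kind": ...} to /sessions."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/sessions",
        data=json.dumps({"kind": kind}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session(base_url, kind="spark"):
    """Send the request and return the new session's id and initial state."""
    with urllib.request.urlopen(create_session_request(base_url, kind)) as resp:
        session = json.load(resp)
    return session["id"], session["state"]
```

The new session typically starts in `starting`, as in the response above, so callers should wait for it to become `idle` before submitting code.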
Interactive Session – Execute Code
Request:
curl http://localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"sc.parallelize(0 to 100).sum()"}'
Response:
{"id":0,"state":"running","output":null}
[Diagram: Livy Client → Livy Server → Spark Interactive Session (SparkContext), numbered steps 1 to 4.]
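Statements run asynchronously: the first response reports `"state":"running"` with no output yet. A hedged sketch of the follow-up, assuming Livy's statement response shape (once `state` is `available`, `output` carries a `status` plus either `data` with a `text/plain` entry or `ename`/`evalue` on error); the helper names are hypothetical.

```python
import json
import urllib.request


def statement_request(base_url, session_id, code):
    """POST /sessions/{id}/statements with the code to run."""
    url = f"{base_url.rstrip('/')}/sessions/{session_id}/statements"
    return urllib.request.Request(
        url,
        data=json.dumps({"code": code}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def statement_result(statement):
    """Return the plain-text result of a finished statement, or None if not finished."""
    if statement.get("state") != "available" or statement.get("output") is None:
        return None  # e.g. {"id": 0, "state": "running", "output": None}
    output = statement["output"]
    if output.get("status") != "ok":
        # Error outputs are assumed to carry the exception name and message.
        raise RuntimeError(f"{output.get('ename')}: {output.get('evalue')}")
    return output["data"]["text/plain"]
```

In practice you would poll `GET /sessions/{id}/statements/{statementId}` and pass each decoded response to `statement_result` until it returns a value.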
SparkContext Sharing
[Diagram: Clients 1, 2 and 3 connect to one Livy server. Two clients attach to Session-1 and share its SparkContext; the other attaches to Session-2, which has a separate SparkContext.]
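Session sharing is easiest to see in a toy model. The classes below are hypothetical and purely in-memory (they do not talk to Livy): each session owns one context, here just a dict standing in for SparkContext state, and every client attached to the same session id reads and writes that same context.

```python
class ToyLivyServer:
    """Hypothetical toy model: one shared context (a dict) per session id."""

    def __init__(self):
        self._contexts = {}
        self._next_id = 0

    def create_session(self):
        session_id = self._next_id
        self._next_id += 1
        self._contexts[session_id] = {}  # stands in for that session's SparkContext state
        return session_id

    def set_var(self, session_id, name, value):
        self._contexts[session_id][name] = value

    def get_var(self, session_id, name):
        return self._contexts[session_id].get(name)


class ToyClient:
    """A client only remembers which session it is attached to."""

    def __init__(self, server, session_id):
        self.server = server
        self.session_id = session_id

    def define(self, name, value):
        self.server.set_var(self.session_id, name, value)

    def lookup(self, name):
        return self.server.get_var(self.session_id, name)


if __name__ == "__main__":
    server = ToyLivyServer()
    s1, s2 = server.create_session(), server.create_session()
    client1, client2, client3 = ToyClient(server, s1), ToyClient(server, s1), ToyClient(server, s2)
    client1.define("total", 101)
    print(client2.lookup("total"))  # 101: same session, same context
    print(client3.lookup("total"))  # None: different session
```

With real Livy the equivalent is simply several clients posting statements to the same `/sessions/{id}/statements` URL.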
Livy Security
[Diagram: Client → Livy Server (impersonation) via SPNEGO; Livy Server → Spark session via a shared secret.]
• Only authorized users can launch a Spark session or submit code
• Each user can access only his or her own session
• Only the Livy server can submit jobs to a Spark session securely
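The first two rules amount to an authentication check followed by an ownership check. The function below is a hypothetical sketch of that policy, not Livy's implementation; `allowed_users` and `Session.owner` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Session:
    id: int
    owner: str  # authenticated user who created the session


def authorize(authenticated_user, session, allowed_users):
    """Raise PermissionError unless the user may use Livy and owns the session."""
    # Rule 1: only authorized users may launch sessions or submit code.
    if authenticated_user is None or authenticated_user not in allowed_users:
        raise PermissionError("user is not authorized to use Livy")
    # Rule 2: a user may only touch a session he or she created.
    if session is not None and session.owner != authenticated_user:
        raise PermissionError(f"session {session.id} belongs to {session.owner}")
```

The third rule is not a per-request check at all; it is enforced by the shared secret between the Livy server and each Spark session.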
SPNEGO
Client (Kerberos TGT) → Livy Server (SPNEGO enabled)
Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO), often pronounced "spen-go".
It is a GSSAPI "pseudo mechanism" used by client-server software to negotiate the choice of security technology.
1. Client: HTTP GET http://site/a.html
2. Server: 401 Unauthorized (WWW-Authenticate: Negotiate)
3. Client: HTTP GET with header Authorization: Negotiate <token>
4. HTTP GET request
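The client side of this exchange is small. In the sketch below, the GSSAPI token itself is assumed to come from a Kerberos/GSSAPI library using the user's TGT (not shown); the code only detects the `Negotiate` challenge and formats the `Authorization` header in the base64 form RFC 4559 describes. The function names are hypothetical.

```python
import base64


def needs_negotiate(status, headers):
    """True if the server answered 401 and offers the Negotiate (SPNEGO) scheme."""
    if status != 401:
        return False
    challenge = headers.get("WWW-Authenticate", "")
    return challenge.split(" ", 1)[0].lower() == "negotiate"


def negotiate_header(token):
    """Authorization header value carrying a raw GSSAPI/SPNEGO token (bytes)."""
    return "Negotiate " + base64.b64encode(token).decode("ascii")
```

From the command line, curl performs the whole handshake with `curl --negotiate -u : <url>` once `kinit` has obtained a TGT.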
Impersonation
[Diagram: Alice and Bob, each with a Kerberos TGT, authenticate to the Livy server (super user: livy) via SPNEGO. Livy launches a separate Spark session for each of them and communicates with each session using a shared secret.]
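Impersonation means the `livy` super user launches each Spark session as the requesting user. Livy's session-creation request accepts a `proxyUser` field for this, and Hadoop must be configured to let `livy` act as a proxy user (the `hadoop.proxyuser.livy.*` settings). The sketch below shows how a server could derive that field from an authenticated Kerberos principal; the name-mapping rule is a deliberate simplification of Hadoop's `auth_to_local` rules, and the helper names are hypothetical.

```python
def short_name(principal):
    """'alice@EXAMPLE.COM' or 'alice/host@EXAMPLE.COM' -> 'alice' (simplified rule)."""
    return principal.split("@", 1)[0].split("/", 1)[0]


def impersonated_session_body(principal, kind="spark"):
    """Session-creation body asking Livy to run the session as the caller."""
    return {"kind": kind, "proxyUser": short_name(principal)}


if __name__ == "__main__":
    # Hypothetical principals for the two users in the diagram.
    for p in ("alice@EXAMPLE.COM", "bob@EXAMPLE.COM"):
        print(impersonated_session_body(p))
```

Because each session runs as its own user, YARN queues, HDFS permissions and audit logs see Alice and Bob rather than a single shared service account.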
Shared Secret
1. The Livy server generates a secret key.
2. The Livy server passes the secret key to the Spark session when launching it.
3. Both sides use the secret key to communicate with each other.
[Diagram: Livy Server ↔ Spark Session, secured by a shared secret.]
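The three steps can be illustrated with a message authentication code. The sketch uses HMAC-SHA256 from Python's standard library as a stand-in; it is not Livy's actual RPC protocol, which has its own mechanism. Only a holder of the key can produce a tag the other side will accept, which is what prevents anyone but the Livy server from submitting work to the session.

```python
import hashlib
import hmac
import secrets


def generate_secret():
    """Step 1: the server generates a random 256-bit secret key."""
    return secrets.token_bytes(32)


def sign(secret, message):
    """Tag a message so the other side can check it came from a key holder."""
    return hmac.new(secret, message, hashlib.sha256).digest()


def verify(secret, message, tag):
    """Constant-time check that the tag was produced with the same secret."""
    return hmac.compare_digest(sign(secret, message), tag)


if __name__ == "__main__":
    secret = generate_secret()   # 1. Livy server generates the key
    session_secret = secret      # 2. in Livy the key is handed over at launch; here we reuse it
    msg = b'{"code": "sc.parallelize(0 to 100).sum()"}'
    tag = sign(secret, msg)      # 3. sender tags each message with the shared key
    print(verify(session_secret, msg, tag))      # accepted: same key
    print(verify(generate_secret(), msg, tag))   # rejected: a different key
```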
Multi Tenant: Zeppelin Demo
Zeppelin Direction
• Workspaces and collaboration
• Customizable visualization
  – Helium
  – Custom, data-type-based visualization (geolocation/maps)
• Enterprise readiness
  – Bring security to all interpreters
  – Performance improvements
• Collaboration, data lineage
Q & A
Thank You