Apache Zeppelin Meetup, Christian Tzolov, 1/21/16

Post on 16-Jan-2017

Category:

Data & Analytics

Unified Data Analytics Platform (with Zeppelin, Ambari, Geode, SpringXD and HAWQ)

by Christian Tzolov
@christzolov

Whoami

Christian Tzolov
Technical Architect at Pivotal, BigData, Hadoop, SpringXD, Apache Committer, Crunch PMC member

ctzolov@pivotal.io
blog.tzolov.net
@christzolov

Contents
• DEMO
• Zeppelin Interpreters
  • PSQL (to become JDBC in 0.6.x)
  • Geode
  • SpringXD
• Apache Ambari
  • Zeppelin Service
  • Geode, HAWQ and Spring XD services
  • Webpage Embedder View

Demo: Twitter Streams with SpringXD, Geode and HAWQ

Technical Stack

• Apache HDFS: Data Lake - PHD or HDP Hadoop
• Apache HAWQ: SQL on Hadoop (OLAP)
• Apache Geode: In-memory data grid (OLTP)
• Spring XD: Integration and Streaming Runtime
• Apache Ambari: Manages All Clusters
• Apache Zeppelin: Web UI for interaction with Data Systems

[Architecture diagram: Hadoop/HDFS, Geode, HAWQ, SpringXD, Ambari, Zeppelin]

Spring XD

Orchestrates and automates all steps across multiple data stream pipelines

Sources: • HTTP • Tail • File • Mail • Twitter • Gemfire • Syslog • TCP • UDP • JMS • RabbitMQ • MQTT • Kafka • Reactor TCP/UDP

Processors: • Filter • Transformer • Object-to-JSON • JSON-to-Tuple • Splitter • Aggregator • HTTP Client • Groovy Scripts • Java Code • JPMML Evaluator • Spark Streaming

Sinks: • File • HDFS • JDBC • TCP • Log • Mail • RabbitMQ • Gemfire • Splunk • MQTT • Kafka • Dynamic Router • Counters

Apache Geode

• Cache - Performance / Consistency / Resiliency

• Region - Highly available, redundant, distributed Map

China Railway Corporation
• 5,700 train stations
• 4.5 million tickets per day
• 20 million daily users
• 1.4 billion page views per day
• 40,000 visits per second

Indian Railways
• 7,000 stations
• 72,000 miles of track
• 23 million passengers daily
• 120,000 concurrent users
• 10,000 transactions per minute

Apache HAWQ

• Built around a Greenplum MPP DB

• 100% ANSI SQL compliant: SQL-92/99/2003…

• ODBC and JDBC

• Hadoop Native: Parquet, HDFS and YARN

• Extensible - Web Tables, PXF

• On TPC-DS, outperforms Impala by 454% overall

Demo

tweets = twittersearch --query=<keyword> | hdfs --directory=/user/zeppelin/xd/tweets

geodeTap = tap:stream:tweets > gemfire-json-server --regionName=regionTweet

hawqTap = tap:stream:tweets > transform --script=tweetJsonToTsv.groovy | gpfdist --table=xdsink

tweetsCount = tap:stream:tweets > json-to-tuple | transform --expression='payload.id_str' | counter
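The hawqTap and tweetsCount taps above each reshape the tweet JSON before it reaches a sink. As a rough illustration only, the Python sketch below mimics those two steps outside Spring XD: flattening a tweet into a tab-separated row, and counting tweets by reading id_str from each message. The contents of tweetJsonToTsv.groovy are not shown in the slides, so the column choice (id_str, created_at, text) and both function names are assumptions.

```python
import json

# Hypothetical column choice; the real tweetJsonToTsv.groovy script is not shown in the slides.
COLUMNS = ("id_str", "created_at", "text")


def tweet_json_to_tsv(raw: str) -> str:
    """Flatten one tweet JSON document into a single tab-separated row."""
    tweet = json.loads(raw)
    cells = []
    for name in COLUMNS:
        value = str(tweet.get(name, ""))
        # Tabs and newlines inside a value would break the row, so replace them with spaces.
        cells.append(value.replace("\t", " ").replace("\n", " "))
    return "\t".join(cells)


def count_tweets(raw_tweets) -> int:
    """Mimic json-to-tuple | transform payload.id_str | counter: one increment per message."""
    count = 0
    for raw in raw_tweets:
        json.loads(raw)["id_str"]  # the same field the stream expression extracts
        count += 1
    return count


sample = json.dumps({"id_str": "1", "created_at": "Thu Jan 21 2016", "text": "hello\tzeppelin"})
print(tweet_json_to_tsv(sample))
print(count_tweets([sample, sample]))
```

In the demo itself this work is done by the Spring XD modules; the sketch only shows the shape of the data each tap hands to its sink.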

SpringXD Interpreter(s)

• %xd.stream and %xd.job

• Multiple streams or jobs in a paragraph.

• Special Deploy/Launch Semantics

• Zeppelin Dynamic Forms (${…})

• Comprehensive Stream and Job DSL auto-completion (Ctrl+.)
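Dynamic Forms let a paragraph carry placeholders such as ${name=default} that Zeppelin renders as input fields and substitutes before the paragraph runs. The Python sketch below is a simplified illustration of that substitution for plain text inputs only; it is not Zeppelin's implementation, and the keyword form name and the fill_forms helper are hypothetical.

```python
import re

# Matches ${name} or ${name=default}; a simplified subset of Zeppelin's form syntax.
FORM_PATTERN = re.compile(r"\$\{(\w+)(?:=([^}]*))?\}")


def fill_forms(template: str, values: dict) -> str:
    """Replace each ${name=default} placeholder with a supplied value or its default."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        return str(values.get(name, default if default is not None else ""))
    return FORM_PATTERN.sub(replace, template)


# Hypothetical stream definition using a form named "keyword" with default "zeppelin".
stream = "tweets = twittersearch --query=${keyword=zeppelin} | hdfs --directory=/user/zeppelin/xd/tweets"
print(fill_forms(stream, {}))
print(fill_forms(stream, {"keyword": "geode"}))
```

With no value supplied the default is used, so the same paragraph can be rerun against a different search term without editing the stream definition.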

SpringXD Conf

PSQL Interpreter

• Prefix: %psql.sql

• PostgreSQL, HAWQ/PXF, Greenplum … JDBC

• PSQL command line shell (via %sh)

• Zeppelin Dynamic Forms (${…})

• Comprehensive SQL/JDBC auto-completion (Ctrl+.)

PSQL Configuration

PSQL Doc

https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/postgresql.html

PSQL/HAWQ Demo

• http://10.68.58.121:9995/#/notebook/2B2ZYS18Y

Geode Interpreter

• Prefix: %geode.oql

• OQL and PDX nested access (user.name)

• Geode command line shell (via %sh)

• Zeppelin Dynamic Forms (${…})

• Basic OQL auto-completion (Ctrl+.)

Geode Configuration

Geode Tutorial

• http://10.68.58.121:9995/#/notebook/2AW57BUN4

Apache Ambari

Zeppelin, Geode, HAWQ, SpringXD Services …

Ambari Services

Ambari Blueprint

http://<ambari>:8080/api/v1/clusters/mv10?format=blueprint
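The blueprint export above is a plain HTTP GET against the Ambari REST API. A minimal Python sketch of building that request follows. The host name, the admin/admin credentials and the use of HTTP Basic authentication are assumptions for illustration, and blueprint_request is a hypothetical helper; only the URL pattern and the mv10 cluster name come from the slide.

```python
import base64
import urllib.request


def blueprint_request(ambari_host: str, cluster: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that exports a cluster as a blueprint."""
    url = f"http://{ambari_host}:8080/api/v1/clusters/{cluster}?format=blueprint"
    # Assumed HTTP Basic authentication with the given credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder host and credentials; replace with your own before sending.
req = blueprint_request("ambari.example.com", "mv10", "admin", "admin")
print(req.full_url)

# To actually fetch the blueprint JSON from a reachable Ambari server:
# with urllib.request.urlopen(req) as resp:
#     blueprint = resp.read().decode("utf-8")
```

The request is only constructed here, so the sketch runs without a live Ambari server; uncomment the last lines to send it.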

Webpage Embedder

https://github.com/tzolov/ambari-webpage-embedder-view

Stay in touch
ctzolov@pivotal.io
blog.tzolov.net
@christzolov
https://nl.linkedin.com/in/tzolov