netbeans for big data
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
BigData with Free and Open Source Tools
NetBeans IDE for BigData Development with Apache Spark
Johannes Weigend - QAware GmbH
About this Talk
■ A brief overview of BigData processing (10 minutes)
■ Live demo: Apache Zeppelin and Spark (5 minutes)
■ Spark programming with NetBeans (10 minutes)
Horizontal Scalability is Difficult!
■ Horizontal scalability of functions
  ■ Trivial
    ■ Load balancing of (stateless) services (macro- / microservices)
    ■ More users → more machines
  ■ Non-trivial
    ■ More machines → faster response times
■ Horizontal scalability of data
  ■ Trivial
    ■ Linear distribution of data on multiple machines
    ■ More machines → more data
  ■ Non-trivial
    ■ Constant response times with growing datasets
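The "trivial" part of scaling data, spreading records over several machines, can be illustrated with a small sketch. It assumes hash partitioning by key, which is only one possible distribution scheme and is not named on the slide; the class and method names are hypothetical, and the "machines" are plain in-memory lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HashPartitionSketch {

    // Assign a key to one of numNodes partitions; floorMod keeps the index non-negative.
    static int partitionFor(String key, int numNodes) {
        return Math.floorMod(key.hashCode(), numNodes);
    }

    // Distribute all keys over numNodes in-memory lists that stand in for machines.
    static List<List<String>> distribute(List<String> keys, int numNodes) {
        List<List<String>> nodes = new ArrayList<>();
        for (int i = 0; i < numNodes; i++) {
            nodes.add(new ArrayList<>());
        }
        for (String key : keys) {
            nodes.get(partitionFor(key, numNodes)).add(key);
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("log-001", "log-002", "log-003", "log-004", "log-005");
        List<List<String>> nodes = distribute(keys, 3);
        for (int i = 0; i < nodes.size(); i++) {
            System.out.println("node " + i + " holds " + nodes.get(i));
        }
    }
}
```

Adding a node increases total capacity, which is the easy part. Keeping query times constant as the dataset grows is the harder problem the slide points to, and this sketch does not address it.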
Hadoop Gives Answers to Horizontal Scalability of Data and Functions
Apache Spark
■ Distributed computing (up to 100x faster than Hadoop M/R)
  ■ Distributed Map/Reduce on distributed data can be done in-memory
■ Written in Scala (JVM)
■ Java/Scala/Python APIs
■ Processes data from distributed and non-distributed sources
  ■ Text files (accessible from all nodes)
  ■ Hadoop File System (HDFS)
  ■ Databases (JDBC)
  ■ Solr via Lucidworks API
  ■ ...
READ THIS: https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
[Figure: Spark cluster architecture. Labels: a driver node with the driver application and its Spark context (master URL); a master host running the Master / YARN / Mesos JVM; slave hosts, each with a worker JVM and an executor JVM; executors run task(s) on the partitions of a Resilient Distributed Dataset (RDD). Arrow labels: creates, uses, start.]
Apache Spark - Lambda on Steroids
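The code shown on this slide is not part of the transcript. As a rough single-JVM analogy only, the sketch below uses plain Java 8 lambdas and a parallel stream to count log lines per level. It is not Spark code: the class name, the sample lines, and the assumption that the first token of a line is the log level are all made up for illustration. Spark's RDD API offers a similar map-and-aggregate style, with the work spread over executors in a cluster instead of threads in one JVM.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class LambdaLogCountSketch {

    // Map each non-empty line to its first token (assumed to be the log level),
    // then count occurrences per level. TreeMap keeps the levels sorted.
    static Map<String, Long> countByLevel(List<String> lines) {
        return lines.parallelStream()
                .filter(line -> !line.trim().isEmpty())
                .map(line -> line.trim().split("\\s+")[0])
                .collect(Collectors.groupingBy(level -> level, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for a log file.
        List<String> lines = Arrays.asList(
                "INFO start",
                "ERROR disk full",
                "INFO ready",
                "WARN slow",
                "ERROR timeout");
        System.out.println(countByLevel(lines));
    }
}
```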
“Put the Cloud in a Box”
Cloud Case – 5x Intel NUC6i5SYK
CPU:   6th generation Intel® Core™ i5-6260U processor with Intel® Iris™ graphics (1.9 GHz up to 2.8 GHz Turbo, dual core, 4 MB cache, 15 W TDP)
RAM:   32 GB dual-channel DDR4 SODIMMs, 1.2 V, 2133 MHz
Disk:  256 GB Samsung M.2 internal SSD
Total: 10 cores, 20 HT units, 160 GB RAM, 1.25 TB disk
→ This case is as powerful as five notebooks
LogFile Analysis with Apache Spark and NetBeans
■ DEMO
  - Getting started with Spark programming in NetBeans
  - Working with Gradle projects and code completion
  - Using a real cluster (the cloud case)
  - Working with the remote terminal
  - Using the embedded browser
  - Using Docker
  - Connecting to a remote Docker Engine
  - Using container logs
Spark Pattern 1: Distributed Task with Params
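The slide's code is missing from the transcript, so the following is only a guess at the shape of the pattern, written as a single-JVM analogy with Java parallel streams instead of Spark. A list of parameters is spread over workers, each worker runs the same task on its parameter, and the results are collected. The task `sumOfSquares` and the class name are hypothetical. In Spark the corresponding steps would presumably be `JavaSparkContext.parallelize(params)`, then `map(...)`, then `collect()`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistributedTaskSketch {

    // Hypothetical CPU-bound task that depends only on its parameter.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (long i = 1; i <= n; i++) {
            sum += i * i;
        }
        return sum;
    }

    // Run the task once per parameter, possibly in parallel.
    // Collecting to a list preserves the order of the input parameters.
    static List<Long> runAll(List<Integer> params) {
        return params.parallelStream()
                .map(DistributedTaskSketch::sumOfSquares)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> params = Arrays.asList(1, 3, 10);
        System.out.println(runAll(params));
    }
}
```

The point of the pattern is that only the small parameter list travels to the workers; the expensive computation happens where the task runs.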
Spark Pattern 2: Distributed Read from External Sources
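The code for this slide is also absent from the transcript. One plausible reading of the pattern, sketched here as a single-JVM analogy rather than Spark code, is that the driver only creates small range descriptors and each task reads its own slice from the external source, so no single process has to load everything. `Range`, `fakeSource`, and the offset/limit paging scheme are assumptions for illustration; a real source could be a database or a search index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

class DistributedReadSketch {

    // Describes one slice of the external source: start offset and record count.
    static final class Range {
        final int offset;
        final int limit;

        Range(int offset, int limit) {
            this.offset = offset;
            this.limit = limit;
        }
    }

    // Split totalRecords into consecutive ranges of at most pageSize records.
    static List<Range> splitIntoRanges(int totalRecords, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (int offset = 0; offset < totalRecords; offset += pageSize) {
            ranges.add(new Range(offset, Math.min(pageSize, totalRecords - offset)));
        }
        return ranges;
    }

    // Each task calls the reader for its own range; results are concatenated in range order.
    static List<String> readAll(List<Range> ranges,
                                BiFunction<Integer, Integer, List<String>> reader) {
        return ranges.parallelStream()
                .flatMap(r -> reader.apply(r.offset, r.limit).stream())
                .collect(Collectors.toList());
    }

    // Hypothetical stand-in for a paged query against an external system.
    static List<String> fakeSource(int offset, int limit) {
        List<String> rows = new ArrayList<>();
        for (int i = offset; i < offset + limit; i++) {
            rows.add("record-" + i);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Range> ranges = splitIntoRanges(10, 4);
        System.out.println(readAll(ranges, DistributedReadSketch::fakeSource));
    }
}
```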