NetBeans for Big Data

| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend


BigData with Free and Open Source Tools


NetBeans IDE for BigData Development with Apache Spark
Johannes Weigend - QAware GmbH


About this Talk
■ A brief overview of BigData processing (10 minutes)
■ Live demo: Apache Zeppelin and Spark (5 minutes)
■ Spark programming with NetBeans (10 minutes)



Horizontal Scalability is Difficult!
■ Horizontal scalability of functions
  ■ Trivial
    ■ Load balancing of (stateless) services (macro- / microservices)
    ■ More users → more machines
  ■ Non-trivial
    ■ More machines → faster response times
■ Horizontal scalability of data
  ■ Trivial
    ■ Linear distribution of data on multiple machines
    ■ More machines → more data
  ■ Non-trivial
    ■ Constant response times with growing datasets
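The "trivial" half of data scalability, spreading records evenly over machines, can be sketched in a single JVM by hashing each key to a partition index. This is only an illustration under stated assumptions: the class and method names (`PartitionSketch`, `partitionFor`, `distribute`) are hypothetical, and real systems such as HDFS, Solr, or Spark use their own partitioning schemes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Single-JVM sketch (hypothetical names): spread keys over N "machines" by hash. */
final class PartitionSketch {

    /** Returns the index (0 .. partitions-1) of the partition that owns the key. */
    static int partitionFor(String key, int partitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Distributes all keys over the given number of partitions. */
    static List<List<String>> distribute(List<String> keys, int partitions) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            result.add(new ArrayList<>());
        }
        for (String key : keys) {
            result.get(partitionFor(key, partitions)).add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("host-a", "host-b", "host-c", "host-d", "host-e");
        List<List<String>> parts = distribute(keys, 3);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("partition " + i + ": " + parts.get(i));
        }
    }
}
```

Adding a machine adds capacity, which is the easy part. Keeping response times constant as each partition grows is the part that this sketch does not address.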



Hadoop Gives Answers to Horizontal Scalability of Data and Functions



■ Distributed computing (up to 100x faster than Hadoop M/R)
■ Distributed Map/Reduce on distributed data can be done in-memory
■ Written in Scala (JVM)
■ Java/Scala/Python APIs
■ Processes data from distributed and non-distributed sources
  ■ Text files (accessible from all nodes)
  ■ Hadoop File System (HDFS)
  ■ Databases (JDBC)
  ■ Solr via the Lucidworks API
  ■ ...
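The in-memory Map/Reduce idea can be illustrated with plain Java streams in a single JVM: a map step extracts a key from each record, and a reduce step combines the records per key. This sketch deliberately does not use the Spark API; the class name `MapReduceSketch`, the method `countByLevel`, and the log format (level as the first token) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Single-JVM analogy (hypothetical names) for an in-memory map/reduce job. */
final class MapReduceSketch {

    /** Counts log lines per level; assumes the level is the first token of a line. */
    static Map<String, Long> countByLevel(List<String> lines) {
        return lines.parallelStream()
                // Skip blank lines so that split() always yields a first token.
                .filter(line -> !line.trim().isEmpty())
                // Map step: reduce each line to its log level.
                .map(line -> line.trim().split("\\s+")[0])
                // Reduce step: count the occurrences of each level, sorted by key.
                .collect(Collectors.groupingBy(level -> level, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "ERROR disk full",
                "INFO service started",
                "ERROR connection timeout");
        System.out.println(countByLevel(lines));
    }
}
```

The parallel stream spreads the work over local CPU cores only. Spark applies the same map and reduce shape to partitions that live on different machines.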


READ THIS: https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf





[Figure: Spark cluster architecture. Driver node: the driver application creates a Spark context (configured with a master URL) and uses a Resilient Distributed Dataset (RDD) made of partitions. Master host: master / YARN / Mesos (JVM). Slaves: worker JVMs start executor JVMs, which run the task(s) for each partition.]

Apache Spark - Lambda on Steroids
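The division of labour between driver, partitions, tasks, and executors can be mimicked in one JVM with a thread pool. In this sketch, which is an analogy and not Spark code, the calling thread plays the driver, each sublist is a partition, each submitted lambda is a task, and the pool threads stand in for executors. All names (`DriverSketch`, `partition`, `sumOfSquares`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Single-JVM analogy (hypothetical names): one task per partition, combined by the driver. */
final class DriverSketch {

    /** Splits the data into at most n contiguous partitions of roughly equal size (n > 0). */
    static List<List<Integer>> partition(List<Integer> data, int n) {
        List<List<Integer>> parts = new ArrayList<>();
        int size = (data.size() + n - 1) / n; // ceiling division
        for (int start = 0; start < data.size(); start += size) {
            parts.add(data.subList(start, Math.min(start + size, data.size())));
        }
        return parts;
    }

    /** Runs one task per partition on a fixed pool and adds up the partial results. */
    static long sumOfSquares(List<Integer> data, int partitions, int executors) {
        ExecutorService pool = Executors.newFixedThreadPool(executors);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (List<Integer> part : partition(data, partitions)) {
                // The lambda is the "task" that is shipped to an "executor".
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    for (int x : part) {
                        sum += (long) x * x;
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // the "driver" collects the partial results
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(sumOfSquares(data, 4, 2));
    }
}
```

In a real cluster the tasks are serialized and sent to executor JVMs on other hosts, so everything a lambda captures has to be serializable. That constraint does not show up in this local analogy.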


„Put the Cloud in a Box“


Cloud Case – 5x Intel NUC6i5SYK


CPU: 6th generation Intel® Core™ i5-6260U processor with Intel® Iris™ graphics (1.9 GHz up to 2.8 GHz Turbo, dual core, 4 MB cache, 15 W TDP)
RAM: 32 GB dual-channel DDR4 SODIMMs, 1.2 V, 2133 MHz
Disk: 256 GB Samsung M.2 internal SSD
Total: 10 cores, 20 HT units, 160 GB RAM, 1.25 TB disk

→ This case is as powerful as five notebooks


LogFile Analysis with Apache Spark and NetBeans

■ DEMO
- Getting started with Spark programming in NetBeans
- Working with Gradle projects and code completion
- Using a real cluster (the cloud case)
- Working with the remote terminal
- Using the embedded browser
- Using Docker
- Connecting to a remote Docker Engine
- Using container logs



Spark Pattern 1: Distributed Task with Params



Spark Pattern 2: Distributed Read from External Sources



Spark Pattern 3: Caching and Further Processing with RDDs
