Java BigData Full Stack Development (version 2.0)

Post on 11-Apr-2017

Category:

Technology


Java BigData Full Stack Development as is ...

Alexey Zinovyev, Java Trainer in EPAM

About

With IT since 2007

With Java since 2009

With Hadoop since 2012

With EPAM since 2015

3Java Big Data Full Stack Development

Contacts

E-mail : Alexey_Zinovyev@epam.com

Twitter : @zaleslaw @BigDataRussia

vk.com/big_data_russia Big Data Russia

vk.com/java_jvm Java & JVM langs

The Good Old Days

HRs & RMs are looking for Java developers

Is the Java Dream Team waiting for you?

Required Skills

• Advanced SQL

• Basic Linux

• Core Java & JVM

• Backend Development Experience

• Basic Computer Science Level

REAL WORLD

Let’s just use JavaScript in the frontend ONLY

In frontend

ONLY?

Cruel world

Do you know any ML library for JS?

Wild animals everywhere

And what I tell you

It’s Time for Java Superhero, yeah!

Before discovering patterns you should ...

• Select small pieces

• Define default values for missing data

• Remove strange signals from data

• Merge some tables into one if required
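
The preparation steps above can be illustrated with a minimal Java sketch. It covers only two of them (default values for missing data and removal of out-of-range signals); the class name, the method name and the thresholds are hypothetical and chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: names and thresholds are assumptions, not part of the talk.
class DataPrepSketch {

    // Replaces missing values (null) with a default, then drops values outside [min, max].
    static List<Double> clean(List<Double> raw, double defaultValue, double min, double max) {
        List<Double> result = new ArrayList<>();
        for (Double value : raw) {
            double v = (value == null) ? defaultValue : value; // default for missing data
            if (v >= min && v <= max) {                          // remove strange signals
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Double> raw = new ArrayList<>();
        raw.add(36.6);
        raw.add(null);    // missing measurement
        raw.add(1000.0);  // obviously broken sensor reading
        raw.add(37.2);
        System.out.println(clean(raw, 36.6, 30.0, 45.0)); // prints [36.6, 36.6, 37.2]
    }
}
```

In a real pipeline the default value and the valid range would come from domain knowledge rather than constants.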

How it really works

• Share your data with us

• Our magic manipulations

• Building an answering machine

• PROFIT!!!

How to start?

WHAT IS BIG DATA?

Joke about Excel

5V

Every 60 seconds…

From Mobile Devices

From Industry

We started to keep and handle stupid new things!

10^6 rows in MySQL

GB->TB->PB->?

Is BigData about PBs?

It’s hard to …

• .. store

• .. handle

• .. search in

• .. visualize

• .. send in network

Likes in Classmates: how to count?

Crazy Zoo

2012

Crazy Zoo

2016

What will be covered in this training

NOSQL

What’s the problem with RDBMSs?

• Caching

• Master/Slave

• Cluster

• Table Partitioning

• Sharding
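
As an illustration of the last item, sharding can be sketched as routing each row to one of several database instances by its key. The class and method names below are hypothetical, and plain hash-modulo routing is only one possible strategy (it makes adding shards later expensive, since most keys move).

```java
// Hypothetical router: a sketch of hash-based sharding, not a production design.
class ShardRouterSketch {

    // Picks a shard index in [0, shardCount) from the key's hash.
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    static int shardFor(String key, int shardCount) {
        return Math.floorMod(key.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        String[] userIds = {"user-1", "user-2", "user-42"};
        for (String id : userIds) {
            System.out.println(id + " -> shard " + shardFor(id, 4));
        }
    }
}
```

The application, not the database, now has to know where each row lives, which is part of the problem the slide hints at.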

Family

Database party

Spring Data

How to start?

Java MongoDB Driver + Robomongo

BIG DATA TOOL MASTER

VS

DATA SCIENTIST

TRAIN MODEL

Datasets

• Facebook users, tweets

• Trade transactions

• Government

• Medicine (genomic data)

• Telecommunications

Data Sources

• Relational Databases

• Data warehouses (Historical data)

• Files in CSV or in binary format

• Internet or e-mail

• Scientific and research tools (R, Octave, Matlab)

Hey, man, predict something!

Man or sofa?

Typical questions for DM

• Which loan applicants are high-risk?

• How do we detect phone card fraud?

• What is the revenue prediction for next year?

• Can you recommend music for users?

Is the green circle a blue square or a red triangle? Let’s ask its neighbors!

kNN (k-nearest neighbors)
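
A minimal Java sketch of the idea, assuming 2-D points, squared Euclidean distance and a simple majority vote. The Point type and all names are hypothetical; a real project would more likely use a library such as Weka.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical kNN classifier for illustration only.
class KnnSketch {

    // A labeled 2-D point.
    static final class Point {
        final double x;
        final double y;
        final String label;

        Point(double x, double y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
        }
    }

    static double squaredDistance(Point p, double qx, double qy) {
        double dx = p.x - qx;
        double dy = p.y - qy;
        return dx * dx + dy * dy;
    }

    // Classifies (qx, qy) by majority vote among its k nearest training points.
    // On a tie, the label that reached the top count first (closest neighbors) wins.
    static String classify(List<Point> training, double qx, double qy, int k) {
        List<Point> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble((Point p) -> squaredDistance(p, qx, qy)));
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (Point p : sorted.subList(0, Math.min(k, sorted.size()))) {
            int v = votes.merge(p.label, 1, Integer::sum);
            if (v > bestVotes) {
                bestVotes = v;
                best = p.label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Point> training = new ArrayList<>();
        training.add(new Point(1, 1, "blue square"));
        training.add(new Point(1, 2, "blue square"));
        training.add(new Point(2, 1, "blue square"));
        training.add(new Point(5, 5, "red triangle"));
        training.add(new Point(5, 6, "red triangle"));
        training.add(new Point(6, 5, "red triangle"));
        // The "green circle" sits next to the squares.
        System.out.println(classify(training, 1.5, 1.5, 3)); // prints blue square
    }
}
```

Sorting all points is fine for a demo; on large datasets the neighbor search is the expensive part and is usually done with spatial indexes or approximations.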

Collaborative Filtering
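
One simple flavor of collaborative filtering is user-based: find the user whose ratings look most like yours and suggest what they rated and you have not. The sketch below assumes ratings stored as nested maps and cosine similarity; all names are hypothetical, and production recommenders (for example ALS in Spark MLlib) work differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical user-based recommender for illustration only.
class CollaborativeFilteringSketch {

    // Cosine similarity between two sparse rating vectors (item -> rating).
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Returns items rated by the most similar other user that the target has not rated.
    static List<String> recommend(Map<String, Map<String, Double>> ratings, String target) {
        Map<String, Double> mine = ratings.get(target);
        String nearest = null;
        double bestSimilarity = -1;
        for (Map.Entry<String, Map<String, Double>> e : ratings.entrySet()) {
            if (e.getKey().equals(target)) {
                continue;
            }
            double similarity = cosine(mine, e.getValue());
            if (similarity > bestSimilarity) {
                bestSimilarity = similarity;
                nearest = e.getKey();
            }
        }
        List<String> result = new ArrayList<>();
        if (nearest == null) {
            return result;
        }
        for (String item : ratings.get(nearest).keySet()) {
            if (!mine.containsKey(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        Map<String, Double> alice = new HashMap<>();
        alice.put("rock", 5.0);
        alice.put("jazz", 1.0);
        Map<String, Double> bob = new HashMap<>();
        bob.put("rock", 5.0);
        bob.put("jazz", 1.0);
        bob.put("metal", 4.0);
        Map<String, Double> carol = new HashMap<>();
        carol.put("rock", 1.0);
        carol.put("jazz", 5.0);
        carol.put("opera", 5.0);
        ratings.put("alice", alice);
        ratings.put("bob", bob);
        ratings.put("carol", carol);
        System.out.println(recommend(ratings, "alice")); // prints [metal]
    }
}
```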

Machine Learning vs Traditional Programming

Data Science

Can a Java programmer be a Data Scientist?

Sexy Data Scientist

Real Data Scientist

How to start?

Weka

HADOOP

Hadoop and Data Knights

Hadoop

MapReduce in different languages

MapReduce for WordCount
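
The slide's diagram is not part of the transcript, so here is a minimal sketch of the three phases in plain Java, with no Hadoop dependency. It only imitates the map, shuffle and reduce steps on one machine; the class and method names are hypothetical and are not the Hadoop API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-machine imitation of MapReduce word count; not the Hadoop API.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in sequence over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hello world", "hello Hadoop");
        System.out.println(wordCount(lines)); // prints {hadoop=1, hello=2, world=1}
    }
}
```

In real Hadoop the map and reduce calls run on different nodes and the shuffle moves data over the network, which is where most of the cost is.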

Hadoop Jobs

Hadoop frameworks

• Universal (MapReduce, Tez, RDD in Spark)

• Abstract (Pig, Pipeline Spark)

• SQL-like (Hive, Impala, Spark SQL)

• Graph processing (Giraph, GraphX)

• Machine Learning (Mahout, MLlib)

• Stream processing (Spark Streaming, Storm)

SPARK

SPARK: the bloody son of MR

• MapReduce in memory

• Up to 50x faster than Hadoop

• RDD is the basic building block (an immutable distributed collection of objects)

• Pipeline API (no need for Pig)

Spark Family

MLlib supports

• Classification and regression

• Collaborative filtering

• Clustering

• Dimensionality reduction

• Optimization

Code sample MLlib (K-Means)

import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;

// Assumes sc is a JavaSparkContext and parsedData is a JavaRDD<Vector> of parsed input rows

// Cluster the data into two classes using KMeans
int numClusters = 2;
int numIterations = 20;
KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);

// Evaluate clustering by computing Within Set Sum of Squared Errors
double WSSSE = clusters.computeCost(parsedData.rdd());
System.out.println("Within Set Sum of Squared Errors = " + WSSSE);

// Save and load model
clusters.save(sc.sc(), "myModelPath");
KMeansModel sameModel = KMeansModel.load(sc.sc(), "myModelPath");
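
To show what training and the WSSSE metric mean without a Spark cluster, here is a single-machine sketch of Lloyd's algorithm on 1-D points. It is not how MLlib implements K-Means (MLlib chooses initial centers itself and runs distributed); here the initial centers are passed in by the caller, and all names are hypothetical.

```java
import java.util.Arrays;

// Plain-Java illustration of k-means on 1-D data; not the MLlib implementation.
class KMeansSketch {

    // Returns the index of the center closest to p.
    static int nearest(double[] centers, double p) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (Math.abs(p - centers[c]) < Math.abs(p - centers[best])) {
                best = c;
            }
        }
        return best;
    }

    // Lloyd's algorithm: assign points to the nearest center, then move each
    // center to the mean of its points; repeat for a fixed number of iterations.
    static double[] fit(double[] points, double[] initialCenters, int iterations) {
        double[] centers = initialCenters.clone();
        for (int it = 0; it < iterations; it++) {
            double[] sums = new double[centers.length];
            int[] counts = new int[centers.length];
            for (double p : points) {
                int c = nearest(centers, p);
                sums[c] += p;
                counts[c]++;
            }
            for (int c = 0; c < centers.length; c++) {
                if (counts[c] > 0) { // an empty cluster keeps its old center
                    centers[c] = sums[c] / counts[c];
                }
            }
        }
        return centers;
    }

    // Within Set Sum of Squared Errors: squared distance of each point to its center.
    static double wssse(double[] points, double[] centers) {
        double total = 0;
        for (double p : points) {
            double d = p - centers[nearest(centers, p)];
            total += d * d;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] points = {1, 2, 3, 10, 11, 12};
        double[] centers = fit(points, new double[] {1, 12}, 20);
        System.out.println(Arrays.toString(centers));  // prints [2.0, 11.0]
        System.out.println(wssse(points, centers));    // prints 4.0
    }
}
```

A lower WSSSE means tighter clusters, but it always drops as the number of clusters grows, so it is usually compared across several values of k rather than read in isolation.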

MLlib

• .. builds on ideas from scikit-learn (Python lib) and Mahout

• .. runs fully on Spark and supports Spark’s Pipeline API

• .. represents a dataset as Spark SQL’s SchemaRDD (renamed DataFrame in Spark 1.3)

• .. supports Hive as an external data source

• .. works well for large datasets and parallelized algorithms

It solves all problems!

How to start?

HDP Zoo

Ok, Google!

AWS Amazon

Infrastructure issues are waiting for YOU!

DEEP LEARNING

Deep Learning helps us build a NEW FUTURE

HOW TO LEARN?

DIFFERENT WAYS

1. Read books and write ‘pet’ projects

2. Become a mentee in Mentoring Process

3. MOOC

4. Take a training course

5. Visit conferences

Recommended Books
