Big Data Analytics for Non-Programmers

wwww.edureka.co/big-data-and-hadoop

Upload: edureka

Post on 07-Jan-2017

TRANSCRIPT

Agenda for the day

• Can Hadoop be learnt without knowing Java?
• How can Pig be used in place of MapReduce?
• Querying data with HiveQL

Can Hadoop be learnt without knowing Java?

Yes!

Hadoop can be learnt without knowing Java

Pig & Hive

Tools such as Pig and Hive, which are built on top of Hadoop, offer high-level languages for working with data.

If you want to write MapReduce programs, you can use Pig and its language, Pig Latin, neither of which requires any knowledge of Java.

If you want to query data stored in HDFS in a readable, table-like form, you can use Hive, which also requires no knowledge of Java.
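
Hive does not convert the files sitting in HDFS; it lays a table schema over delimited text at read time ("schema-on-read"). The Python sketch below imitates that idea on a few in-memory lines. The column names and the `parse_rows` helper are hypothetical; Ctrl-A (`\x01`) is Hive's default field delimiter for text tables, but a real table may declare a different one.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical table schema: (column name, type converter) pairs.
Schema = List[Tuple[str, Callable[[str], object]]]

def parse_rows(lines: List[str], schema: Schema, delimiter: str = "\x01") -> List[Dict[str, object]]:
    """Apply a schema to raw delimited lines at read time, in the spirit of a Hive text table."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        row: Dict[str, object] = {}
        for i, (name, convert) in enumerate(schema):
            value: Optional[str] = fields[i] if i < len(fields) else None
            try:
                # Missing trailing fields become None.
                row[name] = convert(value) if value is not None else None
            except ValueError:
                # An unparseable value (e.g. "" for an int column) becomes None,
                # much as Hive yields NULL instead of failing the query.
                row[name] = None
        rows.append(row)
    return rows

raw = ["alice\x0134\x01IN", "bob\x0127\x01US", "carol\x01\x01UK"]
schema: Schema = [("name", str), ("age", int), ("country", str)]
for r in parse_rows(raw, schema):
    print(r)
```

The raw lines are never rewritten; changing `schema` changes only how the same bytes are interpreted, which is the property that lets Hive expose existing HDFS files as tables.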

Why Pig?

But why Pig?

Pig simplifies complex MapReduce programs by expressing them in Pig Latin.

Additionally, if you want to write your own MapReduce code, you can do so in languages other than Java (e.g. Perl, Python, Ruby, C).
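
The slide does not say how non-Java MapReduce code is run; one common route (an assumption here, not named in the talk) is Hadoop Streaming, which runs any executable as the mapper or reducer and passes records through stdin/stdout as tab-separated key/value lines. The sketch below is a word-count mapper and reducer written in that style. The shuffle-and-sort step that Hadoop performs between the two phases is simulated locally with `sorted()` so the example runs without a cluster.

```python
from itertools import groupby
from typing import Iterable, Iterator

def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every word, as a streaming mapper would print to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

text = ["Pig and Hive", "Hive on Hadoop", "pig"]
# sorted() stands in for Hadoop's shuffle-and-sort between the map and reduce phases.
for out in reducer(sorted(mapper(text))):
    print(out)
```

On a real cluster each function would sit in its own script that reads `sys.stdin` and prints its results; the exact streaming command line depends on the Hadoop installation.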

But the most attractive features of Pig are:

• 10 lines of Pig Latin ≈ 200 lines of Java
• Built-in operations such as Join, Group, Filter, Sort and more
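
To make the built-in operations concrete, the sketch below walks through a small FILTER → GROUP → ORDER pipeline. Each step shows a Pig Latin statement in a comment, followed by plain Python that computes the same result on an in-memory list. The relation and field names (`visits`, `user`, `url`, `seconds`) are hypothetical, and Pig itself would run these steps as MapReduce jobs over files in HDFS, not over a Python list.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# visits = LOAD 'visits' AS (user:chararray, url:chararray, seconds:int);
visits: List[Tuple[str, str, int]] = [
    ("ann", "/home", 45), ("bob", "/home", 10), ("ann", "/docs", 60),
    ("cat", "/docs", 35), ("bob", "/docs", 90), ("cat", "/home", 5),
]

# long_v = FILTER visits BY seconds >= 30;
long_v = [v for v in visits if v[2] >= 30]

# by_url = GROUP long_v BY url;   (each group holds a bag of the matching tuples)
by_url: Dict[str, List[Tuple[str, str, int]]] = defaultdict(list)
for v in long_v:
    by_url[v[1]].append(v)

# totals = FOREACH by_url GENERATE group AS url, COUNT(long_v) AS n;
totals = [(url, len(bag)) for url, bag in by_url.items()]

# ranked = ORDER totals BY n DESC;
ranked = sorted(totals, key=lambda t: t[1], reverse=True)
print(ranked)
```

The Pig Latin column reads as a short series of named steps, which is the "reads like a series of steps" point made on the next slide; the equivalent hand-written Java MapReduce job would also need mapper, reducer and driver classes.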

Why Pig?

• Provides common data operations (filters, joins, ordering, etc.) and nested data types (tuples, bags and maps) that are missing from MapReduce.

• It is open source and actively supported by a community of developers.

[Figure: Pig handles structured, semi-structured and unstructured data; Pig Latin is similar to SQL and reads like a series of steps; UDF languages: Java, Python, JavaScript, Ruby]

• An ad-hoc way of creating and executing MapReduce jobs on very large data sets

• Can take any data

• Easy to learn, read and write

• Extensible through UDFs (user-defined functions)

• Java not required
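
The nested-types and UDF bullets fit together: after a GROUP, each record is a tuple whose second field is a bag of tuples, and a UDF can consume that bag directly. The sketch below models a Pig tuple as a Python tuple, a bag as a list of tuples and a map as a dict, and defines a hypothetical UDF `url_stats` that summarises a bag as a map. In Pig's Jython environment the `@outputSchema` decorator is provided by Pig; the stand-in decorator here exists only so the sketch runs as ordinary Python, and the schema string is an assumption.

```python
# Stand-in for the decorator Pig provides to Jython UDFs. It only records the
# declared schema so this file runs under plain CPython; delete it before
# registering the file with Pig, or it will shadow Pig's own decorator.
def outputSchema(schema):
    def wrap(func):
        func.output_schema = schema
        return func
    return wrap

@outputSchema("stats:map[]")
def url_stats(bag):
    """Hypothetical UDF: summarise a bag of (user, url, seconds) tuples as a map."""
    seconds = [t[2] for t in bag]
    users = set(t[0] for t in bag)
    return {
        "visits": len(seconds),
        "distinct_users": len(users),
        "max_seconds": max(seconds) if seconds else None,
    }

# A grouped record as Pig would hand it over: (group key, bag of tuples).
grouped_record = ("/docs", [("ann", "/docs", 60), ("cat", "/docs", 35), ("ann", "/docs", 12)])
print(url_stats(grouped_record[1]))
```

In a Pig script the file would be registered and called roughly as `REGISTER 'udfs.py' USING jython AS myfuncs;` followed by `stats = FOREACH by_url GENERATE group, myfuncs.url_stats(long_v);`, where `udfs.py`, `by_url` and `long_v` are illustrative names.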

Why Hive?

• Defines an SQL-like query language called HiveQL
• Data warehouse infrastructure built on top of Hadoop
• Allows programmers to plug in custom mappers and reducers
• Provides tools to enable easy ETL (extract, transform, load)
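
Plugging in a custom mapper typically goes through HiveQL's `TRANSFORM ... USING` clause: Hive streams the selected columns to an external script as tab-separated lines on stdin and reads tab-separated lines back as new columns. Below is a minimal sketch of such a script in Python. The table, column and file names are hypothetical, and the HiveQL in the comment shows the general shape of the call rather than a tested query.

```python
from typing import Iterable, Iterator

# Hypothetical usage from Hive:
#   ADD FILE normalize_urls.py;
#   SELECT TRANSFORM (user_id, url)
#     USING 'python normalize_urls.py'
#     AS (user_id, host)
#   FROM page_views;

def transform(lines: Iterable[str]) -> Iterator[str]:
    """Read 'user_id<TAB>url' rows and emit 'user_id<TAB>host' rows."""
    for line in lines:
        user_id, url = line.rstrip("\n").split("\t", 1)
        # Drop any scheme ("https://"), then keep only the host part, lower-cased.
        host = url.split("://", 1)[-1].split("/", 1)[0].lower()
        yield f"{user_id}\t{host}"

def main(stream: Iterable[str]) -> None:
    for out_line in transform(stream):
        print(out_line)

# Under Hive the script would call main(sys.stdin); a small sample is used here instead.
main(["42\thttps://Example.com/a/b", "7\thttp://hadoop.apache.org/docs"])
```

Keeping the row logic in `transform` and the I/O in `main` lets the same function be tested locally before it is handed to Hive.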

Features of Hive

You can use Hive to read and write files on Hadoop and run your reports from a BI tool.

Hive applications:
• Predictive modeling & hypothesis testing
• Document indexing
• Customer-facing business intelligence
• Log processing
• Data mining
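
Log processing is a typical fit because a HiveQL query over log data looks like ordinary SQL. The sketch below uses Python's built-in `sqlite3` purely as a local stand-in for Hive (it is not Hive and shares only the SQL flavour); the `access_log` table and its rows are invented for illustration. In Hive, a query of the same shape would run over log files stored in HDFS.

```python
import sqlite3

# In-memory SQLite database standing in for a Hive table over log files.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (ip TEXT, path TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?, ?)",
    [
        ("10.0.0.1", "/index.html", 200),
        ("10.0.0.2", "/missing", 404),
        ("10.0.0.1", "/about.html", 200),
        ("10.0.0.3", "/missing", 404),
        ("10.0.0.2", "/index.html", 500),
    ],
)

# Count requests per HTTP status code; ties are broken by status code.
report = conn.execute(
    "SELECT status, COUNT(*) AS hits FROM access_log "
    "GROUP BY status ORDER BY hits DESC, status"
).fetchall()
print(report)
conn.close()
```

A BI tool connected to Hive would issue essentially this kind of `GROUP BY` query; only the execution engine and the storage differ.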

Demo

Thank You

Questions/Queries/Feedback

The recording and presentation will be made available to you within 24 hours.