Introduction to Big Data Analytics on Apache Hadoop

Upload: avkash-chauhan

Post on 28-Nov-2014

Category: Data & Analytics


DESCRIPTION

In the age of Big Data and large-volume analytics, there is a lot to cover and a lot to learn. While developing Windows HDInsight at Microsoft, and now while developing a one-of-a-kind Big Data product at my own company, Big Data Perspective, in San Francisco, I have spent the last several years working with Big Data at various levels. This talk is tailored to help database and business intelligence (BI) professionals, programmers, Hadoop administrators, researchers, technical architects, operations engineers, data analysts, and data scientists understand the core concepts of Big Data analytics on Hadoop. The webinar will be useful for anyone who wants to know what Hadoop is and how to take advantage of it by spending just a few dollars to run a cluster. It is also a good fit for those looking to deploy their first data cluster and run MapReduce jobs to discover insights.

TRANSCRIPT

Page 2: Introduction to Big Data Analytics on Apache Hadoop

Let's Start and Define Big Data

How Hadoop Fits in This Scenario

http://www.packtpub.com/using-cloudera-impala/book

http://www.amazon.com/Simplifying-Windows-Azure-HDInsight-Service/dp/0735673802

http://blogs.msdn.com/b/microsoft_press/archive/2014/05/27/free-ebook-introducing-microsoft-azure-hdinsight.aspx

https://www.linkedin.com/in/avkashchauhan

Hadoop is an open-source, Java-based, scalable, fault-tolerant platform for storing and processing large amounts of unstructured data, distributed across machines.

Flexibility: A single repository for storing and analyzing any kind of data, not bound by a schema

Scalability: A scale-out architecture divides the workload across multiple nodes using a flexible distributed file system

Low Cost: Deployed on commodity hardware and an open-source platform

Fault Tolerant: Continues working even if node(s) go down

A system that moves the computation to where the data is.
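
To make the idea of moving computation to the data concrete, here is a minimal sketch of locality-aware task placement. It is an illustration only, not Hadoop's actual scheduler: the function name `assign_tasks`, the block and node names, and the one-slot-per-node capacities are all assumptions made for this example.

```python
# Hypothetical sketch of locality-aware task placement.
# This is NOT Hadoop's real scheduler; names and data are made up.
from typing import Dict, List


def assign_tasks(block_locations: Dict[str, List[str]],
                 free_slots: Dict[str, int]) -> Dict[str, str]:
    """Assign one task per block, preferring a node that already stores it."""
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment: Dict[str, str] = {}
    for block, replicas in block_locations.items():
        # Prefer a node that holds a replica and still has a free slot.
        local = [node for node in replicas if slots.get(node, 0) > 0]
        if local:
            node = local[0]
        else:
            # Fall back to any node with capacity; in that case the data
            # has to travel over the network to reach the computation.
            remote = [n for n, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError("no free slots left")
            node = remote[0]
        slots[node] -= 1
        assignment[block] = node
    return assignment


blocks = {
    "block-1": ["node-a", "node-b"],
    "block-2": ["node-a", "node-c"],
    "block-3": ["node-a"],
}
placement = assign_tasks(blocks, {"node-a": 1, "node-b": 1, "node-c": 1})
print(placement)
```

In this toy run, the first two blocks are processed on nodes that already hold them, while the third falls back to a remote node because its only replica holder is busy.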

Hadoop Landscape

Hadoop Core Components

Data Storage

Data Processing

Hadoop Common

HDFS

MapReduce/YARN
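
As a rough illustration of the MapReduce processing model named above, the following sketch counts words in plain Python using separate map, shuffle, and reduce steps. It runs in a single process and does not use any Hadoop API; the function names and the sample lines are assumptions chosen for this example.

```python
# Single-process sketch of the map -> shuffle -> reduce model.
# Illustrative only; it does not use any Hadoop API.
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = word_count(["big data on hadoop", "hadoop stores big data"])
print(counts)
```

On a real cluster the map and reduce steps would run in parallel on many nodes, with the framework handling the grouping step between them.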

Cloud

Applying Hadoop to Save $$

Concept of Data Lake

Hadoop in Cloud

Big Data Analytics

EDW (enterprise data warehouse)

OLAP (online analytical processing)

ODS (operational data store)

Big Data Analytics with Hadoop

Amazon vs. HDInsight

Data Storage
Amazon: S3
HDInsight: Azure Blobs
Directives: Direct access to compute machines for very fast data delivery

Processing
Amazon: EC2
HDInsight: Azure Compute
Directives: Dedicated machines, ready to run a specific version of the Hadoop runtime

Processing Libraries
Amazon: Java-based, or any other language supported through Hadoop Streaming
HDInsight: .NET-based code
Directives: Users upload their own processing code as binaries/libraries

Results
Amazon: S3
HDInsight: Azure Blobs
Directives: Once the job is completed, the results are stored back to the same data storage that was used as the source

Visualization
Amazon: Custom
HDInsight: Custom
Directives: Third-party applications can connect to the storage to perform visualization
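
The Processing Libraries entry mentions Hadoop Streaming, which lets a job be written in any language that reads lines from standard input and writes tab-separated key/value lines to standard output. The sketch below imitates that contract with plain Python functions so it can run locally. The function names, the sample input, and the `run_locally` helper are assumptions for illustration; in an actual streaming job the mapper and reducer would be separate scripts reading `sys.stdin`, and the framework would do the sorting between them.

```python
# Local imitation of the Hadoop Streaming contract: the mapper and reducer
# exchange text lines of the form "key<TAB>value". Names are hypothetical.
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts per key; the input must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


def run_locally(lines):
    """Stand-in for the framework: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))


result = run_locally(["S3 and Azure Blobs", "Azure Blobs and EC2"])
print(result)
```

Keeping the mapper and reducer as pure line-in, line-out functions makes them easy to test locally before the code is uploaded to a cluster.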

http://blogs.msdn.com/b/microsoft_press/archive/2014/05/27/free-ebook-introducing-microsoft-azure-hdinsight.aspx