2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
DESCRIPTION
Our Hadoop 2.2.0 overview for the Toronto Hadoop User Group. Go THUG life.
TRANSCRIPT
© Hortonworks Inc. 2013. Confidential and Proprietary.
Hadoop 2.2.0: Hadoop grows up
Adam Muise
Rob Ford says…
…turn off your #*@!#%!!! Mobile Phones!
YARN: Yet Another Resource Negotiator
A new abstraction layer
HADOOP 1.0 (single-use system: batch apps)
  HDFS (redundant, reliable storage)
  MapReduce (cluster resource management & data processing)

HADOOP 2.0 (multi-purpose platform: batch, interactive, online, streaming, …)
  HDFS2 (redundant, reliable storage)
  YARN (cluster resource management)
  MapReduce and others (data processing)
Concepts
• Application
  – A job submitted to the framework
  – Example: a MapReduce job
• Container
  – Basic unit of allocation
  – Fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU, etc.)
    – container_0 = 2GB, 1 CPU
    – container_1 = 1GB, 6 CPU
  – Replaces the fixed map/reduce slots
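The container model above can be sketched as a toy in Python (illustrative only; `Container` and `NodeCapacity` are made-up names, not the YARN API):

```python
from dataclasses import dataclass

@dataclass
class Container:
    """Toy model of a YARN container: a fine-grained resource grant."""
    memory_gb: int
    vcores: int

@dataclass
class NodeCapacity:
    """Remaining capacity on one node, tracked per resource dimension."""
    memory_gb: int
    vcores: int

    def can_fit(self, c: Container) -> bool:
        # Unlike fixed map/reduce slots, any mix of container shapes fits
        # as long as every resource dimension still has room.
        return c.memory_gb <= self.memory_gb and c.vcores <= self.vcores

    def allocate(self, c: Container) -> None:
        assert self.can_fit(c)
        self.memory_gb -= c.memory_gb
        self.vcores -= c.vcores

node = NodeCapacity(memory_gb=8, vcores=8)
node.allocate(Container(memory_gb=2, vcores=1))  # container_0 from the slide
node.allocate(Container(memory_gb=1, vcores=6))  # container_1 from the slide
print(node)  # NodeCapacity(memory_gb=5, vcores=1)
```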
YARN Architecture
• ResourceManager
  – Global resource scheduler
  – Hierarchical queues
• NodeManager
  – Per-machine agent
  – Manages the life-cycle of containers
  – Container resource monitoring
• ApplicationMaster
  – Per-application
  – Manages application scheduling and task execution
  – E.g. the MapReduce ApplicationMaster
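A toy walkthrough of how the three roles interact when an application is submitted (illustrative Python, not the YARN client API):

```python
def run_application(num_tasks):
    """Toy trace of a YARN application's life-cycle events."""
    events = []
    # 1. The client submits the application to the ResourceManager.
    events.append("client: submit application to RM")
    # 2. The RM's scheduler grants a container; a NodeManager launches
    #    the per-application ApplicationMaster inside it.
    events.append("RM: allocate container for ApplicationMaster")
    events.append("NM: launch ApplicationMaster")
    # 3. The AM negotiates task containers from the RM and hands them
    #    to NodeManagers to launch.
    for t in range(num_tasks):
        events.append(f"AM: request container for task {t}")
        events.append(f"NM: launch task {t}")
    # 4. On completion the AM unregisters from the RM.
    events.append("AM: unregister from RM")
    return events

for e in run_application(num_tasks=2):
    print(e)
```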
YARN Architecture - Walkthrough
[Diagram: Client2 submits an application to the ResourceManager (Scheduler). Two ApplicationMasters (AM 1, AM 2) run on NodeManagers; AM 1's containers 1.1-1.3 and AM 2's containers 2.1-2.4 are spread across the NodeManagers.]
YARN as OS for the Data Lake
[Diagram: one ResourceManager (Scheduler) hosts mixed workloads across NodeManagers: batch MapReduce tasks (map 1.1, map 1.2, reduce 1.1), interactive SQL vertices (vertex 1.1.1, 1.1.2, 1.2.1, 1.2.2), and real-time Storm daemons (nimbus0, nimbus1, nimbus2).]
Multi-Tenant YARN
[Diagram: the ResourceManager's Scheduler manages a hierarchical queue tree under root: Adhoc 10%, DW 60%, Mrkting 30%, with nested splits such as Dev 10% / Reserved 20% / Prod 70%, Prod 80% / Dev 20%, and P0 70% / P1 30%.]
Multi-Tenancy with New Capacity Scheduler
• Queues
  – Hierarchical queues
  – Preemption
• Economics as queue capacity
  – SLAs
• Resource isolation
  – Linux: cgroups
  – MS Windows: Job Control
  – Roadmap: virtualization (Xen, KVM)
• Administration
  – Queue ACLs
  – Run-time re-configuration for queues
  – Charge-back
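Capacities in a hierarchical queue tree multiply down the levels; for example, a Prod queue given 70% of a DW queue that holds 60% of the cluster ends up with an absolute 42%. A minimal sketch of that arithmetic (illustrative Python; the real Capacity Scheduler is configured via capacity-scheduler.xml):

```python
def absolute_capacity(tree, path):
    """Multiply per-level queue percentages down to an absolute cluster share."""
    share = 1.0
    node = tree
    for name in path:
        share *= node[name]["capacity"] / 100.0
        node = node[name].get("queues", {})
    return share

# Queue tree loosely following the slide's example percentages.
queues = {
    "root": {"capacity": 100, "queues": {
        "Adhoc": {"capacity": 10},
        "DW": {"capacity": 60, "queues": {
            "Dev": {"capacity": 10},
            "Reserved": {"capacity": 20},
            "Prod": {"capacity": 70},
        }},
        "Mrkting": {"capacity": 30},
    }},
}

print(absolute_capacity(queues, ["root", "DW", "Prod"]))  # about 0.42 of the cluster
```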
[Diagram: the ResourceManager's Capacity Scheduler with hierarchical queues under root: Adhoc 10%, DW 70%, Mrkting 20%, with nested Dev 10% / Reserved 20% / Prod 70%, Prod 80% / Dev 20%, and P0 70% / P1 30% splits.]
MapReduce v2: Changes to MapReduce on YARN
MapReduce v2 is a library now…
• MapReduce runs on YARN like all other Hadoop 2.x applications
  – Gone are the map and reduce slots; resource allocation is handled by YARN containers now
  – Gone is the JobTracker, replaced by the YARN ApplicationMaster library
• Multiple versions of MapReduce
  – The older mapred APIs work without modification or recompilation
  – The newer mapreduce APIs may need to be recompiled
• Still one master server component: the Job History Server
  – The Job History Server stores the execution history of jobs
  – Used to audit prior execution of jobs
  – Will also be used by the YARN framework to store charge-backs at that level
Shuffle in MapReduce v2
• Faster shuffle
  – Better embedded server: Netty
• Encrypted shuffle
  – Secures the shuffle phase as data moves across the cluster
  – Requires two-way HTTPS, with certificates on both sides
  – Incurs significant CPU overhead; reserve 1 core for this work
  – Certs stored on each node (provisioned with the cluster), refreshed every 10 seconds
• Pluggable Shuffle/Sort
  – Shuffle is the first phase in MapReduce that is guaranteed to not be data-local
  – Pluggable Shuffle/Sort allows intrepid application or hardware developers to intercept the network-heavy workload and optimize it
  – Typical implementations pair hardware components like fast networks with software components like sorting algorithms
  – The API will change with future versions of Hadoop
Efficiency Gains of MRv2
• Key optimizations
  – No hard segmentation of resources into map and reduce slots
  – The YARN scheduler is more efficient
  – The MRv2 framework is more efficient than MRv1; the shuffle phase in MRv2 is more performant thanks to Netty
• Yahoo has over 30,000 nodes running YARN across over 365PB of data.
• They calculate running about 400,000 jobs per day for about 10 million hours of compute time.
• They also estimate a 60% to 150% improvement in node usage per day.
• Yahoo got rid of a whole colo (a 10,000-node datacenter) because of the increased utilization.
HDFS v2: In a Nutshell
HA
HDFS Snapshots: Feature Overview
• Admins can create point-in-time snapshots of HDFS
  – Of the entire file system (/root)
  – Of a specific data set (a sub-tree directory of the file system)
• Restore the state of the entire file system or a data set to a snapshot (like Apple Time Machine)
  – Protects against user errors
• Snapshot diffs identify changes made to a data set
  – Keep track of how raw or derived/analytical data changes over time
NFS Gateway: Feature Overview
• NFS v3 standard
• Supports all HDFS commands
– List files
– Copy, move files
– Create and delete directories
• Ingest for large scale analytical workloads
– Load immutable files as source for analytical processing
– No random writes
• Stream files into HDFS
– Log ingest by applications writing directly to HDFS client mount
Federation
Managing Namespaces
Performance
Other Features
Apache Tez: A New Hadoop Data Processing Framework
Moving Hadoop Beyond MapReduce
• Low-level data-processing execution engine
• Built on YARN
• Enables pipelining of jobs
• Removes task and job launch times
• Does not write intermediate output to HDFS
  – Much lighter disk and network usage
• New base for MapReduce, Hive, Pig, Cascading, etc.
• Hive and Pig jobs no longer need to move to the end of the queue between steps in the pipeline
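The effect of skipping intermediate HDFS writes can be illustrated with a toy count (assumptions: a chain of N MapReduce jobs writes each job's output to HDFS, while a Tez DAG only materializes the final output):

```python
def hdfs_writes_mr_chain(num_stages):
    """A chain of N MapReduce jobs materializes every job's output to HDFS."""
    return num_stages  # one HDFS write per job, including the final result

def hdfs_writes_tez_dag(num_stages):
    """A Tez DAG streams intermediate data between vertices;
    only the final output is written to HDFS."""
    return 1

# A hypothetical 4-stage Hive pipeline (e.g. join -> join -> group by -> order by):
print(hdfs_writes_mr_chain(4))  # 4 HDFS materializations
print(hdfs_writes_tez_dag(4))   # 1 HDFS materialization
```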
Apache Tez as the new Primitive

MapReduce as base (HADOOP 1.0):
  HDFS (redundant, reliable storage)
  MapReduce (cluster resource management & data processing)
  Pig (data flow), Hive (SQL), others (Cascading)

Apache Tez as base (HADOOP 2.0):
  HDFS2 (redundant, reliable storage)
  YARN (cluster resource management)
  Tez (execution engine)
  On Tez: Pig (data flow), Hive (SQL), others (Cascading), batch MapReduce
  Alongside on YARN: real-time stream processing (Storm), online data processing (HBase, Accumulo)
Hive-on-MR vs. Hive-on-Tez

SELECT a.x, AVERAGE(b.y) AS avg
FROM a JOIN b ON (a.id = b.id)
GROUP BY a
UNION
SELECT x, AVERAGE(y) AS AVG
FROM c GROUP BY x
ORDER BY AVG;

[Diagram: Hive-on-MR runs the query as a chain of separate MapReduce jobs (SELECT b.id, JOIN(a, b), SELECT c.price, JOIN (a, c), GROUP BY a.state, COUNT(*), AVERAGE(c.price)), writing to HDFS between each job. Hive-on-Tez runs the same plan as a single DAG of map and reduce vertices, so Tez avoids unneeded writes to HDFS.]
Apache Tez ("Speed")
• Replaces MapReduce as the primitive for Pig, Hive, Cascading, etc.
  – Lower latency for interactive queries
  – Higher throughput for batch queries
  – 22 contributors: Hortonworks (13), Facebook, Twitter, Yahoo, Microsoft
• A YARN ApplicationMaster runs a DAG of Tez tasks
• Each task has a pluggable Input, Processor, and Output
  – Tez Task = <Input, Processor, Output>
Tez: Building blocks for scalable data processing
• Classical 'Map': Map Processor with HDFS Input and Sorted Output
• Classical 'Reduce': Reduce Processor with Shuffle Input and HDFS Output
• Intermediate 'Reduce' for Map-Reduce-Reduce: Reduce Processor with Shuffle Input and Sorted Output
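The three building blocks are just different <Input, Processor, Output> triples; a toy sketch (illustrative Python, not the Tez API):

```python
# Toy stand-ins for Tez's pluggable task pieces.
def hdfs_input():    return "HDFSInput"
def shuffle_input(): return "ShuffleInput"
def sorted_output(): return "SortedOutput"
def hdfs_output():   return "HDFSOutput"

def task(inp, processor, out):
    """A Tez task is simply the triple <Input, Processor, Output>."""
    return (inp, processor, out)

# The three building blocks from the slide:
classical_map       = task(hdfs_input(),    "MapProcessor",    sorted_output())
classical_reduce    = task(shuffle_input(), "ReduceProcessor", hdfs_output())
intermediate_reduce = task(shuffle_input(), "ReduceProcessor", sorted_output())

# Map-Reduce-Reduce is just a pipeline of these tasks:
pipeline = [classical_map, intermediate_reduce, classical_reduce]
for t in pipeline:
    print(t)
```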
Hive
SQL: Enhancing SQL Semantics
Hive SQL Datatypes          | Hive SQL Semantics
----------------------------+------------------------------------------
INT                         | SELECT, INSERT
TINYINT/SMALLINT/BIGINT     | GROUP BY, ORDER BY, SORT BY
BOOLEAN                     | JOIN on explicit join key
FLOAT                       | Inner, outer, cross, and semi joins
DOUBLE                      | Sub-queries in FROM clause
STRING                      | ROLLUP and CUBE
TIMESTAMP                   | UNION
BINARY                      | Windowing functions (OVER, RANK, etc.)
DECIMAL                     | Custom Java UDFs
ARRAY, MAP, STRUCT, UNION   | Standard aggregation (SUM, AVG, etc.)
DATE                        | Advanced UDFs (ngram, XPath, URL)
VARCHAR                     | Sub-queries in WHERE, HAVING
CHAR                        | Expanded JOIN syntax
                            | SQL-compliant security (GRANT, etc.)
                            | INSERT/UPDATE/DELETE (ACID)

(Items span what is available in Hive 0.12 and what is on the roadmap.)

SQL Compliance: Hive 0.12 provides a wide array of SQL datatypes and semantics so your existing tools integrate more seamlessly with Hadoop.
SPEED: Increasing Hive Performance
• Performance improvements included in Hive 12
  – Base & advanced query optimization
  – Startup time improvement
  – Join optimizations
• Interactive query times across ALL use cases
  – Simple and advanced queries in seconds
  – Integrates seamlessly with existing tools
  – Currently a >100x improvement in just nine months
Tez on YARN
[Diagram: the ResourceManager (Scheduler) runs batch MapReduce tasks (map 1.1, map 1.2, reduce 1.1), Hive/Tez (SQL) vertices (vertex 1.1.1, 1.1.2, 1.2.1, 1.2.2), and real-time Storm daemons (nimbus0, nimbus1, nimbus2) across NodeManagers.]
Apache Falcon: Data Lifecycle Management for Hadoop
Data Lifecycle on Hadoop is Challenging
Data management needs: data processing, replication, retention, scheduling, reprocessing, multi-cluster management
Tools: Oozie, Sqoop, DistCp, Flume, MapReduce, Hive and Pig jobs
Problem: A patchwork of tools complicates data lifecycle management. Result: long development cycles and quality challenges.
Falcon: One-stop Shop for Data Lifecycle
Apache Falcon provides and orchestrates:
Data management needs: data processing, replication, retention, scheduling, reprocessing, multi-cluster management
Tools: Oozie, Sqoop, DistCp, Flume, MapReduce, Hive and Pig jobs
Falcon provides a single interface to orchestrate the data lifecycle. Sophisticated DLM is easily added to Hadoop applications.
Falcon Core Capabilities
• Core functionality
  – Pipeline processing
  – Replication
  – Retention
  – Late data handling
• Automates
  – Scheduling and retry
  – Recording audit, lineage, and metrics
• Operations and management
  – Monitoring, management, metering
  – Alerts and notifications
  – Multi-cluster federation
• CLI and REST API
Falcon At A Glance
> Falcon offers a high-level abstraction of key services for Hadoop data management needs.
> Complex data processing logic is handled by Falcon instead of hard-coded in data processing apps.
> Falcon enables faster development of ETL, reporting, and other data processing apps on Hadoop.
[Diagram: data processing applications sit on top of the Falcon data management framework, which provides data import and replication, scheduling and coordination, data lifecycle policies, multi-cluster management, and SLA management.]
Falcon Example: Replication
> Falcon manages workflow and replication.
> Enables business continuity without requiring full data representation.
> Failover clusters can be smaller than primary clusters.
[Diagram: staged, cleansed, conformed, and access data live on the primary cluster; replication copies staged and processed data to the failover cluster.]
Falcon Example: Retention
> Sophisticated retention policies expressed in one place.
> Simplify data retention for audit, compliance, or data re-processing.
[Diagram: staged data retained 20 years; cleansed data retained 3 years; conformed data retained 3 years; access data retains the last copy only.]
Falcon Example: Late Data Handling
> Processing waits until all required input data is available.
> Checks for late data arrivals and re-triggers processing as necessary.
> Eliminates writing complex data handling rules within applications.
[Diagram: online transaction data (via Sqoop) and web log data (via FTP) are combined into a staged dataset; processing waits up to 4 hours for the FTP data to arrive.]
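The "wait up to 4 hours" behaviour boils down to a cutoff check like the one below (illustrative Python; in Falcon this policy is declared in the feed/process XML, not written as user code; the feed names are hypothetical):

```python
from datetime import datetime, timedelta

def should_run(nominal_time, arrived_feeds, required_feeds, now,
               late_cutoff=timedelta(hours=4)):
    """Run when all required inputs have arrived, or stop waiting
    once the late-data cutoff after the nominal time has passed."""
    if required_feeds <= arrived_feeds:
        return "run"
    if now - nominal_time >= late_cutoff:
        return "timeout"  # in Falcon this would trigger late-data handling
    return "wait"

t0 = datetime(2013, 11, 20, 0, 0)
required = {"sqoop-transactions", "ftp-weblogs"}
print(should_run(t0, {"sqoop-transactions"}, required, t0 + timedelta(hours=1)))  # wait
print(should_run(t0, required, required, t0 + timedelta(hours=1)))                # run
print(should_run(t0, {"sqoop-transactions"}, required, t0 + timedelta(hours=5)))  # timeout
```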
Examples
Example: Cluster Specification
<?xml version="1.0"?>
<!--
  My Local Cluster specification
-->
<cluster colo="my-local-cluster" description="" name="cluster-alpha">
  <interfaces>
    <interface type="readonly" endpoint="hftp://nn:50070" version="2.2.0" />
    <interface type="write" endpoint="hdfs://nn:8020" version="2.2.0" />
    <interface type="execute" endpoint="rm:8050" version="2.2.0" />
    <interface type="workflow" endpoint="http://os:11000/oozie/" version="4.0.0" />
    <interface type="messaging" endpoint="tcp://mq:61616?daemon=true" version="5.1.6" />
  </interfaces>
  <locations>
    <location name="staging" path="/apps/falcon/cluster-alpha/staging" />
    <location name="temp" path="/tmp" />
    <location name="working" path="/apps/falcon/cluster-alpha/working" />
  </locations>
</cluster>
[Diagram: the readonly and write interfaces point at the NameNode, execute at the ResourceManager, and workflow at the Oozie server.]
Example: Weblogs Replication and Retention
Example 1: Weblogs
• Weblogs land hourly in my primary cluster
• HDFS location is /weblogs/{date}
• I want to:
  – Evict weblogs from the primary cluster after 1 day
Feed Specification 1: Weblogs
<feed description="" name="feed-weblogs1" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>

  <clusters>
    <cluster name="cluster-primary" type="source">
      <validity start="2013-10-24T00:00Z" end="2014-12-31T00:00Z"/>
      <retention limit="days(1)" action="delete"/>
    </cluster>
  </clusters>

  <locations>
    <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}" />
  </locations>

  <ACL owner="hdfs" group="users" permission="0755" />
  <schema location="/none" provider="none"/>
</feed>
[Callouts: the location of the data, the cluster where the data is located, and the 1-day retention policy.]
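A retention limit of days(1) with action delete evicts any hourly partition older than one day; a minimal sketch of that eviction decision (illustrative Python, not Falcon internals):

```python
from datetime import datetime, timedelta

def partitions_to_evict(partitions, now, retention=timedelta(days=1)):
    """Return the hourly partitions older than the retention window."""
    cutoff = now - retention
    return [p for p, ts in partitions.items() if ts < cutoff]

now = datetime(2013, 11, 20, 12, 0)
partitions = {
    "/weblogs/2013-11-20-11": datetime(2013, 11, 20, 11, 0),  # 1 hour old: keep
    "/weblogs/2013-11-19-11": datetime(2013, 11, 19, 11, 0),  # 25 hours old: evict
    "/weblogs/2013-11-18-09": datetime(2013, 11, 18, 9, 0),   # older still: evict
}
print(sorted(partitions_to_evict(partitions, now)))
```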
Example 2: Weblogs
• Weblogs land hourly in my primary cluster
• HDFS location is /weblogs/{date}
• I want to:
  – Replicate weblogs to my secondary cluster
  – Evict weblogs from the primary cluster after 2 days
  – Evict weblogs from the secondary cluster after 1 week
Feed Specification 2: Weblogs
<feed description="" name="feed-weblogs2" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>

  <clusters>
    <cluster name="cluster-primary" type="source">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(2)" action="delete"/>
    </cluster>
    <cluster name="cluster-secondary" type="target">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(7)" action="delete"/>
    </cluster>
  </clusters>

  <locations>
    <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
  </locations>

  <ACL owner="hdfs" group="users" permission="0755"/>
  <schema location="/none" provider="none"/>
</feed>
[Callouts: the location of the data, the source cluster with a 2-day retention policy, and the target cluster where data is replicated with a 1-week retention policy.]
Example 3: Weblogs
• Weblogs land hourly in my primary cluster
• HDFS location is /weblogs/{date}
• I want to:
  – Replicate weblogs to a discovery cluster
  – Replicate weblogs to a BCP cluster
  – Evict weblogs from the primary cluster after 2 days
  – Evict weblogs from the discovery cluster after 1 week
  – Evict weblogs from the BCP cluster after 3 months
Feed Specification 3: Weblogs

<feed description="" name="feed-weblogs" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>

  <clusters>
    <cluster name="cluster-primary" type="source">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(2)" action="delete"/>
    </cluster>
    <cluster name="cluster-discovery" type="target">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(7)" action="delete"/>
      <locations>
        <location type="data" path="/projects/recommendations/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
      </locations>
    </cluster>
    <cluster name="cluster-bcp" type="target">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="months(3)" action="delete"/>
      <locations>
        <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
      </locations>
    </cluster>
  </clusters>

  <locations>
    <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
  </locations>

  <ACL owner="hdfs" group="users" permission="0755"/>
  <schema location="/none" provider="none"/>
</feed>
[Callouts: cluster-specific locations override the feed-level location.]
Apache Knox: Secure Access to Hadoop
Connecting to the Cluster: Edge Nodes
• What is an edge node?
  – A node in a DMZ that has access to the cluster; the only way to access the cluster
  – Hadoop client APIs and MR/Pig/Hive jobs are executed from these edge nodes
  – Users SSH to the edge node, upload all job artifacts, and then execute API calls and commands from a shell
• Challenges
  – SSH, edge node, and job maintenance are a nightmare
  – Difficult to integrate with applications
[Diagram: a Hadoop user connects to the edge node over SSH.]
Connecting to the Cluster: REST APIs
• Useful for connecting to Hadoop from outside the cluster
• When more client language flexibility is required
  – i.e. a Java binding is not an option
• Challenges
  – Client must have knowledge of the cluster topology
  – Requires opening ports outside the cluster (in some cases, on every host)

Service  | API
WebHDFS  | Supports HDFS user operations including reading files, writing to files, making directories, changing permissions, and renaming.
WebHCat  | Job control for MapReduce, Pig, and Hive jobs, and HCatalog DDL commands.
Oozie    | Job submission and management, and Oozie administration.
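WebHDFS requests are plain HTTP against a path-based URL of the form http://namenode:port/webhdfs/v1/path?op=...; a small helper to build such URLs (the host, port, and user below are hypothetical examples):

```python
from urllib.parse import urlencode

def webhdfs_url(host, port, path, op, params=None):
    """Build a WebHDFS v1 REST URL for the given HDFS path and operation."""
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

# Hypothetical NameNode host; 50070 was the default HTTP port in Hadoop 2.2.
print(webhdfs_url("nn.example.com", 50070, "/weblogs/2013-11-20-11", "OPEN",
                  {"user.name": "hdfs"}))
```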
Apache Knox Gateway: Perimeter Security
Simplified access
  • Single Hadoop access point
  • Rationalized REST API hierarchy
  • Consolidated API calls
  • Multi-cluster support
  • Client DSL
Centralized security
  • Eliminates the SSH "edge node"
  • LDAP and Active Directory authentication
  • Central API management and audit
Knox Gateway Network Architecture
[Diagram: REST clients, JDBC clients, Ambari clients, and browsers pass through a firewall to a stateless cluster of Knox Gateway reverse-proxy instances (GW) deployed in the DMZ, alongside Ambari/Hue servers and identity providers (Kerberos/enterprise identity provider, enterprise/cloud SSO provider). Behind a second firewall sit Secure Hadoop Cluster 1 and Secure Hadoop Cluster 2, each with masters (JT, NN, WebHCat, Oozie, YARN, HBase, Hive) and workers (DN, TT). Requests are streamed through the gateway to Hadoop services after authentication, and URLs are rewritten to refer to the gateway.]
Wot no 2.2.0? Where can I get the Hadoop 2.2.0 fix?
Like the Truth, Hadoop 2.2.0 is out there…
Component       | HDP 2.0 | CDH4       | CDH5 Beta | Intel IDH 3.0 | MapR 3 | IBM BigInsights 2.1
Hadoop Common   | 2.2.0   | 2.0.0      | 2.2.0     | 2.0.4         | N/A    | 1.1.1
Hive + HCatalog | 0.12    | 0.10 + 0.5 | 0.11      | 0.10 + 0.5    | 0.11   | 0.9 + 0.4
Pig             | 0.12    | 0.11       | 0.11      | 0.10          | 0.11   | 0.10
Mahout          | 0.8     | 0.7        | 0.8       | 0.8           | 0.8    | N/A
Flume           | 1.4.0   | 1.4.0      | 1.4.0     | 1.3.0         | 1.4.0  | 1.3.0
Oozie           | 4.0.0   | 3.3.2      | 4.0.0     | 3.3.0         | 3.3.2  | 3.2.0
Sqoop           | 1.4.4   | 1.4.3      | 1.4.4     | 1.4.3         | 1.4.4  | 1.4.2
HBase           | 0.96.0  | 0.94.6     | 0.95.2    | 0.94.7        | 0.94.9 | 0.94.3
Thank You THUG Life