HDP-1 Introduction for HUG France
Presentation on Hortonworks Data Platform for HUG France, June 28, 2012
© Hortonworks Inc. 2012
HDP-1
Steve Loughran, Hortonworks
stevel at hortonworks.com
@steveloughran
Paris, June 2012
Hortonworks Data Platform
[Diagram: platform architecture, with the labels Operate, Integrate, Develop, Interact]
• Distributed Storage (HDFS)
• Distributed Processing (MapReduce)
• Query (Hive)
• Scripting (Pig)
• Metadata Management (HCatalog)
• Non-Relational Database (HBase)
• Data Extraction & Loading (HCatalog APIs, WebHDFS, Talend Open Studio for Big Data, Sqoop)
• Management & Monitoring (Ambari, Zookeeper)
• Workflow & Scheduling (Oozie)
Hortonworks Data Platform (HDP): Fully Integrated, Extensively Tested, Enterprise Supported
[Diagram: Ambari, HCatalog, Hive, Pig, Hadoop Core, Zookeeper; the legend marks components with a new version]
Challenge: integrate, manage, and support changes across a wide range of open source projects that power the Hadoop platform, each with its own release schedule, versions, and dependencies. This is time-intensive, complex, and expensive.
Solution: Hortonworks Data Platform, an integrated and certified platform distribution
• Extensive QA process: many apps across small, medium, and large clusters
• Industry-leading support with clear service levels for updates and patches
HDP 1.0 Components
Component                           Version
Apache Hadoop (HDFS & MapReduce)    1.0.3+
Apache HCatalog                     0.4.0+
Apache Pig                          0.9.2
Apache Hive                         0.9.0+
Apache HBase                        0.92.1+
Talend Open Studio for Big Data     5.1.0
Apache Sqoop                        1.4.1+
Apache Oozie                        3.1.3+
Apache Zookeeper                    3.3.4
Apache Ambari (Technology Preview)  0.1
Management & Monitoring: Ambari
• 100% Open Source
• Wizard-based install, provisioning & configuration management
• Monitoring and alerting dashboards
• Goals: ease of installation, scale to large clusters, effective monitoring of all services
Cluster Provisioning through Web UI
Download and try from http://hortonworks.com
Monitoring and alerting dashboards
Installation and Provisioning
• HMC Installer: GUI, Puppet-driven
  – Installs the full stack, from Java upward
  – Configures the entire cluster
  – Sets up HMC for cluster monitoring
  – Web UI plus text files listing the nodes
• gsInstall: command-line installer, file-driven
• RPM/YUM for custom installation processes
  – Configuration left as an exercise
  – Use if you have other cluster-management tooling
Qualified at scale on RHEL 5.8 and Java 6u26
Enterprise Data Integration -> Talend
• Talend Open Studio for Big Data
  – Feature-rich Job Designer
  – Rich palette of pre-built templates
  – Supports HDFS, Pig, Hive, HBase, HCatalog
  – Apache-licensed, bundled with HDP
• Key benefits
  – Graphical development
  – Robust and scalable execution
  – Broadest connectivity to support all systems: 450+ components
  – Real-time debugging
Metadata Management -> HCatalog
• Simplifies data sharing between Hadoop and other data systems
  – Enables Hadoop data to be described in a schema and accessed as tables
• Provides consistent data access for MapReduce, Hive and Pig
  – Minimizes hard coding of data structure, storage format, and location
• Manages metadata for table storage
  – Based on Hive's metadata server
  – Uses Hive's data definition language (DDL) for metadata operations
• Tables may be stored as RCFiles, text files, or SequenceFiles
RESTful API Front Door for Hadoop
• Opens the door to languages other than Java
• Thin clients via web services instead of fat clients on gateway machines
• Insulation from interface changes from release to release
[Diagram: HCatalog web interfaces and WebHDFS, with HDFS, HBase, external stores, MapReduce, Pig, and Hive]
WebHDFS: HDFS over HTTP
~:$ GET http://nnode:50070/webhdfs/v1/results/part-r-00000.csv?op=open
GATE4,eb8bd736445f415e18886ba037f84829,55000,2007-01-14,14:01:54
GATE4,ec58edcce1049fa665446dc1fa690638,8030803000,2007-01-14,13:52:31
GATE4,b6f07ce00f09035a6683c5e93e3c04b8,30000,2007-01-28,12:41:11
GATE4,a1bc345b756090854e9dd0011087c6c0,30000,2007-01-28,12:59:33
...
Potential uses:
• Out-of-cluster access to HDFS
• Cross-cluster, cross-version HDFS access
• Native filesystem clients
Enable with dfs.webhdfs.enabled=true (in hdfs-site.xml)
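The GET command on this slide works with any HTTP client. A minimal Python sketch of the same request follows, using only the standard library and assuming the slide's host name nnode and port 50070 (the Hadoop 1.x default NameNode HTTP port). The helper names webhdfs_url, read_file and parse_records are hypothetical; user.name is the WebHDFS query parameter for simple (non-Kerberos) authentication.

```python
import urllib.parse
import urllib.request


def webhdfs_url(host, path, op, port=50070, **params):
    """Build a WebHDFS URL: http://HOST:PORT/webhdfs/v1/PATH?op=OP&extra=params.

    Assumption: port 50070 is the Hadoop 1.x NameNode HTTP default.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # "op" goes first, followed by any extra query parameters in order.
    query = urllib.parse.urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{urllib.parse.quote(path)}?{query}"


def read_file(host, path, user=None):
    """Fetch a file's contents with op=OPEN.

    The NameNode replies with a redirect to a DataNode holding the data;
    urllib.request follows that redirect automatically for GET requests.
    """
    params = {"user.name": user} if user else {}
    url = webhdfs_url(host, path, "OPEN", **params)
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_records(text):
    """Split CSV output like the sample above into one field list per line."""
    return [line.split(",") for line in text.splitlines() if line]
```

For example, read_file("nnode", "/results/part-r-00000.csv") mirrors the GET command, and parse_records turns the result into one list of five fields per record for the sample shown.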
WebHDFS and the service APIs isolate Hadoop internals from stable public interfaces
Long-haul, cross-language, stable, secure
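To illustrate the cross-language point, here is a minimal Python sketch of a directory listing over WebHDFS. It assumes the Hadoop 1.x default NameNode HTTP port 50070 and a cluster without security; list_status and parse_liststatus are hypothetical helper names. The JSON shape (a FileStatuses object holding a FileStatus array whose entries carry pathSuffix, type and length) follows the WebHDFS REST API.

```python
import json
import urllib.parse
import urllib.request


def parse_liststatus(body):
    """Extract (name, type, length) tuples from a LISTSTATUS JSON response.

    Expected shape: {"FileStatuses": {"FileStatus": [{...}, ...]}}
    """
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"], s["length"]) for s in statuses]


def list_status(host, path, port=50070):
    """List an HDFS directory with op=LISTSTATUS over plain HTTP.

    Assumption: port 50070 (Hadoop 1.x NameNode HTTP default), no Kerberos.
    """
    query = urllib.parse.urlencode({"op": "LISTSTATUS"})
    url = f"http://{host}:{port}/webhdfs/v1{urllib.parse.quote(path)}?{query}"
    with urllib.request.urlopen(url) as resp:
        return parse_liststatus(resp.read().decode("utf-8"))
```

Because the listing is plain JSON over HTTP, the same two steps port directly to any language that has an HTTP client and a JSON parser, with no Hadoop JARs on the client.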
My project: HA on vSphere
Release Schedule
• HDP 1.x: quarterly releases
  – Large-scale QA process
  – Validate performance as well as functionality
• Technology Preview Program
  – Early access; help with testing
  – Access to new features such as:
    – HA
    – Windows integration
Predictable timetable of stable releases
Ready and free to use today:
http://hortonworks.com/download/
Thank you! Any questions?