Mrinal Devadas, Hortonworks: Making Sense of Big Data
TRANSCRIPT
© Hortonworks Inc. 2013
Hortonworks: Community Driven Enterprise Apache Hadoop
Mrinal Devadas
Systems Architect
Hortonworks
• Who is Hortonworks
• Our Approach
• Patterns of Use
A Brief History of Apache Hadoop
Timeline, 2004 to 2013:
• Focus on INNOVATION (2005): Yahoo! creates team under E14 to work on Hadoop
• Apache Project Established
• Focus on OPERATIONS (2008): Yahoo! team extends focus to operations to support multiple projects & growing clusters; Yahoo! begins to operate at scale
• Focus on STABILITY (2011): Hortonworks created to focus on “Enterprise Hadoop”. Starts with 24 key Hadoop engineers from Yahoo!
• Enterprise Hadoop: Hortonworks Data Platform
Hortonworks Snapshot
• We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
• We engineer, test & certify HDP for enterprise usage
• We employ the core architects, builders and operators of Apache Hadoop
• We drive innovation within Apache Software Foundation projects
• We are uniquely positioned to deliver the highest quality of Hadoop support
• We enable the ecosystem to work better with Hadoop
Develop Distribute Support
We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution
Endorsed by Strategic Partners
Headquarters: Palo Alto, CA
Employees: 200+ and growing
Investors: Benchmark, Index, Yahoo
Hortonworks
• Who is Hortonworks
• Our approach
– Leading Open Source Hadoop innovation
– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem
– Ensuring No Lock-In: 100% Open Source
• Patterns of Use
Apache Community Leadership
Apache Software Foundation Guiding Principles
• Release early & often
• Transparency, respect, meritocracy
Key Roles held by Hortonworkers
• PMC Members
– Managing community projects
– Mentoring new incubator projects
– Over 20 Hortonworkers managing community
• Committers
– Authoring, reviewing & editing code
– Over 50 Hortonworkers across projects
• Release Managers
– Testing & releasing projects
– Hortonworkers across key projects like Hadoop, Hive, Pig, HCatalog, Ambari, HBase
Projects: Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari, Other Apache Projects (cycle: Design & Develop, Test & Patch, Release)
“We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.”
- Jeff Kelly: Wikibon
Leadership that Starts at the Core
• Driving next generation Hadoop
– YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery
• 420k+ lines authored since 2006
– More than twice the nearest contributor
• Deeply integrating with the ecosystem
– Enabling new deployment platforms (ex. Windows & Azure, Linux & VMware HA)
– Creating deeply engineered solutions (ex. Teradata big data appliance)
• All Apache, NO holdbacks
– 100% of code contributed to Apache
Driving Enterprise Hadoop Innovation
[Chart: Hortonworks vs. Cloudera committers by project (Ambari, HBase, Hive/HCatalog, Pig, Hadoop Core), and lines of code by company (Hortonworks, Yahoo!, Cloudera, Other) as a percentage of total. Source: Apache Software Foundation]
Hortonworks Process for Enterprise Hadoop
Upstream Community Projects → Downstream Enterprise Product
• Upstream: Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari, Other Apache Projects
– Design & Develop, Test & Patch, Release
• Downstream: Hortonworks Data Platform
– Design & Develop, Integrate & Test, Package & Certify, Distribute
• Stable Project Releases flow downstream; Fixed Issues flow upstream
Virtuous cycle when development & fixed issues are done upstream & stable project releases flow downstream.
No Lock-in: Integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects
Hortonworks
• Who is Hortonworks
• Our approach
– Leading Open Source Hadoop Innovation
– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem
– Ensuring NO LOCK-IN: 100% Open Source
• Patterns of use
Enhancing the Core of Apache Hadoop
Deliver high-scale storage & processing with enterprise-ready platform services
Unique Focus Areas:
• Bigger, faster, more flexible
– Continued focus on speed & scale and enabling near-real-time apps
• Tested & certified at scale
– Run ~1300 system tests on large Yahoo! clusters for every release
• Enterprise-ready services
– High availability, disaster recovery, snapshots, security, …
HADOOP CORE: Distributed Storage & Processing
Hortonworkers are the architects, operators, and builders of core Hadoop
PLATFORM SERVICES: Enterprise Readiness
Data Services for Full Data Lifecycle
Provide data services to store, process & access data in many ways
Unique Focus Areas:
• Apache HCatalog
– Metadata services for consistent table access to Hadoop data
• Apache Hive
– Explore & process Hadoop data via SQL & ODBC-compliant BI tools
Hortonworks enables Hadoop data to be accessed via existing tools & systems
DATA SERVICES: Store, Process and Access Data
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness
Operational Services for Ease of Use
Include complete operational services for productive operations & management
Unique Focus Area:
• Apache Ambari
– Provision, manage & monitor a cluster
– Complete REST APIs to integrate with existing operational tools
– Job & task visualizer to diagnose issues
Only Hortonworks provides a complete open source Hadoop management tool
OPERATIONAL SERVICES: Manage & Operate at Scale
DATA SERVICES: Store, Process and Access Data
HADOOP CORE: Distributed Storage & Processing
PLATFORM SERVICES: Enterprise Readiness
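Ambari's REST API is the hook for integrating with existing operational tools. As a minimal sketch, assuming an Ambari server listening on port 8080 with HTTP basic authentication, the Python below builds (but does not send) a GET request for the `/api/v1/clusters/<cluster>/services` resource. The host name, cluster name, and `admin`/`admin` credentials are placeholders, not values from this talk.

```python
import base64
import urllib.request


def ambari_request(host, cluster, resource="services", port=8080,
                   user="admin", password="admin"):
    """Build, without sending, a GET request for an Ambari REST resource.

    host, cluster, and the admin/admin credentials are placeholders;
    port 8080 is assumed to be the Ambari server's HTTP port.
    """
    url = f"http://{host}:{port}/api/v1/clusters/{cluster}/{resource}"
    # HTTP basic authentication: base64("user:password") in the header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


req = ambari_request("ambari.example.com", "mycluster")
print(req.full_url)  # http://ambari.example.com:8080/api/v1/clusters/mycluster/services
```

Against a live server, passing `req` to `urllib.request.urlopen` would return a JSON description of the cluster's services; swapping `resource` (for example to `hosts`) targets other cluster resources.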
Deployable Across a Range of Options
Only Hortonworks allows you to deploy seamlessly across any deployment option
• Linux & Windows
• Azure, Rackspace & other clouds
• Virtual platforms
• Big data appliances
HORTONWORKS DATA PLATFORM (HDP): OPERATIONAL SERVICES (Manage & Operate at Scale), DATA SERVICES (Store, Process and Access Data), HADOOP CORE (Distributed Storage & Processing), PLATFORM SERVICES (Enterprise Readiness); deployment targets: OS, Cloud, VM, Appliance
HDP: Enterprise Hadoop Distribution
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
HORTONWORKS DATA PLATFORM (HDP): OPERATIONAL SERVICES (Manage & Operate at Scale), DATA SERVICES (Store, Process and Access Data), HADOOP CORE (Distributed Storage & Processing), PLATFORM SERVICES (Enterprise Readiness); deployment targets: OS, Cloud, VM, Appliance
Hortonworks
• Who is Hortonworks
• Our approach
– Leading Open Source Hadoop Innovation
– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem
– Ensuring No Lock-in: 100% Open Source
• Patterns of use
Existing Data Architecture
• Applications: Business Analytics, Custom Applications, Enterprise Applications
• Data Systems: Traditional Repos (RDBMS, EDW, MPP)
• Data Sources: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS Systems
• Dev & Data Tools: Build & Test
• Operational Tools: Manage & Monitor
Next-Generation Data Architecture
• Applications: Business Analytics, Custom Applications, Enterprise Applications
• Data Systems: Traditional Repos (RDBMS, EDW, MPP); Enterprise Hadoop Platform
• Data Sources: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensors, social media); OLTP, POS Systems
• Dev & Data Tools: Build & Test
• Operational Tools: Manage & Monitor
Interoperating With Your Tools
• Applications: Microsoft Applications
• Data Systems: Traditional Repos; Hortonworks Data Platform
• Dev & Data Tools
• Operational Tools: Viewpoint
• Data Sources: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensors, social media)
Hortonworks
• Who is Hortonworks
• Our approach
– Leading Open Source Hadoop Innovation
– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem
– Ensuring No Lock-In: 100% Open Source
• Patterns of use
True Enterprise Class Open Source
• Community-driven Approach Mitigates Lock-In
– Identify & introduce enterprise requirements into public domain
– Work with community to advance & incubate open source projects
– Apply Enterprise Rigor for the most stable and reliable distribution
• 100% Open Source. No Holdbacks.
– Only true implementation of OSS Apache Hadoop
– Preferred by the software vendors that you rely on
– Proprietary Open Source = Lock-In
– Open communities always trump “open source”
• Flexible Deployment
– No License Fee for usage
Hortonworks
• Who is Hortonworks
• Our approach
• Patterns of use
Hadoop Common Patterns of Use
Big Data (Transactions, Interactions, Observations) → Hortonworks Data Platform → Business Cases
• Refine: Batch
• Explore: Interactive
• Enrich: Online
“Right-time” Access to Data
Operational Data Refinery
Transform & refine ALL sources of data
Also known as Data Reservoir or Catch Basin
1. Capture
2. Process
3. Distribute & Retain
• Data Sources: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
• Hortonworks Data Platform (Refine, Explore, Enrich)
• Data Systems: Traditional Repos (RDBMS, EDW, MPP)
• Applications: Business Analytics, Custom Applications, Enterprise Applications
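The three refinery steps can be illustrated with a toy, in-memory Python sketch. This is not Hadoop code: the log format (`<ip> <status> <url>`), the sample records, and the function names `capture`, `process` and `distribute` are hypothetical, chosen only to show raw data being landed and retained, then refined, then shaped for a downstream repository such as an EDW.

```python
from collections import Counter

# Hypothetical raw web-log lines standing in for "new sources";
# assumed format: "<ip> <http-status> <url>".
SAMPLE_LOG = [
    "10.0.0.1 200 /home",
    "10.0.0.2 404 /missing",
    "10.0.0.3 200 /home",
    "garbled record",
    "10.0.0.4 200 /cart",
]


def capture(lines):
    """Step 1: land raw records unchanged so they can be retained and reprocessed."""
    return list(lines)


def process(raw):
    """Step 2: refine by skipping malformed records and counting successful hits per URL."""
    hits = Counter()
    for line in raw:
        parts = line.split()
        if len(parts) != 3:
            continue  # malformed: kept in the raw copy, excluded from results
        _ip, status, url = parts
        if status == "200":
            hits[url] += 1
    return hits


def distribute(hits):
    """Step 3: shape refined results as (url, hits) rows for a downstream repository."""
    # Highest counts first; ties broken alphabetically for a stable order.
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


raw = capture(SAMPLE_LOG)
rows = distribute(process(raw))
print(rows)  # [('/home', 2), ('/cart', 1)]
```

Keeping `raw` separate from `rows` mirrors the "Retain" half of step 3: the refined output can be regenerated later from the untouched raw data if the processing rules change.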
Big Data Exploration & Visualization
Leverage “data lake” to perform iterative investigation for value
1. Capture
2. Process
3. Explore & Visualize
• Data Sources: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
• Hortonworks Data Platform (Refine, Explore, Enrich)
• Data Systems: Traditional Repos (RDBMS, EDW, MPP)
• Applications: Business Analytics, Custom Applications, Enterprise Applications
Application Enrichment
Create intelligent applications
Collect data, create analytical models and deliver to online apps
1. Capture
2. Process & Compute
3. Deliver Model
• Data Sources: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
• Hortonworks Data Platform (Refine, Explore, Enrich)
• Data Systems: Traditional Repos (RDBMS, EDW, MPP); NoSQL
• Applications: Custom Applications, Enterprise Applications
Flexible Support Subscription Programs
Leverage Hortonworks Expertise: Subscription and Support delivered and backed by Hadoop experts; subscriptions based on nodes or storage
• Developer Support: “How to” guidance for developers and architects
– Hours: 12 x 5, web only
– Response: all severities, 1 business day
– Scope: Application design advice, code review
– 1 seat
• Essential Support*: Operations support for small research clusters
– Hours: 12 x 5, web only
– Response: all severities, 1 business day
– Scope: Cluster design, install, maintain, performance
– 3 contacts
– Patches & updates
• Standard Support: Operations support for dev & test clusters
– Hours: 12 x 5, web only
– Response: all severities, 1 business day
– Scope: Cluster design, install, maintain, performance
– 3 contacts
– Patches & updates
• Enterprise Support: Operations support for critical clusters
– Hours: 24 x 7, phone & web
– Response: Sev 1, 1 hour; Sev 2, 4 business hours
– Scope: Cluster design, install, maintain, performance
– 5 contacts
– Patches & updates
* Limited in size and no expansion
Additional Options
Hortonworks: Best In Class Hadoop Support
• Experienced enterprise support team
– Experience supporting enterprise clients in production
– Core engineers have real operational experience: built and supported 44+K nodes in production
– Extensive experience in commercial big data offerings including HDP, MapR, Karmasphere
• Global 24x7 operation: support based in Sunnyvale, UK & India
• Stringent case management processes ensure high quality customer service & responsiveness
Transferring Our Hadoop Expertise to You
The expert source for Apache Hadoop training & certification
• World class training programs designed to help you learn fast
– Role-based hands-on classes with 50% lab time
• Expert consulting services
– Programs designed to transfer knowledge
• Industry leading Hadoop Sandbox program
– Fastest way to learn Apache Hadoop
– Multi-level tutorials for wide applicability
– Customizable and updateable
Introducing Hortonworks Data Platform for Windows
Enterprise Apache Hadoop
March 2013
Why Apache Hadoop on Windows?
• According to IDC, Windows Server held 73% market share in 2012
– Hadoop was traditionally built for Linux servers, so a large number of organizations are underserved
• According to a 2012 Barclays CIO study, big data outranks virtualization as the #1 trend driving spending initiatives
– Unstructured data growth exceeds 80% year/year in most enterprises
• Apache Hadoop is the de facto big data platform for processing massive amounts of unstructured data
– Complementary to existing Microsoft technologies
– There is a huge untapped community of Windows developers and ecosystem partners
• A strong Microsoft-Hortonworks partnership and 18 months of development make this a natural next step
Hortonworks Data Platform for Windows
• Enterprise-grade Apache Hadoop on Windows
– Enables the same experience for Hadoop on Windows & Linux
• More partners, more developers for Hadoop
– Makes native Apache Hadoop available to the Windows ecosystem
– More options for Windows-focused organizations
• Hortonworks focus: Enterprise Apache Hadoop for all platforms
– Trusted, reliable, production-ready distribution for on-premises Hadoop on Windows deployments
• Built with joint investment and contributions from Microsoft
– Deep engineering relationship ensures tight integration and maximum performance
HDP is the first and only distribution available on Windows & Linux
Seamless Interoperability with Your Microsoft Tools
• Integrated with Microsoft tools for native big data analysis
– Bi-directional connectors for SQL Server and SQL Azure through SQOOP
– Excel ODBC integration through Hive
• Addressing demand for Hadoop on Windows
– Ideal for Windows customers with Hadoop operational experience
• Enables most common Hadoop workloads in the Enterprise
– Data refinement and ETL offload for high-volume data landing
– Data exploration for discovery of new business opportunities
– Data enrichment for fine-tuned delivery and recommendation engines
• Applications: Microsoft Applications
• Data Systems: Hortonworks Data Platform for Windows
• Data Sources: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media); Mobile Data; OLTP, POS Systems
Inside HDP for Windows
Hortonworks Data Platform (HDP) for Windows
• 100% Open Source Enterprise Hadoop
• Component and version compatible with HDInsight
• Availability
– Beta release available now
Components:
• HADOOP CORE (Distributed Storage & Processing): HDFS, WebHDFS, MapReduce
• DATA SERVICES (Store, Process and Access Data): HCatalog, Hive, Pig, Sqoop
• OPERATIONAL SERVICES (Manage & Operate at Scale): Oozie
• PLATFORM SERVICES
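WebHDFS, listed among the Hadoop Core components, exposes HDFS operations over HTTP using URLs of the form `http://<namenode>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. The sketch below only builds such a URL. It assumes the NameNode HTTP port 50070 used by Hadoop 1.x/2.x-era releases; the helper name `webhdfs_url`, the host name, the path, and the `user.name` value are placeholders for illustration.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(namenode, path, op, port=50070, params=None):
    """Build a WebHDFS REST URL for an HDFS operation such as LISTSTATUS.

    namenode and port are assumptions about the cluster; 50070 was the
    default NameNode HTTP port in Hadoop 1.x and 2.x.
    """
    if not path.startswith("/"):
        raise ValueError("WebHDFS paths must be absolute, e.g. /user/hdfs")
    # The operation name goes first, followed by any extra query parameters.
    query = urlencode({"op": op.upper(), **(params or {})})
    # quote() percent-encodes unsafe characters but keeps "/" separators.
    return f"http://{namenode}:{port}/webhdfs/v1{quote(path)}?{query}"


url = webhdfs_url("namenode.example.com", "/user/hdfs/logs", "liststatus",
                  params={"user.name": "hdfs"})
print(url)  # http://namenode.example.com:50070/webhdfs/v1/user/hdfs/logs?op=LISTSTATUS&user.name=hdfs
```

On a running cluster, an HTTP GET on that URL would return the directory listing as JSON; other operations such as `OPEN` or `GETFILESTATUS` use the same URL shape.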
Maximize Your Hadoop Deployment Choice
• Use HDP for Windows for on-premises deployment on Windows Server
– Ideal for Windows users with Hadoop experience
– Perfect next step for those who are ready to move from POC to production
• Use HDInsight for Microsoft tooling, management and provisioning
– HDInsight Service offers the full benefit of Windows Azure (e.g. elasticity & low cost); available in Preview today
– HDInsight Server for full integration of Hadoop with Microsoft tools on premises; Developer Preview available today
• Full interoperability and deployment choice across platforms
– Implement big data applications that run on-premises & in the cloud
– By leveraging open source HDP, enables seamless interoperability across environments: Linux, Windows, Windows Azure
Summary
• Leading the Innovation in Core Hadoop
• Addressing the requirements for Enterprise usage
• Enabling interoperability of the ecosystem
• No lock-in. 100% Open Source.
• Best in industry support with flexible pricing model
• Find out more
– www.hortonworks.com
– http://hortonworks.com/hadoop-training/