cloudian 451-hortonworks - webinar
TRANSCRIPT
Webinar Logistics
● Be on the look-out for polling questions
● You may ask questions at any time during the presentation by using the Q&A box
● On-demand viewers: please tweet us questions @cloudianstorage
● At the end of the presentation, please provide feedback and rate us
451 Research is an information technology research & advisory company
Founded in 2000
210+ employees, including over 100 analysts
1,000+ clients: Technology & Service providers, corporate advisory, finance, professional services, and IT decision makers
12,500+ senior IT professionals in our research community
Over 52 million data points each quarter
4,500+ reports published each year covering 2,000+ innovative technology & service providers
Headquartered in New York City with offices in London, Boston, San Francisco, and Washington D.C.
451 Research and its sister company Uptime Institute comprise the two divisions of The 451 Group
Research & Data
Advisory Services
Events
3 Copyright (C) 2015 451 Research LLC
Our Speakers
Paul Turner leads marketing, product planning and strategy at Cloudian. A storage industry expert, he joined Cloudian from NetApp, where he ran the Product Strategy Office, guiding the company's investments in FlashRay, Iongrid and CacheIQ. Paul has more than 23 years of development and management leadership, including 15 years at Oracle.
Matt Aslett, Research Director for the data platforms and analytics research channel, has overall responsibility for the coverage of operational and analytic databases, data integration, data quality, and business intelligence. Matt's own primary area of focus is relational and non-relational databases (including NoSQL and NewSQL), data warehousing, data caching, and Hadoop. Matt is also an expert in open source software and regularly contributes to 451 Research's open source-related research.
John Kreisa is a veteran of the enterprise marketing industry who has worked on products at every level of the IT stack, from the depths of storage through to the insight of business intelligence and analytics. John currently leads partner and strategic marketing initiatives at open source leader Hortonworks, which develops, distributes and supports Apache Hadoop.
Big data: cause and effect
• Apache Hadoop
• Object storage
• NoSQL
• Stream processing
• Predictive analytics
• Data wrangling
• Volume
• Velocity
• Variety
Economics:
• Commodity hardware
• Open source software
EFFECT EFFECTED CAUSE
Big data is driven by economics
“Big data is what happened when the cost of keeping information became less than the cost of throwing it away.” – George Dyson
“Big data: New business insights based on storing, processing and analyzing data that was previously ignored due to the cost and functional limitations of traditional data management technologies.” – 451 Research
What happened when the cost of keeping information became less than the cost of throwing it away?
• The processing and analysis of very large data sets in their entirety
• Increased adoption of massively parallel processing approaches
• Storage and analysis of both structured and multi-structured data
• Integration of external (social) and corporate data for more complete perspective
• Schema-free and schema-on-read approaches to data storage/analysis
• Adoption of exploratory analytic approaches to identify new patterns in data
• Predictive analytics as a fundamental component of BI strategies
• Machine-learning algorithms automate the reflection of collective intelligence
• Increased adoption of in-memory databases for rapid data ingestion
• Real-time analysis of data prior to storage within the data warehouse/Hadoop
• Interactive, native, SQL-based analysis of data in Hadoop and HBase
• Large-scale processing of sensor and other machine-generated data/events
Big data: cause and effect
• Apache Hadoop
• Object storage
• NoSQL
• Stream processing
• Predictive analytics
• Data wrangling
• Volume
• Velocity
• Variety
Economics:
• Commodity hardware
• Open source software
EFFECT EFFECTED CAUSE
IoT
Page 13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Traditional Analytic Systems Under Pressure
Challenges
• Constrains data to app
• Can’t manage new data
• Costly to Scale
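[Figure: Business value of traditional versus new data. Traditional data (1): ERP, CRM, SCM. New data (2): clickstream, geolocation, web data, Internet of Things, docs and emails, server logs. Data volume: 2.8 zettabytes in 2012, 40 zettabytes in 2020. Curves labeled INDUSTRY LEADERS and LAGGARDS.]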
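The two data-volume figures quoted on this slide (2.8 zettabytes in 2012, 40 zettabytes in 2020) imply a yearly growth rate that can be worked out directly. The following is a minimal sketch, not something shown in the webinar: the function name is illustrative, and it assumes smooth year-over-year compounding, which the slide itself does not claim.

```python
def compound_annual_growth_rate(start, end, years):
    """Return the constant yearly growth rate that turns start into end."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


# Figures taken from the slide: 2.8 ZB in 2012 and 40 ZB in 2020.
rate = compound_annual_growth_rate(2.8, 40.0, 2020 - 2012)
print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption the volume grows by a little under 40% each year, which is the pressure on traditional systems that the slide describes.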
Modern Data Architecture Emerges to Unify Analytics & Data Processing
Modern Data Analytics Architecture
• Enables applications to access all your enterprise data through an efficient centralized platform
• Supported by a centralized approach to analytics, governance, security and operations
• Versatile enough to handle any application and dataset, no matter the size or type
[Diagram: Modern data architecture.
Sources: clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data; existing systems (ERP, CRM, SCM).
Platform: HDFS (Hadoop Distributed File System) and YARN: Data Operating System, with interactive, real-time, batch and partner ISV workloads; EDW and MPP systems.
Analytics: data marts, applications, business analytics, visualization & dashboards.]
Hadoop Driver: Enabling the Data Lake for Analytics
Data Lake Definition
• Centralized Architecture: Multiple applications on a shared data set with consistent levels of service
• Any App, Any Data: Multiple applications accessing all data, affording new insights and opportunities
• Unlocks ‘Systems of Insight’: Advanced algorithms and applications used to derive new value and optimize existing value
Drivers:
1. Cost Optimization
2. Advanced Analytic Apps
Goal:
• Centralized Architecture
• Data-driven Business
[Figure: Journey to the Data Lake with Hadoop, plotted on axes of scale and scope, with the labels DATA LAKE and Systems of Insight.]
Your Data at Webscale Economics
HyperStore: Software Defined Storage
• Replication (RF=1,2,3,4)
• Erasure coding (N+1,2,3,4)
• Compression (Zlib, lz4)
• Commodity servers
• Scale out
• Durable
• Simple to use
[Diagram: heterogeneous nodes (CPU, disks, network) of differing capacity, labeled 100TB and 300TB.]
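The replication and erasure coding options on this slide trade durability against raw capacity, and the arithmetic is easy to sketch. The example below is an illustration only, not Cloudian's implementation: it reads "N+K" as N data fragments plus K parity fragments (the usual convention, but an interpretation of the slide), and the choice of N=4 and a 400 TB cluster are assumptions made for the example.

```python
def replication_overhead(replication_factor):
    """Raw bytes stored per byte of user data with full-copy replication."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return float(replication_factor)


def erasure_coding_overhead(data_fragments, parity_fragments):
    """Raw bytes stored per byte of user data with N data + K parity fragments."""
    if data_fragments < 1 or parity_fragments < 0:
        raise ValueError("need at least one data fragment and no negative parity")
    return (data_fragments + parity_fragments) / data_fragments


def usable_capacity_tb(raw_tb, overhead):
    """User-visible capacity for a given raw capacity and storage overhead."""
    return raw_tb / overhead


# Assumed cluster: one 100 TB node plus one 300 TB node (sizes from the slide).
raw_tb = 100 + 300

# RF=3 is one of the slide's replication settings; 4+2 assumes N=4 data fragments.
for label, overhead in (
    ("replication RF=3", replication_overhead(3)),
    ("erasure coding 4+2", erasure_coding_overhead(4, 2)),
):
    usable = usable_capacity_tb(raw_tb, overhead)
    print(f"{label}: {overhead:.2f}x raw per byte, {usable:.1f} TB usable")
```

With these assumed numbers, erasure coding leaves roughly twice as much usable space as three-way replication, which is why a store might offer both and let the policy be chosen per workload.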
Smart Data
[Diagram: Internet of Things sources (consumer activity such as events, GPS and WiFi; social media; device tracking and logs), Cloudian HyperStore, a big data event processing platform, analytics, and the result of analysis.]
✓ Analyze more: allows for efficient bulk data analysis in place
✓ Faster time-to-decision
✓ HyperStore scales out with your data, adding nodes for I/O
Interoperability: Cloudian & Hortonworks
[Diagram: YARN: Data Operating System, running across nodes 1 to N, with these engines:
Batch: MapReduce
Script: Pig
Search: Solr
SQL: Hive/Tez, HCatalog
NoSQL: HBase, Accumulo
Stream: Storm
Others: In-Memory Analytics, ISV engines
Storage: HDFS; S3 Native File System (URI scheme: s3n)
Deployment: Linux, Windows, On-Premise, Cloud]
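The diagram names the S3 Native File System and its `s3n` URI scheme as the storage interface next to HDFS. As a rough sketch of what that looks like from the Hadoop side, the helper below builds `s3n://` paths and renders the two standard s3n credential properties as a `core-site.xml` fragment. The bucket name, object key and credential values are placeholders, the helper names are hypothetical, and pointing the connector at a HyperStore endpoint rather than a public S3 service is deployment-specific and not shown here.

```python
import xml.etree.ElementTree as ET


def s3n_uri(bucket, key=""):
    """Build a Hadoop path using the s3n URI scheme named on the slide."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a non-empty name without slashes")
    return f"s3n://{bucket}/{key.lstrip('/')}"


def core_site_fragment(access_key, secret_key):
    """Render the s3n credential properties as a core-site.xml fragment."""
    configuration = ET.Element("configuration")
    for name, value in (
        ("fs.s3n.awsAccessKeyId", access_key),
        ("fs.s3n.awsSecretAccessKey", secret_key),
    ):
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(configuration, encoding="unicode")


# Placeholder bucket, key and credentials, for illustration only.
print(s3n_uri("weblogs", "2015/06/clicks.log"))
print(core_site_fragment("EXAMPLE_ACCESS_KEY", "EXAMPLE_SECRET_KEY"))
```

A path built this way can be used wherever a Hadoop job accepts an input or output location, which is what lets the engines in the diagram read object storage without copying the data into HDFS first.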
Use Cases
Hadoop for Internet of Things
Clickstream data
Analysis of what people click on: individual web pages and in what order. Clickstream analysis can reveal how users research products and also how they complete their online purchases.
✓ Internet Marketing ✓ Online Commerce

Sentiment data
Unstructured data on opinions, emotions, and attitudes from sources like social media posts, blogs, online product reviews and customer support interactions. Organizations use sentiment analysis to understand how the public feels about something and track how those opinions change over time.
✓ Retail ✓ Media & Entertainment

Server log data
Large enterprises build, manage and protect their own proprietary, distributed information networks. Server logs are the computer-generated records that report data on the operations of those networks. When there is a problem, it's one of the first places the IT team looks for a diagnosis.
✓ IT Organizations ✓ Customer Support

Sensor data
From refrigerators and coffee makers to energy-measuring smart meters, sensor data is everywhere. It is created by the machinery that runs assembly lines and the cell towers that route our phone calls. It is net new data that is increasing exponentially in the information age.
✓ Manufacturing ✓ Industrial
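The clickstream use case, looking at which pages people visit and in what order, can be illustrated on a single machine before scaling it out. This is a minimal sketch under stated assumptions: the `(user, timestamp, page)` event layout, the sample events and the function name are all invented for the example, and a production job would run the same logic over the cluster rather than in memory.

```python
from collections import Counter, defaultdict


def page_transitions(events):
    """Count page-to-page transitions per user from (user, timestamp, page) events."""
    by_user = defaultdict(list)
    for user, timestamp, page in events:
        by_user[user].append((timestamp, page))

    transitions = Counter()
    for visits in by_user.values():
        visits.sort()  # order each user's clicks by timestamp
        pages = [page for _, page in visits]
        # Pair each page with the one the same user visited next.
        transitions.update(zip(pages, pages[1:]))
    return transitions


# Invented sample events, deliberately out of order for user u1.
events = [
    ("u1", 2, "/product"), ("u1", 1, "/home"), ("u1", 3, "/checkout"),
    ("u2", 1, "/home"), ("u2", 5, "/product"),
]

for (source, target), count in page_transitions(events).most_common():
    print(f"{source} -> {target}: {count}")
```

Counting transitions rather than single page views is what exposes the path to purchase that the slide mentions, for example how often a product page leads to checkout.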
Thank You!
Matt Aslett [email protected] www.451research.com @maslett
Paul Turner [email protected] www.cloudian.com @CloudianStorage
John Kreisa [email protected] www.hortonworks.com @Hortonworks