Combine Apache Hadoop and Elasticsearch to Get the Most out of Your Big Data
DESCRIPTION
Hadoop is a great platform for storing and processing massive amounts of data. Elasticsearch is the ideal solution for searching and visualizing that same data. Join us to learn how you can leverage the full power of both platforms to maximize the value of your big data. In this webinar we'll walk you through:
• How Elasticsearch fits in the Modern Data Architecture
• A demo of Elasticsearch and Hortonworks Data Platform
• Best practices for combining Elasticsearch and Hortonworks Data Platform to extract maximum insights from your data
TRANSCRIPT
© Hortonworks Inc. 2013
Combine Apache Hadoop & Elasticsearch to get the most out of your big data...
Your Presenters
Steve Mayzak (@smayzak) – Head of Sales Engineering – Seahawks fan!
Mark Lochbihler (@mlochbihler) – Partner Solutions Engineer – HUGE FC Barcelona Fan!
Today’s Topics
• Drivers for the Modern Data Architecture (MDA)
• Elasticsearch’s role in the MDA
• Q&A
Hadoop Adoption
“Hadoop’s momentum is unstoppable as its open source roots grow wildly into enterprises. Its refreshingly unique approach to data management is transforming how companies store, process, analyze, and share big data”
--Mike Gualtieri, Forrester
A Traditional Approach Under Pressure
Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
Data Systems (Repositories): RDBMS, EDW, MPP
Applications: Business Analytics, Custom Applications, Packaged Applications
Data growth (Source: IDC): 2.8 ZB in 2012; 40 ZB by 2020; 85% from New Data Types; 15x Machine Data by 2020
Emerging Modern Data Architecture
Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
Data Systems (Repositories): RDBMS, EDW, MPP
Operational Tools: Manage & Monitor
Dev & Data Tools: Build & Test
Applications: Business Analytics, Custom Applications, Packaged Applications
MDA Driver #1: A New Approach to Insight
Current Approach
• Apply schema on write
• Heavily dependent on IT
Determine list of questions → Design solution → Collect structured data → Ask questions from list → Detect additional questions
Single Query Engine: SQL
Hadoop Approach
• Apply schema on read
• Support a range of access patterns to data stored in HDFS: polymorphic access (batch, interactive, real-time, in-memory)
Right Engine, Right Job
HADOOP: Iterate over structure → Transform and Analyze
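To make the schema-on-write versus schema-on-read contrast concrete, here is a minimal Python sketch that is not tied to any Hadoop API. The log lines, field layout, and the `status_schema`, `bytes_schema`, and `read_with_schema` helpers are hypothetical illustrations: the raw records are stored untouched, and a schema (here, a parsing function) is applied only when the data is read, so a question nobody planned for can reuse the same raw data with a different schema.

```python
from typing import Any, Callable, Dict, List

# Raw records kept exactly as they arrived (hypothetical web log lines:
# timestamp, method, path, status, bytes).
RAW_LINES: List[str] = [
    "2014-03-01T10:00:00 GET /home 200 512",
    "2014-03-01T10:00:05 GET /cart 500 128",
    "2014-03-01T10:00:09 POST /checkout 200 2048",
]

Schema = Callable[[str], Dict[str, Any]]


def status_schema(line: str) -> Dict[str, Any]:
    """First question: which requests failed? Only path and status are parsed."""
    _ts, _method, path, status, _size = line.split()
    return {"path": path, "status": int(status)}


def bytes_schema(line: str) -> Dict[str, Any]:
    """A later question: how much data per HTTP method? Different fields are parsed."""
    _ts, method, _path, _status, size = line.split()
    return {"method": method, "bytes": int(size)}


def read_with_schema(lines: List[str], schema: Schema) -> List[Dict[str, Any]]:
    """Apply the schema at read time; the stored raw lines are never modified."""
    return [schema(line) for line in lines]


if __name__ == "__main__":
    errors = [r for r in read_with_schema(RAW_LINES, status_schema) if r["status"] >= 500]
    print(errors)  # [{'path': '/cart', 'status': 500}]

    totals: Dict[str, int] = {}
    for rec in read_with_schema(RAW_LINES, bytes_schema):
        totals[rec["method"]] = totals.get(rec["method"], 0) + rec["bytes"]
    print(totals)  # {'GET': 640, 'POST': 2048}
```

Under schema on write, the second question would typically require redesigning the table and reloading data; here it only needs a new read-time parser.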
MDA Driver #2: Data Warehouse Optimization
Current Reality
• EDW at capacity; some usage from low-value workloads
• Older transformed data archived, unavailable for ongoing exploration
• Source data often discarded
EDW workload today: Analytics 20%, ETL Process 30%, Operations 50%
Augment with Hadoop
• Free up EDW resources from low-value tasks
• Keep 100% of source data and historical data for ongoing exploration
• Mine data for value after loading it, because of schema-on-read
HADOOP: Parse, cleanse, apply structure, transform
EDW workload with Hadoop: Operations 50%, Analytics 50%
The Common Journey with Hadoop (axes: Scale, Scope)
New Analytic Apps: New Types of Data (LOB Driven) → More data and analytic apps → MDA/Data Lake: Cost, Insight (IT Driven)
Unlock Value in New Types of Data
1. Social: Understand how people are feeling and interacting, right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Diagnose process failures and prevent security breaches
6. Unstructured (txt, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents
+ Online archive: Data that was once purged or moved to tape can be stored in Hadoop to discover long-term trends and previously hidden value
20 Business Applications of Hadoop
Industry | Use Case | Type of Data
Financial Services | New Account Risk Screens | Text, Server Logs
Financial Services | Trading Risk | Server Logs
Financial Services | Insurance Underwriting | Geographic, Sensor, Text
Telecom | Call Detail Records (CDRs) | Machine, Geographic
Telecom | Infrastructure Investment | Machine, Server Logs
Telecom | Real-time Bandwidth Allocation | Server Logs, Text, Social
Retail | 360° View of the Customer | Clickstream, Text
Retail | Localized, Personalized Promotions | Geographic
Retail | Website Optimization | Clickstream
Manufacturing | Supply Chain and Logistics | Sensor
Manufacturing | Assembly Line Quality Assurance | Sensor
Manufacturing | Crowdsourced Quality Assurance | Social
Healthcare | Use Genomic Data in Medical Trials | Structured
Healthcare | Monitor Patient Vitals in Real-Time | Sensor
Pharmaceuticals | Recruit and Retain Patients for Drug Trials | Social, Clickstream
Pharmaceuticals | Improve Prescription Adherence | Social, Unstructured, Geographic
Oil & Gas | Unify Exploration & Production Data | Sensor, Geographic & Unstructured
Oil & Gas | Monitor Rig Safety in Real-Time | Sensor, Unstructured
Government | ETL Offload in Response to Federal Budgetary Pressures | Structured
Government | Sentiment Analysis for Government Programs | Social
YARN Unlocks the Data Lake Vision
1st Gen of Hadoop: Single Use System (Batch Apps)
• HDFS (redundant, reliable storage)
• MapReduce (cluster resource management & data processing)
2nd Gen of Hadoop: Multi-Use Data Platform (Batch, Interactive, Online, Streaming, …)
Store all data in one place, interact in multiple ways
• Redundant, Reliable Storage (HDFS)
• Efficient Cluster Resource Management & Shared Services (YARN)
• Flexible Data Processing: Batch (MapReduce), Batch & Interactive (Tez), Online Data Processing (HBase, Accumulo), Stream Processing (Storm), others…
• Applications: Classic Hadoop Apps; Hive, Pig, others…
Example Journey Towards a Data Lake (axes: Data, from TB’s to PB’s; Value)
• Risk Management, e.g., Fraud Reduction
• Operational Excellence, e.g., Network Maintenance
• New Business, e.g., Data as a Product
• Customer Intimacy, e.g., 360 Degree View of the Customer
DATA LAKE: An architectural shift in the data center that uses Hadoop to deliver deep insight across a large, broad, diverse set of data at efficient scale
Enabling Hadoop for the Enterprise
• Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
• Integration: Interoperable with existing data center investments
• Skills: Leverage your existing skills: development, analytics, operations
Core Capabilities of Enterprise Hadoop
• Data Management (Efficient Scale): Store and process all of your corporate data assets
• Data Access (Broad Insight): Access your data simultaneously in multiple ways (batch, interactive, real-time)
• Data Governance: Integrate with existing systems and move data in/out and within the environment
• Security: Provide a layered approach to security through authentication, authorization, accountability and data protection
• Operations: Allow you to deploy and effectively manage the environment
• Operations: Empower current operations and security tools to manage Hadoop
• Presentation & Application: Enable both existing and new applications to provide value to the organization
• Deployment Model: Provide an efficient deployment option for your organization
• Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
Enabling Familiar and Existing Tools
• Developer: Collect, Process, Build
• Analyst: Explore, Query, Deliver
• Operator: Provision, Manage, Monitor
• Skills: Leverage your existing skills: development, analytics, operations
• Integration: Interoperable with existing data center investments
• Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
Requirements for Enterprise Hadoop
(Diagram: the Modern Data Architecture, with sources, data systems, applications, operational tools, and dev & data tools)
• Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
• Skills: Leverage your existing skills: development, analytics, operations
• Integration: Interoperable with existing data center investments; integrate with:
  - Applications: Business Intelligence, Developer IDEs, Data Integration
  - Systems: Data Systems & Storage, Systems Management
  - Platforms: Operating Systems, Virtualization, Cloud, Appliances
Elasticsearch in the Modern Data Architecture
Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
Data Systems: RDBMS, EDW, MPP, HANA
Infrastructure | Operational Tools | Dev & Data Tools
Applications
Today’s Topics
• Drivers for the Modern Data Architecture (MDA)
• Elasticsearch’s role in the MDA
• Q&A
Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited
What is Elasticsearch?
Elasticsearch: real-time search and analytics engine
• open source
• Lucene based
• distributed
• scales massively
• high availability
• RESTful API (JSON over HTTP)
• schema free
• multi-tenancy
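Because the API is RESTful JSON over HTTP, any HTTP client can talk to Elasticsearch. The Python sketch below uses only the standard library to build (but not send) a full-text search request for the `_search` endpoint using a `match` query. The host and port (`localhost:9200`), the index name `weblogs`, and the field `message` are assumptions for illustration, not part of the original deck.

```python
import json
import urllib.request

# Assumptions (hypothetical): a local node on the default HTTP port 9200,
# and an index named "weblogs" whose documents have a text field "message".
ES_URL = "http://localhost:9200"
INDEX = "weblogs"


def build_search_request(text: str, size: int = 10) -> urllib.request.Request:
    """Build (but do not send) a full-text search request against _search."""
    body = {
        "size": size,
        "query": {"match": {"message": text}},  # analyzed full-text match
    }
    return urllib.request.Request(
        url=f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_search_request("checkout error")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Against a live cluster you would send it and read the hits:
    # with urllib.request.urlopen(req) as resp:
    #     hits = json.load(resp)["hits"]["hits"]
```

The request is just a URL plus a JSON document, which is why the same query can be issued from curl, a browser plugin, or any language's HTTP library.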
The Elasticsearch ELK Stack
Logstash → Elasticsearch → Kibana
Data From Any Source → Instantly Analyze → Actionable Insights
What about Elasticsearch the Company?
• Support 100s of companies in production environments
• Train developers and ops around the world on ELK
• Drive the ELK projects forward, great things to come!
• Commercial products: Marvel to monitor and manage ELK
• Backed by the best: Benchmark, Index Ventures
Who’s using Elasticsearch?
What are people saying about Elasticsearch?
Real-time Search
• Europe’s largest professional social network
• Over 14 million members
• New data available for search immediately vs. 50 minutes
• “According to the customer survey that we conduct every quarter, search is the most important feature on our platform.” (Dr. Daniel Olmedilla, Vice President, Data Science at XING)
How do they fit together?
Raw data → Clean, Enrich → Elasticsearch-Hadoop Library (Integrate Natively, Choice, Index seamlessly) → Elasticsearch (Free Text Search, Analytics)
Elasticsearch-Hadoop Library
• Java library for integrating Elasticsearch and Hadoop
• Pig, Hive, Cascading, MapReduce
• Search & real-time analytics with Elasticsearch, Hadoop as the Data Lake
• Scales with Hadoop
• Works with Apache Hadoop; certified on HDP 1.x and 2.x (YARN-compatible binary)
Multiple Architectures
• Same hardware: 1 for 1
• Separate hardware: clusters of each, scale independently
Show me!
• Hortonworks HDP Sandbox: making Hadoop easy!
• Installed Elasticsearch, Marvel and Kibana on the Sandbox
• Upload the elasticsearch-hadoop jar as a Pig storage lib
• Index CSV data from Pig to Elasticsearch
• Query Elasticsearch from Pig: best of both
• Kibana to visualize and discover
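The demo indexes CSV data from Pig through the elasticsearch-hadoop library. As a rough, hand-rolled illustration of what indexing CSV rows amounts to (this is not how elasticsearch-hadoop works internally), the Python sketch below converts CSV text into the newline-delimited body accepted by Elasticsearch's `_bulk` endpoint: one action line followed by one document line per row, ending with a newline. The sample columns and the index name `sales` are hypothetical, and the action lines omit `_type`, which assumes a recent Elasticsearch version; the 1.x releases of this webinar's era would also need a mapping type.

```python
import csv
import io
import json
from typing import Any, Dict, List

# Hypothetical sample data; in the demo the CSV sits in HDFS and Pig reads it.
SAMPLE_CSV = """id,product,amount
1,laptop,999.50
2,phone,399.00
"""


def csv_to_bulk(csv_text: str, index: str) -> str:
    """Turn CSV rows into an Elasticsearch _bulk body (newline-delimited JSON)."""
    lines: List[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc: Dict[str, Any] = dict(row)
        doc["amount"] = float(row["amount"])  # CSV yields strings; cast numeric fields
        # Action line: index this document into `index`, reusing the CSV id as _id.
        lines.append(json.dumps({"index": {"_index": index, "_id": row["id"]}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(csv_to_bulk(SAMPLE_CSV, "sales"), end="")
```

In a recent Elasticsearch version, a body like this would be POSTed to `/_bulk` with the `Content-Type: application/x-ndjson` header; elasticsearch-hadoop takes care of this batching and of parallelizing writes across Hadoop tasks for you.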
Where to find us?
elasticsearch.com elasticsearch.org @elasticsearch #elasticsearch
IRC (webchat.freenode)
GitHub: elasticsearch/elasticsearch
Try Hadoop Today… Get Involved
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2
More about Elasticsearch & Hortonworks hortonworks.com/partner/elasticsearch
Contact us: [email protected]