Talend Open Studio and Hortonworks Data Platform

© Hortonworks Inc. 2012. Big Data Integration: Talend Open Studio & Hortonworks Data Platform. Ciaran Dynes: Senior Director, Product Marketing - Talend. Jim Walker: Director, Product Marketing - Hortonworks. August 8, 2012.

Upload: hortonworks

Post on 12-May-2015

DESCRIPTION

Data integration is a key step in a Hadoop solution architecture. It is the first obstacle encountered once your cluster is up and running: OK, I have a cluster… now what? Complex scripts? For wide-scale adoption of Apache Hadoop, an intuitive set of tools that abstracts away the complexity of integration is necessary.

TRANSCRIPT

Page 1: Talend Open Studio and Hortonworks Data Platform

© Hortonworks Inc. 2012

Big Data Integration Talend Open Studio & Hortonworks Data Platform

Ciaran Dynes: Senior Director, Product Marketing - Talend

Jim Walker: Director, Product Marketing - Hortonworks

August 8, 2012

Page 2: Talend Open Studio and Hortonworks Data Platform

Your Presenters

Ciaran Dynes, Senior Director, Product Marketing (Talend)

Jim Walker, Director, Product Marketing (Hortonworks)

Page 3: Talend Open Studio and Hortonworks Data Platform

© Talend 2011

Talend – The Market Leading Unified Integration Platform

Talend Open Studio for: Data Quality, Data Integration, MDM, ESB

•  Open source license

•  Free of charge

•  Optional support

Talend Enterprise: Data Quality, Data Integration, MDM, ESB, BPM

•  Commercial license

•  Subscription model

Platform components: Studio, Repository, Deployment, Execution, Monitoring

Recognized by industry analysts as the open source leader in each of its market categories

Page 4: Talend Open Studio and Hortonworks Data Platform

Hortonworks Snapshot

The industry leading and only 100% open source Apache Hadoop distribution

Most experienced open source leadership team

–  Rob Bearden – CEO (JBoss, SpringSource, i2, Oracle)

–  Shaun Connolly – VP Strategy (VMW, SpringSource, Red Hat, JBoss)

–  John Kreisa – VP Marketing (Red Hat, Cloudera, MarkLogic, Bus Obj)

–  Ari Zilka – CPO (Terracotta, Accenture, Walmart.com)

–  Greg Pavlik – VP Eng. (Oracle SOA & Integration platform)

Business model focused on customer success: Hadoop support, services & training

–  Subscription support for Hortonworks Data Platform

–  Training business: private and public classes available for developers & administrators

•  Headquarters Sunnyvale, CA

•  90+ Employees

•  Formed with core Apache Hadoop engineering team from Yahoo!

•  35 engineers and architects including 25+ Hadoop committers

Page 5: Talend Open Studio and Hortonworks Data Platform

Next-gen data architecture drivers

Business Drivers

•  Enable new business models & drive faster growth (20%+)

•  Find insights for competitive advantage & optimal returns

Financial Drivers

•  Cost of data systems, as % of IT spend, continues to grow

•  Cost advantages of commodity hardware & open source

Technical Drivers

•  Data continues to grow exponentially

•  Data is increasingly everywhere and in many formats

•  Legacy solutions unfit for new requirements growth

Page 6: Talend Open Studio and Hortonworks Data Platform

Big data changes the game

[Figure: nested tiers of data sources. Axes: data volume (megabytes, gigabytes, terabytes, petabytes) against increasing data variety and complexity.]

ERP: purchase detail, purchase record, payment record

CRM: offer details, support contacts, customer touches, segmentation

WEB: web logs, offer history, A/B testing, dynamic pricing, affiliate networks, search marketing, behavioral targeting, dynamic funnels

BIG DATA: user generated content, mobile web, SMS/MMS, sentiment, external demographics, HD video, audio, images, speech to text, product/service logs, social interactions & feeds, business data feeds, user click stream, sensors / RFID / devices, spatial & GPS coordinates

Transactions + Interactions + Observations = BIG DATA

Page 7: Talend Open Studio and Hortonworks Data Platform

Use cases: optimize outcomes at scale

•  Media: optimize content

•  Intelligence: optimize detection

•  Investment: optimize algorithms

•  Advertising: optimize performance

•  Fraud: optimize prevention

•  Regulation: optimize compliance

•  Retail / Wholesale: optimize inventory turns

•  Manufacturing: optimize supply chains

•  Healthcare: optimize patient outcomes

•  Education: optimize learning outcomes

•  Government: optimize citizen services

Source: Geoffrey Moore. Hadoop Summit 2012 keynote presentation.

Page 8: Talend Open Studio and Hortonworks Data Platform

Hortonworks Data Platform

Delivers enterprise grade functionality on a proven Apache Hadoop distribution to ease management, simplify use and ease integration into the enterprise

•  Simplify deployment to get started quickly and easily

•  Monitor and manage any size cluster with a familiar console and tools

•  Only platform to include data integration services to interact with any data source

•  Metadata services open the platform for integration with existing applications

•  Dependable high availability architecture

The only 100% open source data platform for Apache Hadoop

Page 9: Talend Open Studio and Hortonworks Data Platform

Data Integration Services

•  Intuitive graphical data integration tools for HDFS, Hive, HBase, HCatalog and Pig

•  Oozie scheduling allows you to manage and stage jobs

•  Connectors for any database, business application or system

•  Integrated HCatalog storage

Bridge the gap between legacy data & Hadoop

Simplify and speed development

Page 10: Talend Open Studio and Hortonworks Data Platform

What is Big Data integration?

Page 11: Talend Open Studio and Hortonworks Data Platform

Trying to get from this…

Page 12: Talend Open Studio and Hortonworks Data Platform

to this…

Why Talend…

ONLY Talend generates code that is executed within MapReduce. This open approach removes the limitation of a proprietary “engine” to provide a truly unique and powerful set of tools for big data.

Page 13: Talend Open Studio and Hortonworks Data Platform

Forces us to think differently

Key Takeaway #2

Page 14: Talend Open Studio and Hortonworks Data Platform

But for Talend… big data is…

…everything that is old is new again!

Page 15: Talend Open Studio and Hortonworks Data Platform

Data driven business

[Diagram: data, information, decisions and "your business", connected by the link labels "supports", "drives" and "enables", with "governance"]

Information provides value to the business. If you can't rely on your information then the result can be missed opportunities, or higher costs.

Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).

Page 16: Talend Open Studio and Hortonworks Data Platform

BIG data driven business

[Diagram: BIG data, BIG information, BIG decisions and BIG business, connected by the link labels "supports", "drives" and "enables", with "governance"]

Information provides value to the business. If you can't rely on your information then the result can be missed opportunities, or higher costs.

Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).

Page 17: Talend Open Studio and Hortonworks Data Platform

Let us show you…

© Talend 2012

Page 18: Talend Open Studio and Hortonworks Data Platform

Putting Web Logs to use

Scenario:

•  ACME Web Inc. has thousands of customers and millions of daily page hits on its ecommerce website

•  ACME believes it could sell more if it could simply figure out buying trends

•  ACME turns to Big Data to help get a handle on the volume of data it needs to manage
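
As a rough illustration of the web log scenario, the sketch below counts page hits per URL in a map-then-reduce style. It is not the code Talend generates and it runs locally rather than on a cluster. The sample lines, the assumption that logs follow the Common Log Format, and the function names (map_line, reduce_counts, page_hits) are all hypothetical.

```python
from collections import Counter
from itertools import chain

# Hypothetical sample lines, assumed to be in Common Log Format.
# A real job would read its input from HDFS instead of a Python list.
SAMPLE_LOG = [
    '10.0.0.1 - - [08/Aug/2012:10:00:00 +0000] "GET /products/widget HTTP/1.1" 200 512',
    '10.0.0.2 - - [08/Aug/2012:10:00:01 +0000] "GET /products/gadget HTTP/1.1" 200 734',
    '10.0.0.1 - - [08/Aug/2012:10:00:05 +0000] "GET /products/widget HTTP/1.1" 200 512',
    'malformed line',
]


def map_line(line):
    """Map step: emit (path, 1) for a parseable request, nothing otherwise."""
    parts = line.split('"')
    if len(parts) < 3:
        return []  # no quoted request section, skip the line
    request = parts[1].split()
    if len(request) != 3:
        return []  # expected "METHOD PATH PROTOCOL"
    return [(request[1], 1)]


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each path."""
    counts = Counter()
    for path, n in pairs:
        counts[path] += n
    return counts


def page_hits(lines):
    """Run the map step over every line and reduce the results."""
    return reduce_counts(chain.from_iterable(map_line(line) for line in lines))


hits = page_hits(SAMPLE_LOG)
```

Malformed lines are dropped silently here; at the volumes described in the scenario, counting the rejected lines as well would give an early signal about data quality.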

Page 19: Talend Open Studio and Hortonworks Data Platform

In big data… poor data quality can be magnified at huge scale

Poor Data Quality + Big Data = Big Problems

Poor Data Quality * Big Data = Big Problems^2

Key Takeaway #3

Page 20: Talend Open Studio and Hortonworks Data Platform

HCatalog

Apache HCatalog provides flexible metadata services across tools and external access

Without HCatalog:

•  Raw Hadoop data

•  Inconsistent, unknown

•  Tool specific access

With HCatalog:

•  Table access

•  Aligned metadata

•  REST API

Metadata Services

•  Consistency of metadata and data models across tools (MapReduce, Pig, HBase and Hive)

•  Accessibility: share data as tables in and out of HDFS

•  Availability: enables flexible, thin-client access via REST API

Shared table and schema management opens the platform
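
To make the thin-client REST access a little more concrete, the sketch below only builds the URL a client might request to list the tables in a database; it does not send anything. The path and default port follow the WebHCat (formerly Templeton) REST conventions as commonly documented and should be checked against the HCatalog version in use. The host name, user name and function name are hypothetical.

```python
from urllib.parse import quote, urlencode


def table_listing_url(host, user, database="default", port=50111):
    """Build a table-listing URL for one database (no request is sent).

    Assumption: the server exposes the WebHCat/Templeton v1 DDL resource
    on the given port; verify both against your installation.
    """
    path = "/templeton/v1/ddl/database/{}/table".format(quote(database, safe=""))
    query = urlencode({"user.name": user})
    return "http://{}:{}{}?{}".format(host, port, path, query)


# Hypothetical host and user, for illustration only.
url = table_listing_url("hdp-gateway.example.com", "etl_user")
```

Any HTTP client can then issue a GET against such a URL, which is the sense in which a tool needs no Hadoop client libraries to discover shared table metadata.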

Page 21: Talend Open Studio and Hortonworks Data Platform

…an open source ecosystem

Democratize Big Data

Talend Open Studio for Big Data

•  Improves efficiency of big data job design with graphic interface

•  Generates Hadoop code

•  Runs transforms inside Hadoop

•  Native support for HDFS, Pig, HBase, Sqoop and Hive

•  Apache License

•  Available at talend.com

•  Distribution with Hadoop vendors coming

Page 22: Talend Open Studio and Hortonworks Data Platform

…an open source ecosystem

Make Faster and More Informed Decisions

Talend Platform for Big Data

•  Builds on Talend Open Studio for Big Data

•  Adds data quality, advanced scalability and management functions

•  MapReduce massively parallel data processing

•  Shared Repository and remote deployment

•  Data quality and profiling

•  Data cleansing

•  Reporting and dashboards

•  Commercial support, warranty/IP indemnity under a subscription license

Page 23: Talend Open Studio and Hortonworks Data Platform

Why HDP?

Only Hortonworks Data Platform provides…

•  Tightly aligned to core Apache Hadoop development line - Reduces risk for customers who may add custom coding or projects

•  Enterprise Integration - HCatalog provides scalable, extensible integration point to Hadoop data

•  Most reliable Hadoop distribution - Full stack high availability on v1 delivers the strongest SLA guarantees

•  Multi-tenant scheduling and resource management - Capacity and fair scheduling optimize cluster resources

•  Integration with operations, eases cluster management - Ambari is the most open/complete operations platform for Hadoop clusters

Page 24: Talend Open Studio and Hortonworks Data Platform

What next?

1  Download Hortonworks Data Platform & Talend Open Studio: hortonworks.com/download or talend.com/download

2  Use the getting started guide: hortonworks.com/get-started

3  Learn more… get support

Training: hortonworks.com/training

•  Expert role based training

•  Course for admins, developers and operators

•  Certification program

•  Custom onsite options

Hortonworks Support: hortonworks.com/support

•  Full lifecycle technical support across four service levels

•  Delivered by Apache Hadoop Experts/Committers

•  Forward-compatible

Page 25: Talend Open Studio and Hortonworks Data Platform

Questions & Answers

TRY: download at hortonworks.com; download at talend.com

LEARN: Hortonworks University

FOLLOW: twitter: @hortonworks; Facebook: facebook.com/hortonworks

MORE EVENTS: hortonworks.com/events

Further questions & comments: [email protected]