Talend Open Studio and Hortonworks Data Platform
DESCRIPTION
Data Integration is a key step in a Hadoop solution architecture. It is the first obstacle encountered once your cluster is up and running. OK, I have a cluster… now what? Complex scripts? For wide-scale adoption of Apache Hadoop, an intuitive set of tools that abstracts away the complexity of integration is necessary.
TRANSCRIPT
© Hortonworks Inc. 2012
Big Data Integration Talend Open Studio & Hortonworks Data Platform
Ciaran Dynes: Senior Director, Product Marketing - Talend
Jim Walker: Director, Product Marketing - Hortonworks
August 8, 2012
Your Presenters
Ciaran Dynes Senior Director, Product Marketing
Jim Walker Director, Product Marketing
© Talend 2011
Talend – The Market Leading Unified Integration Platform
• Open source license • Free of charge • Optional support
• Commercial license • Subscription model
Data Quality
Data Integration MDM ESB
Talend Open Studio for
Monitoring Execution Deployment Repository Studio
Data Quality
Data Integration
MDM ESB BPM
Talend Enterprise
Recognized as the open source leader in each of its market categories by all industry analysts
Hortonworks Snapshot
The industry-leading and only 100% open source Apache Hadoop distribution
Most experienced open source leadership team
– Rob Bearden – CEO (JBoss, SpringSource, i2, Oracle)
– Shaun Connolly – VP Strategy (VMW, SpringSource, Red Hat, JBoss)
– John Kreisa – VP Marketing (Red Hat, Cloudera, MarkLogic, Bus Obj)
– Ari Zilka – CPO (Terracotta, Accenture, Walmart.com)
– Greg Pavlik – VP Eng. (Oracle SOA & Integration platform)
Business model focused on customer success: Hadoop support, services & training
– Subscription support for Hortonworks Data Platform
– Training business: private and public classes available for developers & administrators
• Headquarters Sunnyvale, CA
• 90+ Employees
• Formed with core Apache Hadoop engineering team from Yahoo!
• 35 engineers and architects including 25+ Hadoop committers
Next-gen data architecture drivers
Business Drivers
• Enable new business models & drive faster growth (20%+)
• Find insights for competitive advantage & optimal returns
Financial Drivers
• Cost of data systems, as % of IT spend, continues to grow
• Cost advantages of commodity hardware & open source
Technical Drivers
• Data continues to grow exponentially
• Data is increasingly everywhere and in many formats
• Legacy solutions unfit for new requirements growth
Big data changes the game
ERP (Megabytes): Purchase detail, Purchase record, Payment record
CRM (Gigabytes): Segmentation, Offer details, Customer Touches, Support Contacts
WEB (Terabytes): Web logs, Offer history, A/B testing, Dynamic Pricing, Affiliate Networks, Search Marketing, Behavioral Targeting, Dynamic Funnels
BIG DATA (Petabytes): User Generated Content, Mobile Web, SMS/MMS, Sentiment, External Demographics, HD Video, Audio, Images, Speech to Text, Product/Service Logs, Social Interactions & Feeds, Business Data Feeds, User Click Stream, Sensors / RFID / Devices, Spatial & GPS Coordinates
Increasing Data Variety and Complexity
Transactions + Interactions + Observations = BIG DATA
Use cases: optimize outcomes at scale
• Media: optimize content
• Intelligence: optimize detection
• Investment: optimize algorithms
• Advertising: optimize performance
• Fraud: optimize prevention
• Regulation: optimize compliance
• Retail / Wholesale: optimize inventory turns
• Manufacturing: optimize supply chains
• Healthcare: optimize patient outcomes
• Education: optimize learning outcomes
• Government: optimize citizen services
Source: Geoffrey Moore. Hadoop Summit 2012 keynote presentation.
Hortonworks Data Platform
Delivers enterprise-grade functionality on a proven Apache Hadoop distribution to ease management, simplify use and ease integration into the enterprise
• Simplify deployment to get started quickly and easily
• Monitor, manage any size cluster with familiar console and tools
• Only platform to include data integration services to interact with any data source
• Metadata services open the platform for integration with existing applications
• Dependable high availability architecture
The only 100% open source data platform for Apache Hadoop
Data Integration Services
• Intuitive graphical data integration tools for HDFS, Hive, HBase, HCatalog and Pig
• Oozie scheduling allows you to manage and stage jobs
• Connectors for any database, business application or system
• Integrated HCatalog storage
Bridge the gap between legacy data & Hadoop
Simplify and speed development
What is Big Data integration?
Trying to get from this…
to this…
ONLY Talend generates code that is executed within MapReduce. This open approach removes the limitation of a proprietary “engine” to provide a truly unique and powerful set of tools for big data.
Why Talend…
Forces us to think differently
Key Takeaway #2
But for Talend… Big data is…
…everything that is old is new again!
Data driven business
[Diagram: data, information, decisions and your business, connected by the labels "supports", "drives" and "enables", with governance]
Information provides value to the business. If you can't rely on your information then the result can be missed opportunities, or higher costs.
Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
BIG data driven business
[Diagram: BIG data, BIG information, BIG decisions and BIG business, connected by the labels "supports", "drives" and "enables", with governance]
Information provides value to the business. If you can't rely on your information then the result can be missed opportunities, or higher costs.
Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
Let us show you…
© Talend 2012
Putting Web Logs to use
Scenario:
• ACME Web Inc. have thousands of customers and millions of daily page hits on their ecommerce website
• ACME believe they could sell more things, if they could simply figure out buying trends
• ACME turns to Big Data to help get a handle on the volume of data they need to manage
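To make the web log scenario concrete, here is a minimal Python sketch of the kind of aggregation ACME would want: counting successful page hits per URL. It is an illustration only, not the job shown in the webinar. The log layout (Apache common log format), the sample lines, and the names LOG_PATTERN and count_page_hits are all assumptions; in practice the same logic would be generated as a Pig or MapReduce job and run inside the cluster.

```python
import re
from collections import Counter

# Assumption: logs follow the Apache common log format.
# The slides do not say which format ACME's web servers use.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_page_hits(lines):
    """Count successful GET requests per path; also count unparseable lines."""
    hits = Counter()
    rejected = 0
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            # Malformed record: track it rather than silently dropping it.
            rejected += 1
            continue
        if match.group("method") == "GET" and match.group("status") == "200":
            hits[match.group("path")] += 1
    return hits, rejected


# Hypothetical sample records, invented for illustration.
SAMPLE = [
    '10.0.0.1 - - [08/Aug/2012:10:00:00 -0700] "GET /products/42 HTTP/1.1" 200 512',
    '10.0.0.2 - - [08/Aug/2012:10:00:01 -0700] "GET /products/42 HTTP/1.1" 200 512',
    '10.0.0.3 - - [08/Aug/2012:10:00:02 -0700] "GET /products/7 HTTP/1.1" 200 128',
    '10.0.0.4 - - [08/Aug/2012:10:00:03 -0700] "GET /products/9 HTTP/1.1" 404 0',
    'corrupted line',
]

hits, rejected = count_page_hits(SAMPLE)
for path, count in hits.most_common():
    print(path, count)
print("rejected records:", rejected)
```

Counting rejected records alongside the hits keeps malformed input visible, which matters more as the volume of logs grows.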
In big data… poor data quality can be magnified at huge scale
Poor Data Quality + Big Data = Big Problems
Poor Data Quality * Big Data = Big Problems^2
Key Takeaway #3
HCatalog
Table access Aligned metadata REST API
• Raw Hadoop data
• Inconsistent, unknown
• Tool specific access
Apache HCatalog provides flexible metadata services across tools and external access
Metadata Services
• Consistency of metadata and data models across tools (MapReduce, Pig, HBase and Hive)
• Accessibility: share data as tables in and out of HDFS
• Availability: enables flexible, thin-client access via REST API
Shared table and schema management opens the platform
…an open source ecosystem
Democratize Big Data
Talend Open Studio for Big Data
• Improves efficiency of big data job design with graphic interface
• Generates Hadoop code
• Run transforms inside Hadoop
• Native support for HDFS, Pig, HBase, Sqoop and Hive
• Apache License
• Available at talend.com
• Distribution with Hadoop vendors coming
…an open source ecosystem
Make Faster and More Informed Decisions
Talend Platform for Big Data
• Builds on Talend Open Studio for Big Data
• Adds data quality, advanced scalability and management functions
• MapReduce massively parallel data processing
• Shared Repository and remote deployment
• Data quality and profiling
• Data cleansing
• Reporting and dashboards
• Commercial support, warranty/IP indemnity under a subscription license
Why HDP?
Only Hortonworks Data Platform provides…
• Tightly aligned to core Apache Hadoop development line - Reduces risk for customers who may add custom coding or projects
• Enterprise Integration - HCatalog provides scalable, extensible integration point to Hadoop data
• Most reliable Hadoop distribution - Full stack high availability on v1 delivers the strongest SLA guarantees
• Multi-tenant scheduling and resource management - Capacity and fair scheduling optimizes cluster resources
• Integration with operations, eases cluster management - Ambari is the most open/complete operations platform for Hadoop clusters
What next?
1 Download Hortonworks Data Platform & Talend Open Studio: hortonworks.com/download or talend.com/download
2 Use the getting started guide: hortonworks.com/get-started
3 Learn more… get support
Training
• Expert role based training
• Course for admins, developers and operators
• Certification program
• Custom onsite options
hortonworks.com/training
Hortonworks Support
• Full lifecycle technical support across four service levels
• Delivered by Apache Hadoop Experts/Committers
• Forward-compatible
hortonworks.com/support
Questions & Answers
TRY download at hortonworks.com download at talend.com
LEARN Hortonworks University
FOLLOW twitter: @hortonworks Facebook: facebook.com/hortonworks
MORE EVENTS hortonworks.com/events
Further questions & comments: [email protected]