AWS Webcast - Informatica - Big Data Solutions Showcase
DESCRIPTION
Informatica High Performance Big Data Loading for AWS
TRANSCRIPT
Big Data Solution Showcase
Informatica High Performance Big Data Loading for AWS
Watch this webinar on demand on: https://connect.awswebcasts.com/p4pshu7r7fi/
Presenters
Ronen Schwartz
Vice President and General Manager
Informatica Cloud
Chris Keyser
Partner Solution Architect
Amazon Web Services
Agenda
• Introduction to AWS
• How Informatica Cloud powers Cloud Analytics with Redshift
• UBM’s Business Challenge: Understanding Their Customers
• How UBM Achieves Customer Insight
• Q & A
Why Are Customers Adopting AWS?
1. Trade capital expense for variable expense
2. Cost savings through economies of scale
3. Don’t have to guess on capacity
4. Agility, Speed to market & Flexibility
5. Global in minutes
6. Security and Compliance
Big Data
Technologies and techniques for working productively with data, at any scale.
Big Data
• Potentially Massive Data Sets
• Iterative, experimental style of data manipulation and analysis
• Frequently not a steady-state workload; peaks and valleys
• Time to results is key
• Hard to configure/manage
AWS Cloud
• Massive, virtually unlimited capacity
• Iterative, experimental style of infrastructure deployment/usage
• Efficient with highly variable workloads
• Parallel compute clusters from single data source
• Managed services for data storage and analysis
AWS Data Services
Data
• Velocity: Millisecond, Second, Minute, Hour, Day
• Variety: Structured, Unstructured, Text, Binary
• Volume: Gigabytes, Terabytes, Petabytes
• EC2, EBS: Instance Storage
• Redshift, RDS: SQL Stores
• EMR: Hadoop
• DynamoDB: NoSQL
• Kinesis: Stream
• S3, CloudFront, Glacier: Storage Services
• ElastiCache: Caching
• Data Pipeline: Orchestrate
Storage Services – Object Store
Amazon S3
99.999999999% durability
Stores anything
Lifecycle and Versioning
Fine Grained Access Control
Reduced Redundancy Storage
Amazon DynamoDB - NoSQL Durable Low Latency At Scale
WRITES
• Continuously replicated to 3 AZs
• Persisted to disk (SSD)
READS
• Strongly or eventually consistent
Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year
Amazon Redshift
Nokia and AWS: 50% Cost Savings with 2x Faster Queries
Hadoop Tools Improving Rapidly
On-demand, Flexible, Big Data Technologies
Cheaper and Faster
Redshift & Hadoop Price-performance Advantage over RDBMS
• >50% platform cost savings
• >2x faster queries
• Minimal DBA support
Redshift, Hadoop, S3, EMR, Data Pipeline for ETL
Cost-effective for 10s of TB data sets
AMI-based Services
Internet Speed Report Authoring
Hypothesis testing vs. waterfall
http://aws.amazon.com/marketplace
Big Data Case Studies
Learn from other AWS customers
aws.amazon.com/solutions/case-studies/big-data
AWS Marketplace
AWS Online Software Store
aws.amazon.com/marketplace
So, How Do You Try Amazon Redshift – Quickly & Easily?
Amazon Redshift
Use New Cloud & Traditional Data Sources
[Diagram: ERP, CRM Apps; Files; Legacy, RDBMS; Firewall; Logs, JSONs, Social; SaaS Apps; Amazon Redshift]
How To Manage Integration In This New World?
[Diagram: ERP, CRM Apps; Files; Legacy, RDBMS; Firewall; Logs, JSONs, Social; SaaS Apps; Amazon Redshift]
Experiment. Prototype. Repeat.
Amazon RDS Staging, Amazon Redshift DW, Infa Cloud
[Diagram: ERP, CRM Apps; Files; Legacy, RDBMS; Logs, JSONs, Social; SaaS Apps; Amazon RDS; Amazon Redshift]
Experiment. Prototype. Repeat.
Amazon EMR (Hadoop) and Amazon DynamoDB (NoSQL)
[Diagram: ERP, CRM Apps; Files; Legacy, RDBMS; Logs, JSONs, Social; SaaS Apps; Amazon RDS; Amazon Redshift; Amazon EMR; DynamoDB]
Growth Path to Hybrid Data Warehouse
[Diagram: ERP, CRM Apps; Files; Legacy, RDBMS; Logs, JSONs, Social; SaaS Apps; Amazon RDS; Amazon Redshift; Amazon EMR; DynamoDB; Traditional Staging DB; Traditional Data Warehouse]
Informatica Cloud - Get it right. Go live. Grow flexibly.
Cloud Data Integration
Cloud Application Integration
Cloud Test Data Management
Cloud Data Quality
Cloud Master Data Management
Secure Development Data
Leverage Existing Bulk Data
Cleanse and De-dupe Data
Consolidate and Visualize Data
Real Time Access to Actionable Data
“The Informatica Cloud Platform is the only complete solution for cloud integration and data management
that allows SaaS application administrators, architects, and developers to easily power optimal processes
connected with enterprise-ready data across cloud, on-premises, big data, social, and mobile environments.”
Hundreds of Connectors
JDBC
Technical Innovations for AWS Data Loading
• Broadest out-of-the-box integration for AWS: S3, DynamoDB, Kinesis, Redshift and RDS available
• Agile data loading for cloud data warehousing with Redshift
  • Create target using cloud designer and multiple source objects
• High performance parallel data loading architecture
  • E.g. load data in parallel across all 32 nodes in a Redshift cluster
• Push down optimization for increased throughput
  • Push data transformations down to optimal source/target database engine
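One way to picture the parallel loading bullet is to split a data set into several compressed part files so that a cluster can ingest them concurrently. The sketch below is only an illustration under assumptions: the function names, the `accounts/part` key prefix, and the round-robin split are hypothetical and are not taken from Informatica Cloud, whose internal mechanism the slides do not describe.

```python
import csv
import gzip
import io


def split_rows(rows, n_parts):
    """Distribute rows round-robin into n_parts lists of near-equal size."""
    parts = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    return parts


def to_gzip_csv(rows):
    """Serialize rows as CSV and gzip-compress the result in memory."""
    text = io.StringIO()
    csv.writer(text).writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def build_parts(rows, n_parts, prefix="accounts/part"):
    """Return {object_key: gzip_bytes}, one entry per part file.

    The key prefix is a hypothetical naming scheme chosen for this sketch.
    """
    return {
        f"{prefix}_{i:03d}.csv.gz": to_gzip_csv(part)
        for i, part in enumerate(split_rows(rows, n_parts))
    }


# Sample data: 100 made-up account rows split into 32 parts,
# mirroring the 32-node example on the slide.
rows = [(i, f"account-{i}") for i in range(100)]
parts = build_parts(rows, n_parts=32)
```

Files that share a key prefix can then be referenced by a single load command, which lets the warehouse read them in parallel instead of streaming one large file.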
Loading data into REDSHIFT, DYNAMODB and RDS
Informatica Cloud Architecture Overview - Redshift
[Diagram, callouts 1-4: Secure Agent in Your Company or VPC; Amazon S3; Amazon RDS; Amazon DynamoDB; Amazon Redshift]
Informatica Cloud Amazon Redshift Architecture
[Diagram: Metadata Mappings; Firewall; Informatica Cloud Secure Agent; Amazon S3; Amazon Redshift]
1 Build mapping and execute job
2 Retrieve Account Data
3 Put Account Data into Flat File
4 Transfer compressed Flat File to S3
5 Initiate copy from S3
6 Load data into Amazon Redshift
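The load steps on this slide (flat file, compression, transfer to S3, copy into Redshift) can be sketched in a few lines. This is a minimal illustration, not the Secure Agent's actual implementation: the function names, table name, bucket, and IAM role ARN are all hypothetical, and the S3 upload and SQL execution are deliberately left out because they need an AWS client and a database connection.

```python
import csv
import gzip
import io


def write_flat_file(records, fieldnames):
    """Step 3: serialize the retrieved records as a CSV flat file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def compress(flat_file_text):
    """Step 4 (first half): gzip the flat file before it is sent to S3."""
    return gzip.compress(flat_file_text.encode("utf-8"))


def build_copy_statement(table, s3_uri, iam_role):
    """Step 5: build the Redshift COPY command that pulls the file from S3."""
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV GZIP IGNOREHEADER 1;"
    )


# Made-up account data standing in for step 2 (Retrieve Account Data).
accounts = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
payload = compress(write_flat_file(accounts, ["id", "name"]))

# Hypothetical bucket and role; real values depend on the AWS account.
sql = build_copy_statement(
    "account",
    "s3://example-bucket/account.csv.gz",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
```

Running the generated statement against the cluster corresponds to steps 5 and 6: Redshift reads the compressed file directly from S3, so the data never has to pass row by row through the client.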
AMAZON REDSHIFT DEMO!
UBM Tech – Customer Case Study
UBM Tech Events
Bringing Together the World’s Technology Communities
Our complete understanding of how technology is built, sold and used creates unique market value for you.
Technology Segments: Security; Game & App Development
Technology Markets: Enterprise Infrastructure & Cloud; Mobile Broadband & Wireless Infrastructure; Communications & Collaboration
Vertical Markets and Professionals: Government IT; Tech Marketers; IT Service & Support
Electronics: Electronic & ARM-based computer design; Signal Integrity & High-speed Design; Designers of Things; Embedded System Design
IT Executives
UBM Tech Media
Bringing Together the World’s Technology Communities
Strong Editorial Brands | 135+ Awards In 3 Years | 3 Launches In 4 Months
Technology Segments: Security; Infrastructure; Game Development; Development
Technology Markets: Enterprise IT; Telecommunications; Unified Communications
Vertical Markets and Professionals: Government & Healthcare IT; IT Service & Support; ThinkHDI.com; Financial Services
Electronics: Electronics Engineering & Design; Global Supply Chain; EE Training & Education; Analog Design; System Design; Electronic Parts Search
UBM Tech Credentials
Decades of Insight and Experience. Proven Results.
Data Warehouse and Analytics
• Customer Insights team will utilize new
technologies and tools to build an even
better understanding of the needs of our
communities
• Allows us to foster deeper relationships with
customers by providing them with the right
products and services at the right time
• Ability to provide a more holistic
view of prospects and customers for our
clients
Customer Insight is Critical to What We Do
• Consolidated data presented at the client level
• Journey mapped on a buying cycle funnel
• Ability to drill down and view information at a topic,
product, campaign & customer level
Deep Analytics
Jerry Chow to UBM Tech?
Who is Jerry Chow?
Goal is Behavioral Targeting
Online and Live Event Content Engagement
[Diagram: content nodes, including Events, Lead Nurturing and Webinars, each labeled Topical Metadata]
• Metadata generated from content created
and engaged with from live events and
online products
• As users engage with these products,
metadata gets attached to them with a
weightage assigned based on level of
engagement, recency, etc.
• Increased engagement leads to collection
of more behavior which then can be
matched with the live event metadata that
needs to be promoted
• As the promotion is underway, the
matched users get highly
personalized emails inviting them to
attend certain events, online
programs, etc.
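The weighting and matching described in these bullets can be sketched as a small scoring routine. Everything specific here is an assumption made for illustration: the slides give no weights, decay rule, or threshold, so the engagement weights, the 90-day half-life, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical weights per engagement type; the slides do not give values.
ENGAGEMENT_WEIGHT = {"view": 1.0, "download": 2.0, "webinar": 3.0, "event": 5.0}
HALF_LIFE_DAYS = 90  # assumed recency half-life


def topic_profile(interactions, today):
    """Sum recency-decayed engagement weights per topic for one user."""
    profile = defaultdict(float)
    for topic, kind, when in interactions:
        age_days = (today - when).days
        decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
        profile[topic] += ENGAGEMENT_WEIGHT[kind] * decay
    return dict(profile)


def match_score(profile, event_topics):
    """Score a user against an event as the sum of weights on shared topics."""
    return sum(profile.get(topic, 0.0) for topic in event_topics)


def select_invitees(profiles, event_topics, threshold):
    """Return user ids whose score meets the threshold, best match first."""
    scored = [(match_score(p, event_topics), uid) for uid, p in profiles.items()]
    return [uid for score, uid in sorted(scored, reverse=True) if score >= threshold]


# Made-up interaction history: (topic, engagement type, date).
today = date(2014, 6, 1)
profiles = {
    "u1": topic_profile(
        [("security", "webinar", date(2014, 6, 1)),
         ("cloud", "view", date(2014, 3, 3))],
        today,
    ),
    "u2": topic_profile([("gaming", "event", date(2014, 6, 1))], today),
}
invitees = select_invitees(profiles, {"security", "cloud"}, threshold=2.0)
```

An exponential decay is only one possible way to express recency; a step function or a fixed look-back window would fit the same description.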
How to Consolidate and Analyze Customer Data?
Phase 1 - Solution Summary
Sources
• EV2: Customer, Demographics, Event, Registration
• OpenCalais: Content classes and taxonomy
• NextGen: Customer, Demographics, Event, Site Registration
• Eloqua: Customer, Permissions, Behaviour
Data Integration: Informatica Cloud
Create an integrated view of customer
Generating a holistic view of customers by:
• Merging Customer data from Online, Registration and Onsite
• Ensuring data quality and consistency
Data Warehouse: Redshift
Data made available in Data Warehouse
Providing coherent information to aid decision making by:
• Storing Customer information across time and across event
• Ensuring availability of relevant information
Reporting Component: Tableau
Provide visibility to customer information
Develop complete understanding of customers
• Report on customer sources
• Report on customer, content consumption & registrations
• Report on customer behaviour
Advanced Analytics Component (R)
Generate insights to improve customer engagement
• Analyse online behaviour and create customer personas based on customer content affinity
• Identify potential attendees based on their personas
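The customer-merging step in the Phase 1 summary (combining Online, Registration and Onsite data while keeping it consistent) can be illustrated with a short sketch. It assumes, purely for illustration, that records are matched on a normalized email address and that earlier sources take priority; the slides do not say how UBM actually keyed or prioritized its sources, and the field names are invented.

```python
def normalize_email(email):
    """Basic cleansing: trim whitespace and lowercase the address."""
    return email.strip().lower()


def merge_customers(*sources):
    """Merge records from several sources into one record per customer.

    Later sources only fill fields that are still missing, so the order of
    the arguments acts as a simple source-priority rule (an assumption).
    """
    merged = {}
    for source in sources:
        for record in source:
            key = normalize_email(record["email"])
            target = merged.setdefault(key, {"email": key})
            for field, value in record.items():
                # Skip the key itself and empty values from any source.
                if field != "email" and value not in (None, ""):
                    target.setdefault(field, value)
    return merged


# Made-up sample records for the three source types named on the slide.
online = [{"email": "Ann@Example.com ", "topic": "security"}]
registration = [{"email": "ann@example.com", "company": "Acme", "topic": ""}]
onsite = [
    {"email": "ann@example.com", "badge_scans": 4},
    {"email": "bob@example.com", "badge_scans": 1},
]
customers = merge_customers(online, registration, onsite)
```

In a warehouse setting the same rule would more likely be written as SQL over staging tables; the Python form is used here only to make the matching and priority logic explicit.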
Data Warehouse Architecture
The slide outlines the elements and services of the warehouse, with details showing how the components will fit together, providing an organizing framework to support implementation.
Data Sources
Sources of data required for phase 1
• NextGen: SSO & Site Registration
• EV2: Event Registration
• Eloqua: Marketing and Automation
• OpenCalais: Content Tagging
Data Integration Layer
Extracting data from various sources into staging area, transforming data to a target state and loading into the Data Warehouse
• Source Dependent Data Integration - ETL
• Business rules consolidation
• Data staging
• Source System Data Sets
• Common Business Rules
• Basic Data Cleansing
• Source Independent Data Integration - ETL
Data Warehouse Layer
Creating a central repository of data, both current and historical, that allows decision makers/users to have all potential information required for decision making
• Product
• Customer
• Behavior
• Registration
• Content Taxonomy
• Campaign History
• Demographic
BI & Reporting and Advance Analytics
Transforming raw data into meaningful and actionable insights, discovery of patterns & hidden opportunities for decision making and closing the loop with source systems as required
• BI & Reporting: Reports, Dashboard, Interactive Visualisation
• Advance Analytics: Predictive Modeling, Simulation
Governance and Security
[Diagram tool labels: Amazon Redshift; Informatica Cloud; Amazon Redshift; Tableau; Revolution R]
Data Warehouse
Content Consumption Lifecycle
Content Consumption Habits
Unstructured Data Meets Structured Data
Results: 27% Uplift for Event Marketing Open Rate
Results: Predictive Analytics Drive Marketing Offers
Next Steps
• Try our 60-Day free trial for Redshift
• www.informaticacloud.com/cloud-trial-for-redshift