The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights
DESCRIPTION
The Briefing Room with John O’Brien and Teradata
Slides from the Live Webcast on Aug. 21, 2012
Data and context: that's the ultimate combination. Uniting those two is the goal of today's information managers, as they seek to connect the world of traditional business intelligence on structured data to the ocean of new, multi-structured Big Data that can provide so much valuable context and additional insight. The question of how begs answers, but the bigger issue of which technology is best dominates the dialogue in the world's most cutting-edge companies. Check out this episode of The Briefing Room to learn from veteran database analyst John O'Brien of Radiant Advisors as he explains how certain information architectures have advantages over others with respect to bridging structured and unstructured data. He'll be briefed by Steve Wooledge of Teradata, who will detail his company's innovations in SQL-MapReduce, which allows professionals to perform multi-structured analytics at scale. He'll also describe how a new extension called SQL-H allows analysts to use Hadoop as if it were just another table in the database. For more information visit: http://www.insideanalysis.com
TRANSCRIPT
Tuesday, August 21, 2012
Reveal the essential characteristics of enterprise software, good and bad
Provide a forum for detailed analysis of today’s innovative technologies
Give vendors a chance to explain their product to savvy analysts
Allow audience members to pose serious questions... and get answers!
Twitter Tag: #briefr
August: Analytics
September: Integration
October: Database
November: Cloud
December: Innovators
Analytics is, and always has been, about discovering insights that lead to better business decisions. The range of technologies and use cases that inhabit this area is wide: statistical analysis, data and process mining, predictive analytics and modeling, and complex event processing.
What is now referred to as Big Data has pushed analytics beyond the capabilities of traditional solutions. “Big Analytics” has organizations diving into large heaps of data that previously was not available or usable.
The growing volume, variety, velocity and complexity of data have proven to be a major challenge to organizations that leverage analytics to maintain a competitive edge.
John is the Principal and Founder of Radiant Advisors. As a recognized thought leader in BI, John has been publishing articles and presenting at conferences for the past 10 years. He has been a Best Practices judge, presenter and panel participant at TDWI. John has also developed and presented his own courses: Radiant Advisors Learning Catalog.
John has a B.S. in Mechanical Engineering from California State University and an M.B.A. from the University of Colorado. He is a Certified Business Intelligence Professional with mastery levels in Leadership and Administration, Database Administration and Business Intelligence.
Teradata is known for its analytic data solutions with a focus on integrated data warehousing, big data analytics and business applications.
It offers a broad suite of technology platforms and solutions, and a wide range of data management applications and data mining capabilities.
Teradata Aster is its MapReduce platform for handling big data and big analytics on multi-structured data.
Steve Wooledge is Senior Director of Marketing at Teradata’s Aster Center of Innovation, where he is an evangelist for the company’s analytic platform product and responsible for awareness, demand generation, and solution marketing for the data scientist. Steve has more than 10 years of experience in product marketing and business development for business intelligence, data management, Web analytics and e-commerce products.
Prior to his current role, Steve held product marketing positions at Interwoven and Business Objects as well as sales and engineering roles at Business Objects, Dow Chemical and Occidental Petroleum.
Steve has a B.S. in Chemical Engineering and an M.B.A. in Marketing and Finance.
The Unified Big Data Architecture & Bridging the Analyst Gap for Hadoop
Steve Wooledge, Sr. Director of Marketing
August 21, 2012
10 Confidential and proprietary. Copyright © 2012 Teradata Corporation.
Topics
• Quick intro to Teradata Aster
• The need for a unified big data architecture
• Bridging the Analyst Gap for Hadoop: Aster SQL-H™
Teradata Aster: Leading Innovator in Data Discovery for the Enterprise
• Aster solution: Data discovery platform that delivers a MapReduce analytic framework within an MPP database
• Brings data science to the business: Enables MapReduce processing through the analytic language of business, standard SQL
• Delivers new analytics: Gives businesses new breakthrough analytic apps via pre-packaged pattern, path, and graph SQL-MapReduce modules
• On multi-structured data: Leverages multi-structured data sources for increased analytic breadth & accuracy
Customers
Teradata Aster MapReduce Platform
Analysts, Data Scientists, Business Users, Customers
Your Analytic & Advanced Reporting Applications
Develop: Rapid Analytics Development
• 50+ pre-built analytic modules
• Visual IDE; develop apps in hours
• Many programming languages
Process: Embedded Analytic Processing
• SQL-MapReduce framework
• Analyze both structured & multi-structured data
• Linear, incremental scalability
Store: Massively Parallel Data Storage
• Commodity-hardware based
• Software only, appliance, or cloud
• Relational-data architecture can be extended for non-relational types
Deeper Consumer Insights with Teradata Aster
• Payment processing analytics down from one day to one minute with SQL-MapReduce
• Web log data processing from seven hours to 20 minutes
• Interactive dashboards with all KPIs from point of order inception, down from five hours to five minutes
Business Impact / ROI
• Increased conversions from recommendations with 360-degree view of customer across in-store and .com behavior
• Build revenue attribution models to link every purchase to a site feature
• Reduce churn from one day previously to 20 minutes
Big Data: From Transactions to Interactions
Increasing data variety and complexity
• ERP (Megabytes): Purchase detail, Purchase record, Payment record
• CRM (Gigabytes): Segmentation, Offer details, Customer Touches, Support Contacts
• WEB (Terabytes): Web logs, Offer history, A/B testing, Dynamic Pricing, Affiliate Networks, Search marketing, Behavioral Targeting, Dynamic Funnels
• BIG DATA (Petabytes): User Generated Content, Mobile Web, SMS/MMS, Sentiment, External Demographics, HD Video, Speech to Text, Product/Service Logs, Social Network, Business Data Feeds, User Click Stream
Unified Big Data Architecture: Bridging Classic & Big Data Worlds
Classic Method: Structured & Repeatable Analysis
• “Capture only what’s needed”
• Business determines what questions to ask
• IT structures the data to answer those questions
• SQL performance and structure
Big Data Method: Multi-structured & Iterative Analysis
• “Capture in case it’s needed”
• IT delivers a platform for storing, refining, and analyzing all data sources
• Business explores data for questions worth answering
• MapReduce processing flexibility
MapReduce Analytics
Example: Pattern Matching Analysis
Traditional SQL
• Self-joins for sequencing
• Limited operators for ordered data
SQL-MapReduce
• Single pass of data
• Linked-list sequential analysis
The Advantages of MapReduce: Raw click-stream data and pattern matching with nPath
Goal
• Increase understanding of customer behavior on a website to improve advertising rates or website navigation
Challenges
• Full website session-level data needed, typically from raw web logs
• Requires complex multi-pass SQL queries or non-SQL techniques
• Requires rewriting query to change number of clicks analyzed
MapReduce Value
• Performance: Single pass over data regardless of number of clicks analyzed
• Manageability: Much simpler code, from 350 lines of SQL to 18 lines of SQL-MapReduce
• Ease of Use: Pattern flexibility to handle varied numbers of clicks and click patterns without rewriting code
Example Analytic Logic
• People who search ‘diabetes’ also browse…
• People who download visit pages A, B, D …
Click Stream Analysis: Comparative Performance (chart of time for SQL on 3 pages versus SQL-MR on 3, 4, 8 and 12 pages)
• MapReduce for 3, 4, 8, 12 pages: 77-131 seconds
• SQL for 3 pages: 6 minutes
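The point about handling varied numbers of clicks without rewriting the query can be illustrated outside the database. The sketch below is plain Python, not SQL-MapReduce or nPath; the function name `page_paths` and the `(user_id, timestamp, page)` tuple layout are assumptions made for illustration. It orders each user's clicks once and slides a window of `n` pages over them, so changing the path length means changing one argument rather than adding another self-join.

```python
from collections import Counter, defaultdict


def page_paths(clicks, n):
    """Count every run of n consecutive pages per user.

    clicks: iterable of (user_id, timestamp, page) tuples (hypothetical layout).
    """
    by_user = defaultdict(list)
    for user_id, timestamp, page in clicks:
        by_user[user_id].append((timestamp, page))

    counts = Counter()
    for rows in by_user.values():
        rows.sort(key=lambda row: row[0])          # order each user's clicks by time
        pages = [page for _, page in rows]
        for start in range(len(pages) - n + 1):    # slide a window of n pages
            counts[tuple(pages[start:start + n])] += 1
    return counts


# Made-up sample data for illustration only.
clicks = [
    ("u1", 1, "A"), ("u1", 2, "B"), ("u1", 3, "D"),
    ("u2", 1, "A"), ("u2", 2, "B"), ("u2", 3, "C"),
]
print(page_paths(clicks, 2).most_common(1))   # most frequent 2-page path
print(page_paths(clicks, 3))                  # same code, 3-page paths
```

The same function serves 3, 4, 8 or 12 pages; only `n` changes.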
Need for a Unified Big Data Architecture for New Insights: Enabling All Users for Any Data Type from Data Capture to Analysis
Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
Discover and Explore; Reporting and Execution in the Enterprise
Capture, Store and Refine
Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP
Teradata Unified Big Data Architecture: Any User, Any Data, Any Analysis
Engineers, Business Analysts, Quants, Data Scientists
Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
Discovery Platform (Aster MapReduce Portfolio); Integrated Data Warehouse (Teradata Analytics Portfolio)
SQL-H
Capture, Store, Refine
Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP
Hadoop Points of Integration – Bulk Data Transfer
• Teradata:Hadoop
  − JDBC (available today): Hadoop programs can call JDBC
  − TDDBinputformat/Dboutputformat (available today): Submits SQL to JDBC
  − Cloudera Sqoop (available today): Command-line import/export of database objects
• Aster:Hadoop
  − Aster-Hadoop Adaptor: node:node transfer using SQL-MapReduce
Opportunity for analysts to more easily access Hadoop data
Source: Enterprise Strategy Group; April 5, 2012
Bridging the Business Analyst Gap for Hadoop Data
Aster SQL-H™: A Business User’s Bridge to Analyze Hadoop Data
Aster SQL-H gives analysts and data scientists a better way to analyze data stored cheaply in Hadoop
• Allow standard ANSI SQL access to Hadoop data
• Leverage existing BI tool investments
• Enable 50+ prebuilt SQL-MapReduce Apps and IDE
• Lower costs by making data analysts self-sufficient
Announced June 12th, 2012
The Big Data Architecture Today Has Gaps
Analyst’s Goal: Get Insights from Data in Hadoop
Business Analysts, Quants, Data Scientists
• SQL; SQL & SQL-MapReduce
• Teradata Aster Discovery Platform (Aster MapReduce Portfolio)
• Teradata IDW (Teradata Analytics Portfolio)
Engineers
• MR, Pig, Hive
• Custom Code and Development
• IT is the optimizer
HDFS
Analytics on Hadoop Data with Aster SQL-H
Business Analysts, Quants, Data Scientists, Engineers
SQL; SQL & SQL-MapReduce; SQL-H
Teradata Aster Discovery Platform (Aster MapReduce Portfolio)
Teradata IDW (Teradata Analytics Portfolio)
HDFS
Aster SQL-H Integration with Hadoop Catalog: A Business User’s Bridge to Analyzing Data in Hadoop
• Industry’s first database integration with Hadoop’s HCatalog
• Abstraction layer to easily and efficiently read structured & multi-structured data stored in HDFS
• Uses Hadoop Catalog (HCatalog) to perform data abstraction functions (e.g., automatically understands tables and data partitions)
• HDFS data presented to users as Aster tables
• Fully accessible within the Aster SQL and SQL-MapReduce processing engines, plus ODBC/JDBC & BI tools
Diagram components: Aster SQL-H, HCatalog, Pig, Hive, Hadoop MapReduce, HDFS
Data & Processing Locality in SQL-H
Aster Layer: SQL-H
• SQL & SQL-MapReduce processing
• Intermediate data persistence
• Optional: HDFS data subset persistence for maximum performance
Hadoop Layer: HDFS (with HCatalog, Pig, Hive, Hadoop MR)
• HCatalog: metadata store
• HDFS: data repository
• No MapReduce processing in Hadoop
Between the layers: Data, Data Filtering
• Directly & in parallel move data from HDFS to Teradata Aster
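The locality idea on this slide, where metadata stays in the catalog and only the needed data moves to Aster, can be pictured as partition pruning. The Python below is an illustration only and does not use the HCatalog or Aster APIs; the `CATALOG` dictionary, the table name `weblogs`, the paths, and the function `files_to_read` are all hypothetical stand-ins.

```python
# Hypothetical stand-in for catalog metadata: each table lists its partitions,
# and each partition records its key values and where its files live.
CATALOG = {
    "weblogs": [
        {"keys": {"dt": "2012-08-19"}, "path": "/data/weblogs/dt=2012-08-19"},
        {"keys": {"dt": "2012-08-20"}, "path": "/data/weblogs/dt=2012-08-20"},
        {"keys": {"dt": "2012-08-21"}, "path": "/data/weblogs/dt=2012-08-21"},
    ],
}


def files_to_read(catalog, table, **wanted):
    """Return only the partition paths whose keys match the filter."""
    return [
        part["path"]
        for part in catalog[table]
        if all(part["keys"].get(key) == value for key, value in wanted.items())
    ]


# A filter on the partition key selects one directory instead of three.
print(files_to_read(CATALOG, "weblogs", dt="2012-08-21"))
# With no filter, every partition would have to be transferred.
print(len(files_to_read(CATALOG, "weblogs")))
```

Under these assumptions, a filter that names a partition key is answered from metadata alone, before any data is moved.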
Benefits of Aster SQL-H™: Deep metadata layer integration between Aster and Hadoop
Business Analysts (Powerful Analytics & Performance)
• 50+ advanced SQL-MapReduce functions (Aster MapReduce Portfolio)
• Simplified, SQL-based interface with Hadoop data structures (HCatalog)
• Interoperability with existing ecosystem & skill set
Architects and Administrators (Maintainability)
• Leverage existing DBA skill sets without additional overhead
• Simplify administration and monitoring
  - Alternatives require manual creation and maintenance of metadata
  - Less work and fewer errors
  - Can do filtering with Aster; select data from HCatalog, leverage partitioning
Aster MapReduce Portfolio: the App Store of Big Data
Some of the 50+ out-of-the-box analytical apps
• Path Analysis: Discover patterns in rows of sequential data
• Text Analysis: Derive patterns and extract features in textual data
• Statistical Analysis: High-performance processing of common statistical calculations
• Segmentation: Discover natural groupings of data points
• Marketing Analytics: Analyze customer interactions to optimize marketing decisions
• Data Transformation: Transform data for more advanced analysis
Big Data Architecture: Optimizing Workloads with Specialized Approach
When to Use Which? The best approach by workload and data type
• Processing as a Function of Schema Requirements by Data Type
Workloads: Low Cost Storage & Retention; Loading and Refining (Data Pre-Processing, Prep, Cleansing; Transformations); Reporting; Analytics (User-driven, interactive)
Stable Schema
• Low Cost Storage & Retention: Teradata / Hadoop
• Data Pre-Processing, Prep, Cleansing: Teradata
• Transformations: Teradata
• Reporting: Teradata
• Analytics: Teradata (SQL analytics)
• Examples: Financial analysis, ad-hoc/OLAP; enterprise-wide BI and reporting; spatial/temporal; active execution
Evolving Schema
• Low Cost Storage & Retention: Hadoop
• Data Pre-Processing, Prep, Cleansing: Aster / Hadoop
• Transformations: Aster (joining with structured data)
• Reporting: Aster
• Analytics: Aster (SQL + MapReduce Analytics)
• Examples: Interactive data discovery; web clickstream; set-top box analysis; CDRs, sensor logs, JSON
Format, No Schema
• Low Cost Storage & Retention: Hadoop
• Data Pre-Processing, Prep, Cleansing: Hadoop
• Transformations: Hadoop
• Analytics: Aster (MapReduce Analytics)
• Examples: Social feeds, text, document, or image processing; audio/video storage and refining; storage and batch transformations
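One way to read the matrix above is as a lookup from schema type and workload to a recommended platform. The sketch below simply transcribes the slide into a Python dictionary; the names `WORKLOADS`, `MATRIX` and `recommend` are made up for illustration, and the `None` entry reflects an assumption that the slide leaves the Reporting cell blank for data with no schema.

```python
# Column order as it appears on the slide.
WORKLOADS = (
    "Low Cost Storage & Retention",
    "Data Pre-Processing, Prep, Cleansing",
    "Transformations",
    "Reporting",
    "Analytics",
)

# One row per schema type; None marks a cell assumed to be blank on the slide.
MATRIX = {
    "Stable Schema": (
        "Teradata / Hadoop", "Teradata", "Teradata", "Teradata",
        "Teradata (SQL analytics)",
    ),
    "Evolving Schema": (
        "Hadoop", "Aster / Hadoop", "Aster (joining with structured data)",
        "Aster", "Aster (SQL + MapReduce Analytics)",
    ),
    "Format, No Schema": (
        "Hadoop", "Hadoop", "Hadoop", None, "Aster (MapReduce Analytics)",
    ),
}


def recommend(schema, workload):
    """Look up the platform the slide suggests for a schema type and workload."""
    return MATRIX[schema][WORKLOADS.index(workload)]


print(recommend("Evolving Schema", "Analytics"))
print(recommend("Stable Schema", "Low Cost Storage & Retention"))
```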
ESG Benchmark Report Summary: 3rd-party validation of Aster and Hadoop “fit”
Scope
• Identical hardware for Aster and Hadoop
• Clickstream, sentiment, & traditional retail data
• Compare “time to insight” and “time to develop”
Results
• Loading: Hadoop 1.8x faster
• Transforms: Hadoop 1.3x faster
• Analytics: Aster 35x faster (range: 4-416x)
• Development: Aster 3x faster
Hadoop vs. Aster Web Clickstream Analytics
• Aster 33x faster
• Aster 1.5x faster
• Aster 6x faster
• On average Aster is 18x faster
Example: Golden Path Analysis of Top Site Paths
Identifying Top Pathing Occurrences (for any event of interest)
• Business Question
  • How do we find and rank the 10 most frequent paths taken to the checkout page?
    - Page visits exist in multiple rows in the database, for each user
• Analytics Question
  • What is the most common path for a user on the site to…
    1. Enter the site
    2. View any page (other than the Help page)
    3. Make a purchase on the Checkout page
    4. Rank the top 10 occurrences

    SELECT click_path, count(*) AS path_frequency
    FROM nPath(
        ON clicks
        PARTITION BY user_id
        ORDER BY timestamp
        MODE(overlapping)
        PATTERN('(RELEVANT|IGNORE)*.BUY')
        SYMBOLS(
            page_type IN ('help.asp') AS IGNORE,
            page_type NOT IN ('help.asp') AS RELEVANT,
            page_type = 'checkout' AS BUY)
        RESULT(accum(page_id OF RELEVANT) AS click_path)
    ) T
    GROUP BY click_path
    ORDER BY count(*) DESC
    LIMIT 10;
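For readers without access to Aster, the logic of the nPath query above can be approximated in plain Python. This is a rough sketch, not a reimplementation of nPath: it ignores MODE(overlapping) and simply treats everything a user viewed before each checkout as one path, skipping the help page. The function name `top_paths`, the tuple layout and the sample rows are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def top_paths(clicks, limit=10):
    """Rank the most frequent page sequences that end at a checkout.

    clicks: iterable of (user_id, timestamp, page_id, page_type) tuples
    (hypothetical layout mirroring the columns used in the query).
    """
    by_user = defaultdict(list)                    # PARTITION BY user_id
    for user_id, timestamp, page_id, page_type in clicks:
        by_user[user_id].append((timestamp, page_id, page_type))

    counts = Counter()
    for rows in by_user.values():
        rows.sort(key=lambda row: row[0])          # ORDER BY timestamp
        path = []
        for _, page_id, page_type in rows:
            if page_type == "checkout":            # BUY symbol closes a path
                counts[tuple(path)] += 1
                path = []
            elif page_type != "help.asp":          # IGNORE symbol is skipped
                path.append(page_id)               # RELEVANT pages accumulate
    return counts.most_common(limit)               # GROUP BY, ORDER BY, LIMIT


# Made-up sample data for illustration only.
sample = [
    ("u1", 1, "home", "landing"), ("u1", 2, "help", "help.asp"),
    ("u1", 3, "shoes", "product"), ("u1", 4, "pay", "checkout"),
    ("u2", 1, "home", "landing"), ("u2", 2, "shoes", "product"),
    ("u2", 3, "pay", "checkout"),
    ("u3", 1, "home", "landing"), ("u3", 2, "pay", "checkout"),
    ("u4", 1, "home", "landing"), ("u4", 2, "shoes", "product"),
]
print(top_paths(sample))
```

User `u4` never reaches the checkout page, so that visit contributes no path.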
Single Channel Pathing Analysis
Analyzing Multi-channel Identifies Advertising Signal
Hadoop Provides 1.3x Faster ELT on Average
When to Use Which Depends on Data Type
- Aster faster on parsing and sessionizing Weblogs
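As a rough illustration of what "sessionizing" weblogs means, the sketch below groups each user's events into sessions separated by a period of inactivity. It is an assumption-laden example rather than Aster's or Hadoop's method: the 30-minute timeout, the function name `sessionize`, and the `(user_id, timestamp_seconds)` input shape are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Assumed inactivity timeout; the slides do not state a value.
SESSION_GAP_SECONDS = 30 * 60


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Assign a per-user session number to each event.

    events: iterable of (user_id, timestamp_seconds) pairs.
    Returns a list of (user_id, timestamp_seconds, session_id) tuples,
    grouped by user in first-seen order and sorted by time within a user.
    """
    by_user = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(ts)

    result = []
    for user_id, stamps in by_user.items():
        stamps.sort()
        session_id = 0
        previous = None
        for ts in stamps:
            # A gap longer than the timeout starts a new session.
            if previous is not None and ts - previous > gap:
                session_id += 1
            result.append((user_id, ts, session_id))
            previous = ts
    return result
```

The same logic is what a sessionization function has to perform at scale: partition by user, order by time, and cut wherever the gap between consecutive events exceeds the timeout.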
Evolving Schema Example
Aster Digital Marketing Client
• Benefits:
  - Marketing analysts more productive with Aster
  - Lower cost: storage and batch refining done on Amazon Elastic MapReduce
[Diagram labels: Raw Web Logs; Ad Server Logs; Media Data (Aggregated); Custom Data by Client; Hadoop (on AWS) (Storage, aggregations, cleansing); Archival; Cookie-level data; Teradata Aster; Analytic Tools]
• Segmentation: Custom SQL-MR algorithms to match and create centralized identifiers
• Sessionize by client
• nPath identifies segment path analysis (behavior after ads)
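The slide does not say how the custom SQL-MR algorithms "match and create centralized identifiers". One common way to do this kind of matching is a union-find (disjoint-set) pass over pairs of identifiers that were observed together, and the sketch below shows that technique purely as an illustration. The function name `unify_identifiers`, the `(id_a, id_b)` input pairs, and the rule of using the smallest identifier as the canonical one are all assumptions, not details from the presentation.

```python
def unify_identifiers(links):
    """Group identifiers that are linked, directly or transitively.

    links: iterable of (id_a, id_b) pairs seen together, for example a
           cookie ID and an ad-server ID (hypothetical inputs).
    Returns a dict mapping every identifier to the canonical identifier
    of its group (the smallest identifier in that group).
    """
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller root so the canonical ID is deterministic.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    return {x: find(x) for x in list(parent)}
```

Once every raw identifier maps to one canonical identifier, events from different logs can be joined on that identifier before sessionizing or path analysis.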
More Accurate Customer Churn Prevention
• Hadoop captures, stores and transforms social, images and call records
• Aster does path and sentiment analysis with multi-structured data
[Diagram labels: Data Sources; Multi-Structured Raw Data; Call Center Voice Records; Check Images; Traditional Data Flow; Capture, Retain & Refine Layer; ETL Tools; Hadoop; Call Data; Check Data; Social feeds; Teradata Integrated DW; Dimensional Data; Analytic Results; Aster Discovery Platform; Clickstream Data; Sentiment Scores; Analysis + Marketing Automation (Customer Retention Campaign)]
Summary
Bringing the VALUE of Hadoop to the Enterprise
• Teradata is focused on extracting the most business value for customers from data in Hadoop
• Mainstream organizations need a unified big data architecture
  - Best-of-breed with Hadoop, Aster, Teradata
  - Brings “Data Science” to business analysts
  - 50+ business-ready MapReduce analytics and apps
  - Enabled by SQL-MapReduce framework and new SQL-H
• Learn more at www.asterdata.com/mapreduce
© Copyright 2012 Radiant Advisors. All Rights Reserved v1.00.000
THE GREAT DIVIDE: BRIDGING UNSTRUCTURED AND STRUCTURED DATA FOR NEW CUSTOMER INSIGHTS
• Briefing Room - August 21, 2012
• John O’Brien, Radiant Advisors
• [email protected]
JOHN O’BRIEN
Principal and Founder, Radiant Advisors
• With over 25 years of experience delivering value through data warehousing and BI programs, John O’Brien's unique perspective comes from the combination of his roles as a practitioner, consultant, and vendor in the BI industry. His knowledge of designing, building, and growing enterprise BI systems and teams brings real-world insights to each role and phase within a BI program.
• Today, through Radiant Advisors, John provides research and advisory services that guide companies in meeting the demands of next-generation information management, architecture, and emerging technologies.
Instructor 10+ years
As a recognized thought leader in BI, John has been publishing articles and presenting at conferences in North America and Europe for the past 10 years, including at The Data Warehousing Institute (TDWI), where he has been invited as one of TDWI’s Best Practices judges, Executive Summit presenters and expert panel participants. John has also developed and presented many of his own courses that now comprise the initial Radiant Advisors Learning Catalog.
Education
John has a B.S. in Mechanical Engineering from California State University with an emphasis in control systems and instrumentation and an Executive M.B.A. from University of Colorado. He has been a Certified Business Intelligence Professional (CBIP) since 2005, with mastery levels in Leadership and Administration, Database Administration and Business Intelligence.
Experienced
In 2005, John co-founded and became CTO of a data warehouse appliance company that raised $43 million in several rounds of venture capital financing and has many global production customers. As CTO, John’s primary role was to focus product development and BI market strategy.
WHERE DOES CONTEXT LIVE?
Bridging the Great Divide: Unstructured and Structured Data
[Diagram labels: Structured; Unstructured; More Rigid; More Agile; Context in structures; Context(s) leveraged; Context in abstraction; BI Tools; Direct access; Hadoop HDFS; Hive; PIG; MapReduce (M/R); HCatalog; Individual Context with Data Scientists; Context in Data Scientists; Centralized Context in abstraction]
UNLOCKING UNSTRUCTURED VALUE
[Diagram labels: Yesterday; Tomorrow; Value (vertical axis); Users Involved (horizontal axis); Very Few Data Scientists; More Analysts; Many Many Consumers; Power Users; Analysts & Casual Users; Hadoop HDFS; Hive; PIG; MapReduce; HCatalog; DB; BI Tool]
DISCOVERY IN BI PROCESSES
1. Discover Context
2. Defined Context Available to Structured Database
[Diagram labels: Hadoop HDFS; Hive; PIG; M/R; HCatalog; BI Tool; Very Few Data Scientists; Few Analysts/Modelers; More Analysts/Modelers; Many More Analysts; Many Many Consumers]
MODERN BI ARCHITECTURES
• Hadoop: Massive Scalability; Lowest Cost; Handles Complexity
• Data Warehouse: Optimized Work Loads; Operational; Benefit from Context
[Diagram labels: Internet, Sensor data; Operational Systems; Insulate Change or Direct to Staging; Staging; Acquire; ETL; HCatalog or ETL; Migrate History; Data Marts; Hadoop HDFS; MapReduce; PIG; Hive; Very Few Data Scientists; Few Analysts/Modelers; Many Many Consumers]
SUMMARY
• Understand context in processes and architectures
• Realize that value is unlocked with more users
• Discovery is a powerful BI process to operationalize
• Modern BI Architectures are integrating Hadoop
• Is the Aster solution intended as a data discovery platform, an analytic engine platform, or both?
• Is there any difference in semantics for Teradata's vision of Integrated Data Warehouse vs. "Analytic Platform" which includes Aster and Hadoop?
• Does HCatalog need to be defined before users can use SQL-H to query Hadoop?
• The Aster MapReduce Portfolio enables its users to query and pull data from Hadoop HDFS directly via SQL-H. When data is pulled from HDFS into Aster, are the Aster tables modeled as in HCatalog or as key-value pairs?
• Is the output of the SQL-MR in Aster inserted into another physical table for further usage?
• Given that Hive and PIG are interface layers above the MapReduce processing layer, does Aster's SQL-H also work as an interface layer to MapReduce? Does SQL-H work similarly to Hive when processing data inside HDFS?
• When it comes to performance comparisons between Aster and Hadoop, what guidelines were given for sizing the Hadoop environment?
• Given the commodity nature of Hadoop, does it make sense to increase the size of the Hadoop environment to gain performance more cost-effectively?
• When should Hadoop or Aster be used? Based on data type? Based on workload (e.g. load, ETL, analyze)? Or based on analysis type (e.g. sentiment classification or sessionization)?
• Does Aster store "multi-structured" data such as audio, video, image, and PDF files as a BLOB/CLOB field in database records, or does it store pointers to files?
• Does Aster Data support the Predictive Model Markup Language (PMML), so that analytic models developed in SAS or other platforms can be migrated to Aster for discovery?