big data and your data warehouse -...
Post on 29-Jun-2020
3 Views
Preview:
TRANSCRIPT
Philip Russom
TDWI Research Director for Data Management
April 5, 2012
Big Data and Your Data Warehouse
Sponsor
Speakers
Philip Russom Research Director,
Data Management,
TDWI
Peter Jeffcock Director,
Oracle Product Marketing
Today’s Agenda • Big Data
– Was a problem; now it’s an opportunity
– The opportunities of Big Data Analytics
• Definitions and Comparisons – Old Big Data versus New Big Data
– Big Data Analytics, Data Warehouses, Analytic Databases
• Big Data and Analytics affect Your Data Warehouse – Scalability, Workloads, Data Types, Architectures
– Analytic tool choices, Data integration/quality best practices
• Use Cases for Your Data Warehouse & Big Data Analytics
• Recommendations
Big Data: Problem or Opportunity?
• Only 30% of survey respondents consider big data a problem.
– Oddly enough, big data was a serious problem just a few years ago.
– Storage and CPUs developed greater capacity, speed, intelligence
• They also fell in price.
– New database mgt systems (DBMSs) arrived, designed for big data analytics.
• Data warehouse appliances and analytic DBMSs are relatively affordable.
• The vast majority (70%) considers big data an opportunity.
– The recent economic recession forced deep changes in most businesses.
– Big data analytics reveals change’s root cause, so you can stop or leverage it.
30%
70%
Problem – because it's hard to manage
from a technical viewpoint
Opportunity – because it yields detailed
analytics for business advantage
In your organization is big data considered mostly a problem or mostly an opportunity?
Source TDWI. Survey of 325 respondents, June 2011
Definition of Big Data Analytics • It’s where advanced analytic techniques operate on big data sets
• It’s about two things: big data AND advanced analytics
– The two have teamed up to leverage big data
– The combo turns big data into an opportunity
• Big Data isn’t new.
Advanced Analytics isn’t new.
– Their successful
combination is new
– Hundreds of terabytes of
data just for analytics is new
Big
Data
Ad
va
nc
ed
An
alytic
s
Big Data
Analytics
Big Data
Analytics
Opportunities for Big Data Analytics • Anything involving customers benefits from big data analytics
– better-targeted social-influencer marketing (61%)
– customer-base segmentation (41%)
– recognition of sales/market opportunities (38%)
• BI, in general, benefits from big data analytics
– more numerous and accurate business insights (45%)
– understanding business change (30%)
– better planning and forecasting (29%)
– identification of root causes of cost (29%)
• Specific analytics applications are likely beneficiaries
– detection of fraud (33%)
– quantification of risks (30%)
– market sentiment trending (30%)
Source TDWI. Survey of 325 respondents, June 2011
Traditional Big Data versus New Generation Big Data
Traditional Big Data New Generation Big Data
Problem Opportunity
Hoarded Leveraged
Tens of Terabytes,
sometimes more
Hundreds of Terabytes,
soon to be measured in Petabytes
Mostly structured and relational data Mixture of structured, semi- & unstructured
Data mostly from traditional enterprise
applications: ERP, CRM, etc.
Also from Web logs, clickstreams, e-commerce,
sensors, mobile devices
Common in large companies:
Mainstream today
Common in Internet companies:
Will eventually go mainstream
Real-time for Operational BI, etc. Real-time for Streaming Data
Traditional DWs versus New Generation Analytic DBs
Traditional Data Warehouse New Generation Analytic Database
Excels with Traditional Big Data Excels with New Generation Big Data
Killer platform for standard reports, dashboards,
performance mgt, OLAP
Killer platform for discovery oriented
advanced analytics
Mostly aggregates and calculated metrics in time-
series and multidimensional models
Mostly detailed source data
All the best practices of data management apply,
including DI, DQ, MDM
Best practices of data mgt are suspended to preserve
data nuggets for discovery
Well-understood data for reporting and OLAP Less understood data for discovery analytics
Single version of the truth for facts that must be
tracked over time
Excellent source for deducing unknown facts and
relationships
Well-organized history of corporate performance Recent information, for studying change
The Location of Big Data and Analytic Workloads
affect Your DW Architecture and Design
• Where do you store and manage big data for analytics?
• Where do you process big data with analytic tools? – In the data warehouse proper?
– In a standalone edge system that integrates with the DW?
– Both?
• These are critical design and architectural decisions that adopters of big data and advanced analytics must make.
EDW
Federated
Data
Marts
Real
Time
ODS
Customer
Mart or
ODS
No-SQL
Database
Hadoop
Distributed
File Sys
Data
Staging
Area
Metrics for
Performance
Mgt
OLAP
Cubes
Multi-
dimensional
Data Models
Detailed
Source
Data
Analytic
Sand
Box
DW
Appliance
Columnar
DBMS
Map
Reduce
Data
Mining
Cache
Star or
Snowflake
Scheme
Analytic Workloads affect Your DW Architecture
• Monolithic DW Architecture – A single DBMS instance (or grid) hosts multiple database
workloads & datasets • Requires hefty DW platform to handle multiple, diverse workloads
& data types
• Distributed DW Architecture – Users deploy multiple platforms, so each is optimized for a
specific workload type • If not controlled, data marts and ODSs may proliferate. Complexity
is a problem, due to the large number of “edge systems.”
• Hybrid DW Architecture – A monolith manages all core reporting/OLAP data & most
workloads
– But a few workloads are deployed on separate systems
• Data staging areas evolved to do more than stage data – Now they must evolve again to accommodate big data
• Originally data staging areas were temporary holding bins – In that spirit, some are good for “analytic sandboxes”
• Most data staging areas are optimized for detailed source data – Can manage detailed source data as found in transient big data
• Data is regularly processed while managed in the staging area – E.g., sort prior to a DW load. SQL temp tables held in staging for later merging
– Some analytic workloads (especially columnar) may run well in a staging area
• Data staging must scale to big data’s volume, which comes and goes – A cloud could be an elastic platform for unpredictable data staging volumes
Big Data even affects
Your DW’s Data Staging Area
Big
Data
Data
Staging
DW
Analytic Tools for Big Data affect Your DW • There’s a cross-road where you choose an
analytic method – or multiple methods!
1. Online Analytic Processing (OLAP)
2. Extreme SQL
3. Statistical Analysis
4. Data Mining
5. Other: Natural Language Processing (NLP), Artificial Intelligence (AI)
• Each analytic method has requirements for data and analytic tool types.
– Multiple analytic methods can lead to multiple data stores, DBMSs, DW arch. components – and multiple analytic tools
Big Data Analytics affects Data Mgt for Your DW • Analyze data first
– Later, improve it for a more polished analysis
• Analytic discovery depends on data nuggets
– Both query-based and predictive analytics need:
• Big data, raw data
• Data quality for analytic databases
– Do discovery work before addressing data anomalies and standardization
• E.g., fraud is often revealed via non-standard or outlier data
• Data modeling for analytic databases
– Modeling data can speed up queries and enable multidimensional views
• But it loses details & limits queries
• Do only what’s required, like flattening and binning
• Data for post-analysis use in BI
– Apply best practices of DI, DQ, modeling
01101
00100
10110
10010
10100
10011
01001
Use Cases
for Your DW
and Big Data
Analytics
• Big Data enables exploratory analytics. Discover new: – Customer base segments
– Customer behaviors and their meaning
– Forms of churn and their root causes
– Relationships among customers and products
• Analyze big data you’ve hoarded. Finally understand: – Web site visitor behavior
– Product quality based on robotic data from manufacturing
– Product movement via RFID in retail
• Use tools that handle human language for visibility into: – Claims process in insurance
– Medical records in healthcare
– Call center applications in any industry
• Big data improves data samples for older analytic apps: – Fraud detection
– Risk management
– Actuarial calculations
– Anything involving statistics or data mining
• Big data can add more granular detail to analytic datasets: – Broaden 360-degree views of customers and other
entities, from hundreds of attributes to thousands
CONCLUSIONS – Plan ahead, because:
Big Data affects Your Data Warehouse • Scaling up or out to big data volumes
– Affects choice of computing architectures, platforms, hardware, DBMSs…
• Incorporating new data types and new data sources – Semi- and un-structured data. Web, sensor, and social data
• Operating in real-time and streaming big data – Real-time is a biz requirement. Many new sources stream big data.
• Supporting multiple database workloads in single DW environment – Big data analytics introduces new analytic workloads & affects architecture
• Adjusting best practices in data management – These still apply to big data analytics, but in a different order & priority
• Updating or replacing your DW models or architecture – To accommodate all the above…
Big Data Analysis: Airline Industry
What do they think about me?
Data Sources
Customer records Social Media Email Phone Logs Weblogs
Apache Hadoop
Distributed file system Map/Reduce programming
Oracle In-Database Analytics
2 miles
Statistical Analysis
Data Mining
Text
Graph
Spatial
Semantic
Oracle R Enterprise Oracle Advanced Analytics
In-database Large data sets Same code
Exadata Exalytics
Oracle Big Data Platform
ACQUIRE ORGANIZE DECIDE ANALYZE
Big Data
Appliance
Oracle Big Data Appliance
• Optimized and Complete
• Integrated with Oracle Exadata
• Easy to Deploy
• Single Vendor Support
25
Questions?
Contacting Speakers
• If you have further questions or comments:
Philip Russom
prussom@tdwi.org
Peter Jeffcock
peter.jeffcock@oracle.com
top related