![Page 1: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/1.jpg)
The Reality of Real-Time Business Intelligence
Divy Agrawal
Computer Science
UC Santa Barbara
![Page 2: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/2.jpg)
The beginning
![Page 3: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/3.jpg)
50 Years of Business Intelligence
Vision of Business Intelligence:
Hans Peter Luhn in a 1958 article.
Predates the notions of Databases and Data Management.
A pioneer in Information Sciences:
New use of the term thesaurus
Automatic creation of literature abstracts
The Luhn check-digit algorithm, widely used to validate credit card numbers (commonly 16 digits) and other banking instruments
…
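The Luhn check mentioned above can be sketched in a few lines. This is a minimal illustration of the standard mod-10 check, not code from the talk; the function name `luhn_is_valid` and the sample numbers are chosen here for illustration only.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn mod-10 check."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


# Illustrative inputs (well-known test numbers, not real accounts).
print(luhn_is_valid("4111 1111 1111 1111"))
print(luhn_is_valid("79927398710"))
```

The check catches any single mistyped digit, which is one reason it is attached to card numbers.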
![Page 4: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/4.jpg)
Luhn’s Vision
Defined BI as: “… provides means for selective dissemination to each of its action points in accordance with their current requirements or desires.”
Key technologies: Auto-abstracting of documents,
Auto-encoding of documents, and
Auto creation and updating of profiles
Breadth of the vision: “… business is a collection of activities carried on … be it science, technology, commerce, industry, law, government, defense, et cetera.”
“… intelligence is also defined … as the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.”
![Page 5: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/5.jpg)
The intervening years
![Page 6: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/6.jpg)
The Early Years (1970s-1980s)
Contrary to Luhn’s overarching vision – early efforts on business information remained focused on database management technology.
With the advent of the relational model: DBMS technology became pervasive and matured.
Widely adopted by most enterprises.
Online Transaction Processing became a proven paradigm for business operations.
Consequence: Massive proliferation of OLTP systems, especially within a single enterprise.
Data-driven decision making became a norm.
Disparate reporting from multiple operational data sources.
![Page 7: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/7.jpg)
Notion of “Data Warehousing” (1990s)
Presence of multiple operational systems created a fractured view of an enterprise.
Devlin & Murphy introduced the term business data warehouse in 1988: A unified view of the enterprise primarily for integrated reporting.
Catalysts: Demand for reporting, with PCs and spreadsheets as key factors.
Market potential: Teradata, Red Brick Systems, etc.
Negative factors: Unproven, immature, and expensive technology proposition.
Distinction between DBMS and DW: no clarity (duplication?).
Fairly laborious and time-consuming data integration process.
No clear stakeholders: the DW was treated as a second-class entity, often resulting in an adversarial atmosphere.
![Page 8: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/8.jpg)
Data Warehousing: Current State
Keys to success:
Enormous contribution of DW evangelist Ralph Kimball
STAR schema & Dimensional model for DW: intuitive and scalable
No compromise on the autonomy of operational data sources
Persisting head-winds:
Since the DW does not directly contribute to P&L:
ROI question still persists.
Not a plug & play technology:
Very high consulting costs.
Legacy of significant time and cost over-runs of most data warehousing projects.
Batch-oriented DW Architecture:
Deemed too costly just for integrated reporting.
Needed intuitive analytical capabilities.
![Page 9: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/9.jpg)
Hither “Business Intelligence” (2000-)
Gray et al. [1996] introduced the CUBE operator for roll-up and drill-down analysis of multi-dimensional data (i.e., DW Model).
DW vendors (Hyperion, Cognos, Analysis Services, etc.) adopted the CUBE architecture and called it business intelligence.
Problem: Early BI (CUBE) technology had serious scaling issues, which only accentuated the ill repute of DW/BI technologies.
Underlying problem: exponential explosion of data storage.
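To make the storage explosion concrete: a cube over d dimensions materializes one aggregate per subset of the dimensions, i.e. 2^d group-bys. The sketch below is a hedged illustration of that idea in plain Python, not the implementation of any BI product; the function `cube` and the sample `sales` rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (2**len(dims) group-bys)."""
    result = {}
    for size in range(len(dims) + 1):
        for group in combinations(dims, size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            result[group] = dict(totals)
    return result


# Hypothetical fact table with two dimensions and one measure.
sales = [
    {"product": "coke", "store": "A", "units": 3},
    {"product": "coke", "store": "B", "units": 2},
    {"product": "pepsi", "store": "A", "units": 5},
]

aggregates = cube(sales, ("product", "store"), "units")
print(len(aggregates))           # number of group-bys: 2**2
print(aggregates[("product",)])  # roll-up over store
print(aggregates[()])            # grand total
```

Each added dimension doubles the number of group-bys, which is the scaling issue noted on the slide.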
![Page 10: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/10.jpg)
Business Intelligence: Current State
While the BI/Cube technology was still evolving – the spin doctors needed to undo the early damage.
Hence, perhaps, the term Real-time Business Intelligence: to convey the “criticality” of such technology to business leaders.
Current debate: what exactly is meant by “real-time” in Business Intelligence?
In 2006, in this workshop, Donovan Schneider gave numerous examples of the “degree of timeliness” required for a variety of analysis tasks.
My personal view is that the correct term should have been: Online Business Intelligence.
Assuming that, the task is to redefine the DW/BI architecture to support RTBI.
![Page 11: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/11.jpg)
The present & the future
![Page 12: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/12.jpg)
Real-time Business Intelligence: Required?
Anecdotal evidence from Sam Walton
Airplane & Parking Lot Story
• Demonstrates the power of the 10,000-foot view (from the airplane) versus the local view (from the parking lot).
• Numerous cases where “timeliness” of “intelligence” is extremely valuable.
The case for RTBI is well justified. The question, however, is: at what cost?
![Page 13: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/13.jpg)
BI/DW Architecture (Revisited)
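[Figure: BI/DW architecture. Data sources DS1 … DSj … DSn send data updates to the Data Warehouse (batched updates), followed by the Data Cube (batched updates) and Data Mining. The diagram is annotated with the question “Real-time BI?”]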
![Page 14: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/14.jpg)
Underlying Technology Components
| System | Technology Components |
| --- | --- |
| Database Management Systems | Relational Model; Declarative Language; Data Independence |
| Data Warehouse | Dimensional Model; Design Methodology; ETL Tools |
| Business Intelligence | Data Cube Model |
| Real-time Business Intelligence | |
![Page 15: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/15.jpg)
Real-time in BI/DW Architecture?
[Figure: real-time BI/DW architecture. Components shown: data sources DS1 … DSj … DSn; Event Streams; Stream Analysis Engine (Aggregate Level Outliers/Alerts); Rules Engine (Operational Level Outliers/Alerts); Data Warehouse (online updates); Data Cube (online updates); Data Mining.]
![Page 16: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/16.jpg)
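Real-Time ETL: Surrogate Key, Duplicate Elimination (R&D efforts)
[Figure: Sources → ETL → DW]

Source R1:

| id | descr |
| --- | --- |
| 10 | coke |
| 20 | pepsi |

Source R2:

| id | descr |
| --- | --- |
| 10 | pepsi |
| 20 | fanta |

Lookup:

| id | source | skey |
| --- | --- | --- |
| 10 | R1 | 100 |
| 20 | R1 | 110 |
| 10 | R2 | 110 |
| 20 | R2 | 120 |

RDW:

| id | descr |
| --- | --- |
| 100 | coke |
| 110 | pepsi |
| 120 | fanta |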
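The surrogate-key example can be replayed in code. This is a minimal sketch under a loud assumption: two source rows count as duplicates when their descriptions are identical, which is what the slide's data suggests but not something the talk states. The function name and the `start`/`step` parameters are hypothetical.

```python
def assign_surrogate_keys(sources, start=100, step=10):
    """Map (source id, source name) to a warehouse-wide surrogate key.

    Assumption: rows with the same description are the same entity.
    """
    skey_by_descr = {}  # description -> surrogate key (duplicate elimination)
    lookup = {}         # (id, source) -> surrogate key
    warehouse = {}      # surrogate key -> description
    next_key = start
    for source_name, rows in sources.items():
        for source_id, descr in rows:
            if descr not in skey_by_descr:
                skey_by_descr[descr] = next_key
                warehouse[next_key] = descr
                next_key += step
            lookup[(source_id, source_name)] = skey_by_descr[descr]
    return lookup, warehouse


# The two source relations from the slide.
sources = {
    "R1": [(10, "coke"), (20, "pepsi")],
    "R2": [(10, "pepsi"), (20, "fanta")],
}
lookup, warehouse = assign_surrogate_keys(sources)
print(lookup)
print(warehouse)
```

Note that id 10 means coke in R1 but pepsi in R2, so the lookup must be keyed on the pair (id, source) rather than on id alone.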
![Page 17: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/17.jpg)
Mesh-Join [Polyzotis et al.]
[Figure: three snapshots of the join module, which sits between Stream S and Relation R (pages p1, p2).
t = 0: stream tuple s1 arrives and is joined with page p1.
t = 1: s2 arrives and page p2 is read; s1 has already joined with p1.
t = 2: s3 arrives and the scan of R resumes at p1; s1 and s2 have already joined with p2.]
Vassiliadis & Simitsis: Near Real-time ETL (forthcoming)
Real-time Scheduling of Updates: on-going work
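The scan pattern in the Mesh-Join figure can be approximated as follows. This is a simplified, in-memory sketch of the idea (a cyclic scan of R shared by all stream tuples currently in the window), not the published algorithm with its buffer and I/O management; `mesh_join`, its parameters, and the sample data are hypothetical.

```python
from collections import deque


def mesh_join(stream_batches, pages, key_s, key_r):
    """Join stream tuples with relation R, scanned cyclically one page per step."""
    num_pages = len(pages)
    window = deque()  # entries: [stream_tuple, pages_seen_so_far]
    results = []
    batches = iter(stream_batches)
    exhausted = False
    page_idx = 0
    while True:
        if not exhausted:
            batch = next(batches, None)
            if batch is None:
                exhausted = True
            else:
                window.extend([s, 0] for s in batch)
        if exhausted and not window:
            break
        # Index the current page of R by join key.
        index = {}
        for r in pages[page_idx]:
            index.setdefault(key_r(r), []).append(r)
        # Every tuple in the window meets the current page exactly once.
        for entry in window:
            for r in index.get(key_s(entry[0]), []):
                results.append((entry[0], r))
            entry[1] += 1
        # Oldest tuples have seen the most pages; expire those that saw all of R.
        while window and window[0][1] == num_pages:
            window.popleft()
        page_idx = (page_idx + 1) % num_pages
    return results


# Hypothetical data: R has two pages; one stream tuple arrives per time step.
pages = [[(10, "coke")], [(20, "pepsi")]]
stream = [[("s1", 10)], [("s2", 20)], [("s3", 10)]]
joined = mesh_join(stream, pages, key_s=lambda s: s[1], key_r=lambda r: r[0])
print(joined)
```

Because each tuple stays in the window for exactly as many steps as R has pages, it sees every page once regardless of where the scan was when it arrived.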
![Page 18: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/18.jpg)
Enabling Real-time BI: Source Updates
Online updates: Move from periodic refresh to continuous updates
Example: The window of opportunity for up-sell/cross-sell of a product is while the customer is still around, NOT AFTER he/she has left.
Tighter coupling between the operational data sources and the data warehouse:
In the past, operations teams viewed DW/BI as a necessary evil.
In the current business landscape, DW/BI should be viewed as a means of survival.
![Page 19: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/19.jpg)
Enabling Real-time BI: Data Streams
Stream Analysis & Management:
Event monitoring before updates are incorporated in the warehouse.
Stream operators:
Heavy-hitters (frequency counting)
Fraud detection
Performance monitoring
Histograms and quantile summaries
…
Outlier detection for operational intelligence
Summary/Aggregate analysis for strategic decision-making.
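As one concrete instance of a heavy-hitters operator, the sketch below uses the Misra-Gries summary. The slides do not name a specific algorithm, so this choice is an assumption made here for illustration; the function name and sample stream are likewise hypothetical.

```python
def misra_gries(stream, k):
    """Keep at most k - 1 counters; any item occurring more than n/k times
    in a stream of length n is guaranteed to survive in the summary."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical event stream: "a" is the heavy hitter.
events = ["a", "b", "a", "c", "a", "d", "a", "a"]
summary = misra_gries(events, k=3)
print(summary)
```

The counts in the summary are underestimates, so a second pass (or a tolerance) is needed when exact frequencies matter; the appeal for stream analysis is the fixed, small memory footprint.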
![Page 20: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/20.jpg)
Enabling Real-time BI: Data Integration
Automated Data Integration:
Current approaches to integrating data from operational data sources into the data warehouse are too tedious and time-consuming.
This task is, however, greatly simplified by the plethora of ETL tools available in the marketplace (e.g., Informatica).
New research on automated schema integration (e.g., the pay-as-you-go Dataspaces model).
Problem: uncertainty of data integration.
A monumental challenge, especially since the enterprises of today are highly dynamic and constantly evolving.
![Page 21: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/21.jpg)
Enabling Real-time BI: Analysis Language
Declarative approach for analytical processing:
The current approach to analytical processing is ad hoc and error-prone.
Translating business questions into analysis queries is highly manual.
Newer approaches are emerging:
MapReduce from Google significantly simplifies Web log analysis.
Yahoo’s Pig Latin project
Microsoft’s Dryad project
Need similar efforts for other types of analysis and mining tasks (MDX?).
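To illustrate why the MapReduce style simplifies Web log analysis, here is a single-process sketch of the programming model: the analyst writes only a map function and a reduce function, and the framework handles grouping. This is plain Python, not Google's or Hadoop's API; the log format ("client_ip url") and all function names are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(log_line):
    """Emit (url, 1) for each well-formed log line (assumed 'client_ip url')."""
    parts = log_line.split()
    if len(parts) >= 2:
        yield parts[1], 1


def reduce_phase(url, counts):
    """Combine all values emitted for one key."""
    return url, sum(counts)


def run_mapreduce(lines):
    """Toy driver: group mapped pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in groups.items())


# Hypothetical log lines.
log = ["1.2.3.4 /home", "5.6.7.8 /cart", "1.2.3.4 /home", "malformed"]
hits = run_mapreduce(log)
print(hits)
```

In a real deployment the grouping step is distributed across machines; the point here is only that the analysis itself reduces to two short functions.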
![Page 22: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/22.jpg)
Enabling Real-time BI: Scaling with Large Data Volumes
Scalability:
Certain queries (temporal and spatial correlations) are bound to access huge amounts of data.
Need to rely on hardware solutions to provide scalability.
Emerging solutions (Parallel DBMS Technology):
Greenplum
HP’s Neoview
Google’s GFS (Google File System) and Bigtable
Yahoo-endorsed Hadoop
Cloud Computing?
![Page 23: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/23.jpg)
RTBI: Technology Components
| System | Technology Components |
| --- | --- |
| Database Management Systems | Relational Model; Declarative Language; Data Independence |
| Data Warehouse | Dimensional Model; Design Methodology; ETL Tools |
| Business Intelligence | Data Cube Model |
| Real-time Business Intelligence | Online updates; Stream Operators & Events; Next-gen MDX; Parallel Query Processing |

Side label (rotated in the original slide): Automated Data Integration
![Page 24: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/24.jpg)
Concluding Remarks
Real-time BI (equivalently, Online BI) has immense potential for:
Data-driven operational decision making.
Data-driven feedback towards business strategy.
Current adoption of Real-time BI is hampered by:
Lack of clarity about the underlying technology components
Significant costs associated with custom solutions
Our task:
To clearly define the overall architecture of the next-generation Real-time BI Systems
Design and develop the necessary technology components.
Realize economies of scale to bring costs down for wide-scale adoption.
![Page 25: The Reality of Real-Time Business Intelligence · Notion of “Data Warehousing” (1990s) Presence of multiple operational systems created a fractured view of an enterprise. Devlin](https://reader034.vdocument.in/reader034/viewer/2022042420/5f37c756ededcb574b66f33f/html5/thumbnails/25.jpg)
Hans Peter Luhn