big data for oracle architects
TRANSCRIPT
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 1/25
nOracle White Paper in Enterprise rchitecture
August
2 2
Oracle Information Architecture:
n rchitect sGuide to Big Data
OR Le
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 2/25
A—m
m
Tm
am
f
tm
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 3/25
An
Oracle
White
Paper
in
Enterprise
Architecture—informationArchitecture: An
Architect s
Guide to Big Data
What Makes Big Data Different? 4
Comparing Information Architecture Operational Paradigms 6
ig
Data Architecture Capabilities 7
Storage and Management Capability 7
Database Capability 8
Processing Capability 9
Data Integration Capability 9
Statistical Analysis Capability 9
ig Data Architecture 1
Traditional Information Architecture Capabilities 1
Adding
ig
Data Capabilities 1
An Integrated Information Architecture 11
Making
ig Data Architecture Decisions 13
KeyDrivers to Consider 13
Architecture Patterns
in
Three
Use Cases 14
ig Data Best Practices 2
1: lign
ig
Data with Specific Business Goals 2
2: Ease Skills Shortage with Standards and
Governance
2
3: Optimize Knowledge Transfer witha Center of Excellence 21
4: Top Payoff is ligning Unstructured with Structured Data 21
5 :
Plan Your Sandbox For Performance 22
6:
lign
with the Cloud Operating Model 22
Summary 22
Enterprise Architecture and Oracle 23
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 4/25
AnOracle WhitePaper inErrterpriseArchitecture—information Architecture: AnArchitect s Guide to BigData
Int roduct ion
Ifyour organization is like many, you re capturing and
sharing more
data from
more
sources than ever before. As a result, you re facing the challenge of managing high-
volume and
high-velocity
data
streams quickly
and
analytically.
Big Data is all about finding a needle of value in a haystack of unstructured information.
Companies are now investing in solutions that interpret consumer behavior, detect fraud,
and
even
predict
the
future McKinsey
released
a report in May 2011 stating that leading
companies are using big data analytics to gain competitive advantage. They predict a
60 margin increase for retail companies who
ar e
able to harvest the
power
of big data.
To support
these
new
analytics, IT
strategies
are
mushrooming,
the newest techniques
include brute force
assaults
on massive information
sources,
and filtering
data
through
specialized parallel processing
and
indexing
mechanisms.
The results
ar e
correlated
across
time
and
meaning, and often merged with traditional corporate
data sources.
New
data
discovery
techniques
include
spectacular
visualization tools
and
interactive
semantic
query experiences. Knowledge workers and
data
scientists sift through filtered
data
asking
on e unrelated explorative
question after another.
As these supporting
technologies emerge
from
graduate research
programs into
the
world of corporate IT, IT
strategists, planners, and architects need to both understand them
and
ensure
that
they
are
enterprise
grade.
Planning a Big Data architecture is not about understanding just what is different. Ifs
also about
how to integrate
what s new
to
what
you already
have
- from database-and-BI
infrastructure to ITtools,
and end user
applications.
Oracle s
own product
announcements
in hardware, software,
and
new partnerships have
been
designed to
change
the
economics around Big Data investments
and the
accessibility of solutions.
The real industry challenge is not to think of Big Data as a specialized
science
project,
but rather
integrate it into
mainstream
IT.
Inthis paper, we w discuss adding Big Data capabilities to your overall information
architecture, planning for adoption using
an
enterprise architecture perspective,
and
describe
some key use cases .
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 5/25
An Oracle White Paper In Enterprise Architecture—InformationArchitecture: An Arch i t ect s Gu i d e to B i g D a ta
Wha t Makes
Big
Data
Different?
Bigdata refers to largedatasets that are challenging to store, search, share,\dsuali2e, and analyze.
At first glance, the orders of magnitudeoutstrip conventionaldata processingand the largestof
data warehouses. For example, an
airline
jet collects 10 terabytes of sensor data for every 30
minutes of flying time. Compare that with conventional high performance computing where
New York Stock Exchange collects 1 terabyteof structured trading data per day. Compare again
to a conventional strucmred corporate data warehouse that is sized in terabytes and petabytes.
BigDatais sized in peta-,
exa-,
and soon perhaps,zetta-bytes And,it s nor justaboutvolume,
the approach to analysis
contends
withdata contentand structure that cannotbe anticipated or
predicted. Tliese analytics andthe science behind themfilter low
value
or
low-density
data to
reveal liigh value or iiigh-densit}
data.
As a
result,
newandoften proprietar) analjTical
techniques are required. BigData has a broad array of interesting architecture challenges.
It is often said that data volume, velocit , and variet define BigData, but the unique
characteristic of
Big
Data is the mannerin
which
tlie
value
is discovered. BigData is
unlike
conventional
business
intelligence, where the
simple
summing of a known value reveals a result,
such as order
sales
becoming year-to-date sales. With Big Data, thevalue is discovered through a
refining modeling process: make
a
h)y>othesis, create statistical,
visual, or
semantic
models,
validate,
then make a new hypothesis. It either takes a person interpreting \dsualizations or
making interactive knowledge-based queries, or bydeveloping
machine
learning
adaptive
algorithms drat
can
discover meaning.
And in die end, the
algoritlim may
be
short-lived.
The growth of bigdatais a
result
of the increasing channels and
variet)
of data in
today s world.
Some
of
the new data sourcesare user-generated content through sodal media,web and software
logs,
cameras,
information-sensing mobile devices,
aerial
sensoiy
technologies, genomics, and
medical records
Companies
have realized diat
there
is
competitive
advantage in this
information
and thatnow
the timeto put this data to work.
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 6/25
AnOracle White Paper InEnterprise ArchHecture—InformationArchitecture: AnArchitect s Guide to BigData
A Big
Data Use
Case Personalized Insurance Premiums
To begin, let s consider the business opportunit} tliat Big Data brings to a conventional business
process in the insuranceindustry—calculating a competitive and profitableinsurance premium.
In an effort to be morecompetitive, aninsurance company wants to offer their customers the
lowest possible premium, but only to those who are unlikely to make a claim, thereby optimizing
their profits, One way to approach tliis problem is to collectmore detailed data about an
individual s drivinghabits and then assess tlieir risk.
In
fact,
insurance companies arenowstarting to collect data on driving habits utilizing sensors in
theircustomers cars. These sensors capture dri\hng data,suchas routesdriven, miles driven,
time
of
day,and braking abruptness. This data is used to assess driver risk; they compare
individual drivingpatterns with other statistical information, such as average
miles
driven in your
state, and peak hours of drivers on the road. Driver risk plus actuarial information is then
correlatedwith
policy
and profileinformation to offer a competitive and more profitable rate for
the company. The result? A personalized insurance plan. These unique capabilities, delivered
from big data
anal}tics,
are revolutionizing the insuranceindustiy.
To accomplish this task, a great amount of continuous data must be collected, stored, and
correlated. Hadoop is an excellent choice for acquisition and reduction
of
the automobilesensor
data. Master data and certain reference data includingcustomer profile information are
likely
to
be stored in the existing DBMS systems, and a NoSQL databasecan be used to capture and store
reference data that aremore dynamic,
diverse
in
formats,
and
change
frequently. Project R and
Oracle R Enterprise are appropriate choices for analyzing bodi private insurancecompanydata
as wellas data captured from publicsources. And
finally,
loading the MapReduce results into an
existingBI environmentallows for further data explorationand data correlation. With these new
tools,
the
company
isnow able to
addresses
the
storage, retrieval, modeling,
and processing side
of
the requirements.
In thiscase, the traditional business process and supporting data master, transaction, analytic
data are able to add statistically relevantinformation to their profit model and deliveran industry
innovative result
Ever industr haspertinent examples. Bigdata sources take a variet
of
forms: logs,sensors,
text, spatial, and
more.
As IT strategists, planners, and architects, weknow that our
businesses
havebeen trying to find the relevant information in all thisunstructured data foryears. But the
economic choices aroundgetting to theserequirements have been a barrier untilrecendy.
So,whatis your approach to offer your lineof business a set ofrefined data services that can
help them not just
find
data,
but improve and innovate theircorebusiness processes? What s
die impact to yourinformation architecture? Yourorganization? Your analytical skills?
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 7/25
An Oracle White
Paper
In Enterprise
Architecture—Information Architecture:
An Architect s
Guide
to Big Data
Comparing
Information Architecture Operational
Paradigms
Bigdata differs from other data realmsin manydimensions. In tiic following table you can
compare and contrastthe characteristics of bigdataalongside dieother data
realms
described in
Oracle s
Information Architecmre
Framework fOlAF).
Data Rea lm St ruc tu re Vo lume
Master D ata Structured Low
Transact ion
Data
Reference
Data
Metadata
Analytical
Data
Document s
and
Conten t
Big
Data
Structured
structured
Stnjctured
structured
Medium
-high
L o w
Medium
Structured
Low
Structured
Unstructured
Structured,
s emi
structured,
uns t ructured
Medium-
High
Medium
- High
High
Description
Enterprise-level
data
entities that
ar e of
strategic value
to
an
organization. Typically non-volatile
and non-t ransact ional In nature.
Business
t ransac t ions tha t
a re
captured
during business
operations and
processes
Internally managed or externally
sourced facts to
support
an
organization s ability to effectively
process transactions,
manage
master
data,
and provide decision
support capabilities.
Defined as data abou t t he
data.
Used as
an
abstraction
layer for
standardized
descriptions
and
operations, E.g. integration,
intelligence, services.
Derivations
of
t he bu s in e s s
operation and transaction data
used
to satisfy reporting and
analytical
needs.
Documents, digital images, geo-
spatial data, and multi-media files.
Large
datasets that ar e
chal lenging to
store,
search,
share, visualize, and
analyze.
Table 1: Data Realms Definitions (Oracle Information Architecture Framework)
Examples
Customer,
product, supplier,
an d locat ion s i t e
Purchase records,
inquiries,
and
payments
Geo
d at a a nd market d a t a
Data name , data dimensions
or units,
definition
of a data
entity, or a calculation
formula of
metrics.
Data
tha t
res ide in data
warehouses , da ta marts,
an d
other decision support
applications.
Claim forms, medical
images, maps, video
files.
Use r
an d mach i ne
generated
content through
social media, web and
software logs,
cameras,
information-sensing
mobile devices, aerial
sensory
technologies,
and
genomlcs.
These different characteristics have influenced how we capture, store, process, retrieve, and
secure
our
information arcliitectures. As we evolve into Big Data, you can minimize your
arcliitecture riskby
finding
synergies across your investments
allowing
youto leverage your
specialized
organizations and theirskills equipment, standards, and governanceprocesses
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 8/25
AnOracleWhite Paper InEnterprise Architecture—Information Architecture: An Architect s Guide to Big Data
Here are some capabilitiesthat you can leverage:
Data
Realms Security
Storage
Retr ieva l
Master
data
Database RDBMS/
Transactions app SQL
Analytical data user
Metada ta access
Modeling Processing
Integration
Pre defined
ETL/ELT
CDC
relational or Replication
dimensional Message
modeling
Reference data
Platform XML / xQuery Flexible ETL ELT
security
Extensible
Message
ocumen t s
and
onten t
Big Data
- Weblogs
• Senso r s
• Socia l Media
File
system
bas ed
Tiie
system
database
File System / Free Form
OS ievei
file
Sea rch movemen t
Distributed Flexible Hadoop
FS / noSQL (Key Value)
MapReduce
ETL/ELT,
Message
Consumption
Bi
Statistical
Tools Operational
Applications
System based
data
consumption
Content
Mgmt
Bi
Statist ical
Too ls
Table 2: Data Realm Characteristics (Oracle information Architecture Framework]
Big Data Architecture Capabilities
Here is a brief outline of BigData capabilities and their primarj technologies:
Storage and Management Capability
Hadoop Distributed File System(HDFS):
• AnApache open sourcedistributed file system http://hadoop.apache.org
• Expected to run on high-performance commodityhardware
• Known for highly scalable storage and automaticdata replication across three nodes for
fault
tolerance
• Automatic data replication across three nodes eliminatesneed for backup
• Write once, readmany times
Cloudera Manager:
• Cloudera Manager is an end-to-end management application for Cloudera s Distribution of
ApacheHadoop, http://www.cloudera.com
• Cloudera Managergives a cluster-wide, real-time view
of
nodes and servicesrunning;
providesa single central place to enact configurationchangesacross the cluster;and
incorporates a
full
rangeof reportingand diagnostic tools to help optimizecluster
performance and utilization.
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 9/25
An
Oracl e
White
P a p e r
in
Enterprise Architecture—information Architecture
An
Arch i tect s Gu i d e
t o Bi g
Data
Database
Capability
Oracle NoSOL: Click for more information)
• Dynamic and flexible schema design. High performance key value pair
database.
Keyvalue
pairis an
alternative
to a pre-defined
schema. Used
fox
non-predictive and
dynamic data.
• Able to
efficiently
process datawithouta row and column structure.
Major
+ Minorkey
paradigm allows multiple recordreadsin a
single
API
call
•
Highly scalable
multi-node,
multiple
data center,
fault
tolerant, ACID operations
• Simple programming model, random index reads and
writes
• Not
Only
SQL. Simple pattern queries andcustom-developed solutions to access data
such
asJava APIs.
Apache
HBase: Click for
more
information)
• Allows random, real time read/write access
•
Strictly
consistent readsand
writes
• Automaticand configurable shardingof tables
• Automatic
failover
support between
Region
Servers
Apache Cassandra:
Click for more information)
• Data model offers column indexeswith the performance of log-structured updates,
materializedviews, and built-in caching
• Fault
tolerance
capabilitj isdesigned forever) node, replicating across
multiple
datacenters
• Can choosebetween synchronous or asynchronous replication for eachupdate
Apache
Hive:
Click for more information)
• Tools to
enable
easy dataextract/transform/load ETL) from files stored either
direcdy
in
Apache
HDFS or inotlier
data
storage
systems such
as
Apache HBase
•
Uses
a simple
SQL-like query language called
HiveQL
• Quer) executionvia MapReduce
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 10/25
AnOracle WhitePaper InEnterprise Architecture—information Architecture: An Architect s Quide to BigData
V
Processing Capability
MapReduce:
• Defined
by Google in
2004 Click here for original
paper|^
^
• Break problem up into smaller sub-problems
• Able
to
dist r ibu te da ta work loads ac ross t l iousands nodes
• Can be ex po sed via SQL and in SQL-b ased BI tools
Apache Hadoop:
• Leading MapReduceimplementation
• Highly
scalable
parallel batch processing
• Highlycustomizable infrastructure
• Writes multiple copies across cluster for fault tolerance
Data Integration Capability
Oracle Big Data Connectors. Oracle Loader for Hadoop. Oracle Data Integrator:
Click here for Oracle Data Integration and BigData^
• Exports MapReduce resiolts to RDBMS, Hadoop, and otlier targets
• Connects Hadoop to relational databases for SQL processing
• Includes agrapliical user interface integration designer thatgenerates
Hive
scripts to move
and transform MapReduce results
• Optimized processingwith parallel data import/export
• Can be installed on OracleBigData Appliance or on a generic Hadoop cluster
Statistical Analysis Capability
Open Source Project R and Oracle R
EnteqDrise:
• Programming language for statisticalanalysis Clickhere for Project R)
• Introduced into Oracle Database as a SQL extension to perform high performance in-
database statisticalanalysis Clickhere for Oracle R Enterprise^
• OracleR Enterprise allows reuseof pre existing R scripts with no modification
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 11/25
An
Oracle
White
Paper
In
Enterprise Architecture—Information
Architecture: An
Ar chitect s Guide
to Big Data
Big
Data Architecture
In thissection, wewilltake a closer look at the overall architecture for big data.
Traditional Information
Architecture Capabilities
To understand the liigh-level architecture aspects of BigData, let's first reviewa well formed
logical
information arcliitecrure for structured
data.
In the illustration, youseetwo datasources
that useintegration(ELT/ETL/Change Data Capture) techniques to transfer data into a DBMS
datawarehouseor operationaldata store,and then offer a widevarietj- of analytical capabilities to
reveal
the
data.
Someof these
analytic capabilities include:
dashboards, reporting, EPM/BI
applications,
summary^
and
statistical
query, semantic interpretations for textual
data,
and
visualization tools for liigh-density data. In addition,someorganizations haveapplied oversight
and standardization across projects, and perhaps havematured the informationarcliitecture
capability
through managingit at the enterprise level.
«
R e f e r e n c e
M a s t e r Dat a
Transaction
D a t a
Enterprise
integration
y.
Analytic
Capabilities
01
c . u
o> >> c
E €
4) 3 £
D> o 0)
5 >
£ w o
n o
Figure 1: Traditional informationArchitectureCapabilities
Tlie key information
architecture
principles
include
treating dataasanasset through a value, cost,
and risklens,and ensuring timeliness, quality, and accuracy of data. And, tlieEAoversight
responsibility
isto
establish
and
maintain
a
balanced governance
approach
including using center
of excellence for standardsmanagement and training.
Adding Big
Data
Capabilities
Tlie defining processing capabilities forbigdata arcliitecture areto meet the
volume, velocity,
variety,
and value requirements.
Unique
distributed
(multi-node)
parallel processing arcliitectures
havebeen
created
to parse these
large
data
sets,
lliere are
differing technology strategies
for
real-time
and batch processing requirements. For
real-time,
key-value data stores, suchas
NoSQL,
allow for liigh performance, index-based
retrieval.
For
batch
processing, a
technique
known
as Map
Reduce, filters
data
according
to a
specific
data
discovery
strategy^
After the
filtered data is discovered, it canbe analyzed directly, loaded into other unstructured databases,
sent to
mobile
de\ ices, or merged into traditional datawarehousing enidronment and correlated
t o s t r u c mr e d
data.
M a c h i n e
Genera ted
Text, Image,
Audio
Video
y.
Distributed
File
System
Key Value
D a t a S t o r e
y
Figure2; BigData InformationArchitecture Capabilities
Map Reduce
y
Sandboxes
Data
W a r e h o u s e
y.
Analytic
Capabilities
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 12/25
AnOracle White
Paper
InEnterprise Architecture—information Architecture; An Architect s Guide to Big Data
In additionto newunstructureddata realms there are two keydifferences for big data. First, due
to the sizeof the data sets, we don t move the raw data directly to a data warehouse. However,
after MapReduce processing wemay integrate the reduction result into the datawarehouse
environment so tliat we can leverageconventional BJ reporting, statistical, semantic, and
correlation capabilities. It is idealto have analjTic capabilities that combine a conventional BI
platform alongwith big data \isualization and quert capabilities. And second, to facilitate
analysis in the Hadoop environment, sandbox environments can be created.
For manyuse cases,big data needs to capture data that is continuously changing and
unpredictable. And to analyzetliat data, a new architecture is needed. In retail, a good example
is capturing real time foot trafficwith the intent of deliveringin-store promotion. To track the
effectiveness of floor displays andpromotions, customermovement and behaviormust be
interacdvely exploredwith visualization or querytools.
In other usecases the analysis cannot be complete untilyou correlate it withother enterprise
data - structureddata. In the example
of
consumer sentiment
analysis
capturinga positiveor
negative
social media
comment has some
value
but associating it witli yourmost or least
profitable customer makesit farmore valuable. So, tlie needed
capabilit}
widiBigData BI is
context and understanding. Using powerful statisticaland semantic tools allowyou to find die
needlein the haystack and
will
help youpredict the future.
In
summar}
the BigData architecture challenge is to meet the rapid use and rapid data
interpretation requirements whileat the sametime correlating it with other data.
W^iat s important is that the key information arcliitecturc principles aredie same,but the tactics
of
applying these principles differ. For example, how do welook at big data asan
asset.
Weall
agreethere s valuehidingwithinthe massive liigh-densit} data set.But how do weevaluateone
set of big data against the other?How doweprioritize? The keyis to think in termsof the end
goal. Focus on the business values and understand how criticalthey are in support of the
business decisions, as well as the potential
risks
of not knowing the liiddenpatterns.
Another
example
of
applying
architecture principles differently is
data
governance. The quality
and
accuracy
requirements of bigdata can
vaty^
tremendously. Usingstrict data precisionrules
on user sentiment data might filter out too much useful information, whereas data standards and
common definitions are stillgoingto be critical for fraud detections scenarios.
To reiterate, it is important to leverage yourcore information arcliitecture principles and
practices, but apply them in a wat that s relevantto big data. In addition, the EA responsibility
remains
the
same
for big
data.
It is to
optimize success centralize
training and establish
standards
An
Integrated Information Architecture
One
of
the obstacles observedin Hadoop adoption in enterpriseis the lackof integrationwith
existing
BI
eco-system.
At present, the
traditional
BIand bigdata
ecosystems
are separate
causing
integrated dataanalysis
headaches.
Asa
result
they are not ready for usebytlietypical
business user
executive
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 13/25
An Oracle White Paper in Enterprise Architecture—Information Architecture: An Architect s Guide to Big Data
Earlier adoptersof bigdatahave often times written customcodeto movedie processed
results
of bigdatabackintodatabase or developed customsolutions to report and
analyze
on them.
Tliese
options
might
not be
feasible
and economical for enterprise IT. First of all, it
creates
proliferations of one-off
code
anddifferent standards. Architecture
impacts
IT
economics.
Big
Datadone independendy runs the riskof redundant investments. In addition, mostbusinesses
simply do not havethe staffandskill level for suchcustomdevelopment
work.
Abetter option is to incorporate theBig Data results into dieexisting Data Warehousing
platform. Tliepower of
information
lies inour abilit) to make associations andcorrelation.
What
weneedis the abilit) to bringdifferent datasources, processing requirements togetherfor
timely
and valuable analysis.
HereisOracle s holistic capability map that
bridges
traditional information architecture and big
data arcliitecture:
Specialized
Hardware
Acquire
Big
Data
Clu s t e r
ETL/ELT
ChangeDC
R ea T im e
Message-
Based
Hadoop
(MapReduce)
High
Speed
Network
Figure 3: Oracle Integrated InformationArchitectureCapabilities
Organize \ Analyze
In Da taba se
Analytics
RDBMS
Clu s t e r
Decide
Rcporting
Dashbo a rd s
Aler1ing
Re c o m m o n d a t i o n s
EPM, Bl.
Social
Applications
Text Analytics
a nd S e a rc h
dvanced
Analytics
Interact ive
Discovery
In Memory
Analytics [
Imi l I anGf
Asvarious data arecaptured, they canbe stored and processed in traditional
DBMS,
simple files,
or distributed-clustered systems suchasNoSQLand HadoopDistributedFile System HDFS),
Architecturally, the critical component tiiat breaksthe di\nde is the integrationlayerin the
middle. Tliisintegration layer
needs
to extendacross all of the datatypes and domains, and
bridge the gapbetween the traditional andnewdataacquisition and processing
framework.
The
data integration
capability needs
to cover theentirespectrumofvelocity and frequency. It needs
to handleextremeand ever-growingvolumerequirements. And it needs to bridge the variety of
data structures. Youneed to look fortechnologies that
allow
youto integrate Hadoop /
MapReduce with your data warehouse
and
transactional data
stores in a
bi-directional manner.
The nextlayer iswhereyou
load
the reduction-results fromBig Dataprocessing output into
your data warehouse for further analysis. You
also
need the
ability
to access your
structured
data
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 14/25
An Oracle White Paper in Enterprise Architecture—Information Architecture; An Architect s Guide to Big Data
such as customer profileinformationwhileyouprocess throughyour bigdata to look for
patterns such as detecting fraudulent activities.
The BigData processingoutput will be loadedinto tlie traditional ODS data warehouse and
data marts for fiarther
analysis
just as the transactiondata.The additionalcomponent in this
layeris the Complex Event Processingengine to
analy2e
stream data in real time.
The Business Intelligencelayerwillbe equipped with advanced analytics in-database statistical
analysis
and advancedvisualization on top of the traditionalcomponents such as reports
dashboards and queries.
Governance security and operationalmanagement alsocover the entirespectrumof data and
informationlandscape at the enterpriselevel
With this architecture the businessusers do not seea
divide
They don t evenneed to be made
awarethat there is a differencebetween traditional transaction data and BigData. The data and
analysis flowwouldbe seamless as theynavigate through variousdata and information sets test
hypothesis analyzepattems and make informed decisions.
Making Big Data Architecture Decisions
InformationArchitecture is perhaps the most complexarea of IT. It is the ultimate investment
payoff
Today s economicenvironmentdemands that business be drivenby
useful
accurate and
timely
information. And the world of BigData adds another dimension to the problem.
However there are
always
business and IT tradeoffs to get to data and information in a most
cost-effectiveway.
Key Drivers to
Consider
Hereis a summary of variousbusiness and IT drivers you need to considerwhen makingthese
architecture choices
BUSINESS DRIVERS
Better insight
Fas te r tum around
Accuracy
and timeliness
IT
DRIVERS
Reduce
storage cost
Reduce da ta
movemen t
Faster t ime to market
Standard ized toolse t
Ease of management and
operation
Security
and governance
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 15/25
An Oracle White Paper In Enterprise Architecture—Information
Architecture:
An Architect s Guide to Big
Data
Architecture
atterns in Th re e U se
ases
The businessdrivers forBigData processingand analytics are present in ever) industn and
will
soon be essentialin the deliver of new ser\ ices and in the analysis of a process. In some cases,
the) enable newopportunities withnewdata sources
tlaat
wehavebecome
familiar
with, suchas
tapping into the apps on our mobiledevices, location ser\ ices, socialnetworks,and internet
cotnmerce. Industr) nuances on devicegenerateddata canenable retnote patient monitoring,
personal fitness activit , driving behavior, location-based storedmovement, and predicting
consumer behavior. But, BigData also offers opportunities to refine conventional enterprise
business processes, suchas textand sentiment-based customerinteraction through
sales
and
»servicewebsitesand callcenter functions, human resource resume analysis, engineering change
management fromdefectthroughenhancement in product
lifecycle
management,
factor
automation and
qualit
management in manufacturingexecution
systems,
and
many more.
In
tliis
section, wewill explore threeuse
cases
and walk tlirough the arcliitecture
decisions
and
technology components. Case
Retail-weblog analysis. Case 2:Financial Services-real-time
t ransact ion detec tion. Case
3; Insurance-unstructured
an d
structured data
correlation.
Use
Case
1 :
Initial Data
Exploration
Tlie
first
example weare going to lookatis from the
retail
sector. One of nation s leading
retailers had disappointing results fromitsweb channels diuing theCliristmas season and is
looking to improve customers ex]>erience
witli
dieironline shop]5ing site. Some of the potential
areas to
investigate
include theweb
logs
andproduct/site
reviews
for shoppers. Itwould be
beneficial to understand the navigation pattern, especially relatedto abandoned shoppingcarts.
The
business needs to determine the value
of
these data versus the cost before making a major
investment .
This retailer s IT department
faces
challenges in a numberof
areas:
skill
set (or the
lack
thereof
for thesenew setsof dataand processing powerrequirements toprocess such large volume.
Traditional SQL toolsarepreferred choices forbusiness and IT,however, it isnot economically
feasible to load allthe data into a relational databasemanagementplatform.
HDFS
Figure 4: Use Case 1: InitialData Exploration
SQL
Too l s
14
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 16/25
AnOracleWhite
Paper
InEnterprise Architecture—InfomallonArchitecture: An Architect s Guide to Big Data
As die diagram aboveshows, tliisdesignpattern caUs formounting theHadoop Distributed File
Systemthrough a DBMS system so that traditional SQL tools can be used to explore the dataset
The
key benefits include:
• No data movement
• Leverageexisting investments and skills in database and SQL or BI tools
Oracle
Big Data
Appliance
Oracle
Big Data
Connecto rs
Oracle SQ L
Developer
any
SQL tools
OBJEE
B ig
DataAnalysis
Figures:
Use Case
1: Architecture Decisions
The diagram aboveshows the logical architectureusingOracleproducts to meet these criteria.
The
components
inibis
architecture:
• OracleBigData Appliance (orodier Hadoop Solutions):
o Poweredby the
full
distributionof Cloudera sDistribution including ApacheHadoop
(CDH) to store logs, reviews and other related big data
• Oracle BigData Connectors:
o Create optimized data sets for efficient loading and analysis in Oracle Database
llg
and
OracleEnterprise R
• Oracle Database
llg:
o ExternalTable:A feature inOracledatabase to present data stored in a
file
systemin a
table format and can be used in SQLqueries transparently.
• Traditional
SQL
Tools;
o OracleSQLDeveloper: Development toolwithgrapliic user interface tliat
allows
users
to access data stored in a relational database using SQL.
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 17/25
An Oracle White Paper In Enterprise Architecture—(nformation Architecture: An Architect s Guide to Big Data
o BusinessIntelligence tools such asOBIEE can be alsoused to access data through
Oracle Database
In
summary,
the key architecture choice in tliis scenario isto avoid datamovement, minimize
processing requirement and investment, untilafter the
initial
investigation. Through this
architecture, the retailer mentioned above is able to access Hadoop data directly through
database and SQL interface for the initial exploration.
Use
Case
2 : Big Data
fo r
Complex Event
Processing
The second use case is relevant to financial servicessector. Large financialinstitutions playa
critical
role in detecting financial criminal and terroristactivity. However, their ability to
effectively meettheserequirements areaffected by tlieirIT departments
ability
tomeet the
followingchallenges:
• The expansion of anti-money laundering
laws
to
include
a growing numberof activities,
suchas gaming, organized crime, drug trafficking, and the financing of terrorism
• The ever
growing volume
of information that must be captured, stored,and
assessed
• The
challenge
of correlating datain disparate formats fromanmultitude of sources
TlieirIT systems needto provide abilities to automatically collect andprocess large volumes of
data froman
array
of sources
including
Currenq Transaction Reports
CTRs), Suspicious
Activit}
Reports
SARs), Negotiable Instrument Logs
NILs),
Internet-based acti\ it) and
transactions, and much
more.
Rea l
Time
NoSQL
at ch
HDFS
Streaming
(CEP
Engine)
Figure6: Use Case 2: BigData for Complex Event Processing
ler t s
PEL
Mobi le
Dashbo a r d
Database
16
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 18/25
AnOracle WhitePaper InEnterprise Architecture—Information Architecture; AnArchitect s Guide to 8lg Data
The idealscenariowould have been to include allhistoricprofilechanges and transactionrecords
to best determine tlie rate of risk for each of the accounts, customers, counterparties, and legal
entities,at various levels
of
a^regation and hierarchy. However,it was not traditionally possible
due to constraints in processingpower and cost of storage. With HDFS, it is now possible ta^
incorporate
all
the
detailed
data points to
calculate
such risk
profiles
and send to the CEP enginj|
to establish th e
basis
fo r die risk model
l^oSQL database in this
scenario
willcapture and store lowlatenq andlarge
volume
of data
fromvarious sources in a flexible datastructure, as wellas providereal-time dataintegrationwitL :
Complex Event Processengine to enable automaticalerts, dashboard, and trigger business
process to take appropriate actions.
I
Orac le
Big Data
i
Appliance
I
with
doop
;
and NoSQL
Streaming
CE P
Engine
Orac le
EDA/SOA Su it e
\u A
BAM
Dashboa rd s
BA M
ler t s
<
P EL
P r oc e s s e s
Rgure 7: Use Case U2: Architecture Decisions
The logical diagram above highlights the following main components of this architecture:
• Oracle BigData Appliance or other Hadoop Solutions):
o Powered by the fulldistribudon
of
Cloudera s Distribution including Apache Hadoop
CDH) to store logs,reviews, and other relatedb^ data
o NoSQL to capture low latenc) data with flexible data structure and fast querying
o MapReduce to process large amount of data for reduced and optimized dataset to be
loaded into database managementsystem
•
Oracle
EDA:
o Oracle CEP: Streaming complex event engine to continuously process incoming data,
analyze and evolve patterns, and raise events if suspicious activitiesare detected
17
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 19/25
An Oracle White Paper In Enterprise Architecture—information Architecture: An Architect s Guide to Big Data
o Oracle BPEL: BusinessProcess Execution Languageengine to define processes and
appropriate actionsbased on the event raised
o Oracle
BAM:
Real-time businessactivit) monitoringdashboards to provide immediate
insight and generate actions
In summar , the
key
principle of this architecture isto integrate bigdata with event
driven
architectureto meetcomplex regulator} requirements. Althoughdatabasemanagementsystems
arenot included in tliis architecture depiction, it isexpected that raised events and fiarther
processing
transactions
and records
will
be stored in the database
either
as
transactions
or for
future analytical requirements.
Us e Case 3: Big
Data
fo r Combined Analytics
Tlie third use caseis to continue our discussion
of
the insurance company mentioned in the
earlier section of thispaper. In a nutshell, tlieinsurance gianthas a need tocapture the
large
amount of sensor data that tracktheir customers drivinghabits, store diem in a cost effective
manner, process diis data to determine
trends
and identify
patterns,
and to
integrate
end
results
with existing transacrional, master, and reference data
they
are
already
capturing.
Haaoop
luttftfw
MapRt< iceiear
a a n a o r ,
raathar,
loca ion)
Big Data Cennactor
Adaancatf
AnalyUea
Rgure 8: Use
Case
3: Data FlowArchitecture Diagram
Haatar Dau
Rapoat t oi y
Cuat ot nar
Damographle)
Tranaacl lon
Syal ama
pramlum,
el ahna)
IS -ifi
u HMM
aPK
Data
Waraheuaa
Maits
Traditional Raporting an d Alart*
The
key
architecture
challenge of diis
arcliitecture
isto
integrate Big Data
with
structured
data.
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 20/25
AnOracle White Paper InEnterprise
/^hliecture—Information
Architecture: An Architect s Guide to Big Data
The diagrambelowis a high-level conceptualviewthat reflects these requirements.
Rgure 9: Use Case 3: Conceptual Architecture for Combined Analytics
The large amount of sensor data needs to be transferred to and stored at the centralized
environment that provides flexible data structure, fastprocessing, aswellas
scalability
and
parallelism. MapReduce functionsare needed to process the low-density data to identify patterns
and trending insights. Tlieendresults need to be integrated into thedatabase management
system with structured data.
Oracle Big Data
Appliance
OptlmizedforHadoop
R. and
NoSQL
Processing
Orac le
Big
Data
onnec t o r s
imband
Orac le
xada ta
System of
Record
Optimized
for DW/OLTP
Orac le
Exalytics
Optimized for
Artalytlcs
Workload
Stream
|
cquire
|
Organize
|
nalyze
and
isualize
Figure10: Use Case 3: Physical Architecture for Combined Analytics
Leveraging Oracleengineered systems includingOracle Data Appliance, OracleExadata, and
Oracle Exalytics reducesimplementation
risks
provides fast time to valueand extreme
performanceand scalabilit} to meet a complexbusinessand IT challenge.
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 21/25
An
Oracle
White
Paper
in
Enterprise
Architecture—information Architecture: An
rchitect s Guide
to Big Data
The keycomponents of this architecture include:
• OracleBigData Appliance (or other Hadoop Solutions):
o Poweredbythe
full
distributionof Cloudera s DistributionincludingApacheHadoop
(CDH) to store logs, reviews and other relatedbig data
o NoSQL to capturelowlatency datawith flexible data structure and fast
querying
o MapReduce to process large amount of data for reduced and optimizeddataset to be
loaded into database management system
• OracleBigData Connectors: Provides an adapter for Hadoop thatintegrates Hadoopand
OracleDatabase througheasyto usegraphical userinterface.
• OracleExadata: Engineered databasesystemthat supportsmixedworkloads for
outstanding
performance
of the
transactional
and/or datawarehousing environment for
further combined analytics.
• Oracle
Exalytics: Engineered
BI system that provides speed-of-thought analytical
capabilities to end users.
• Intiniband: ConnectionsbetweenOracleBigDataAppliance, OracleExadata, andOracle
Exal5mcs
arevia InfiniBand enabling high speed datatransfer forbatchor query
workloads.
Big
Data
Best Practices
Herearea
few
general
guidelines
to build a
successful
bigdataarchitecture
foundation:
1: Align BigData with Specific Business Goals
The key intentof Big Datais to find
hidden
value - value through intelligent
filtering
of
low
density and
high
volumes of
data.
Asan architect be prepared to advise yourbusiness onhow
to apply bigdata techniques to accomplish their goals. Forexample understand howto filter
weblogs to
understand eCommerce
behavior
derive sentiment
from
social
media andcustomer
support interactions understand statistical correlation
methods
and their relevance for customer
product manufacturing or engineering data. Even
though
Big
Dataisa newer IT frontier and
there is an obviousexcitementto master somethingnew,it is important to base newinvestments
in skills
organization
or infrastructure
with
a strong
business driven
context to guarantee
ongoing project investments
and funding.
To know ifyou
are
on the right track
ask
yourself
howdoesit supportand enable yourbusiness architecture and top IT priorities?
2:
Ease
Skills
Shortage
with
Standards and Governance
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 22/25
AnOracle White Paper InEnterprise Architecture—Information Architecture: AnArchitect's Guide to BigData
McKinsej' Global Institute^ wrote that one of the bluest obstacles for big data is a skills
shortage. With the accelerated adoption of deep analytical techniques, a 60 shortfall is
predictedby 2018. Youcanmitigate tliis riskby ensuringthat BigData technologies,
considerations, and decisions arc added to your IT governanceprogram. Standardizing your
approachwillallow you to manageyour costs and best leverage your resources. Another strateg}
to consider is to implementappliances tliatwould provideyouwith a jumpstartand quickertime
to value asyou grow your in-house expertise.
3: Optimize Knowledge Transfer with a
Center
of
Excellence
Use a center of excellence (CoE) to share solutionknowledge, planningartifacts, oversight,and
managementcommunications for projects. Wlietherbigdata is a newor expandinginvestment,
tlie soft and hard costs can be an investment sharedacross the enterprise. Another benefit from
the CoE approachis diat it
will
continue to drive the bigdata and
overaU
information
arcliitecture maturit)'in a more structured and systematic way.
4:
Top
Payoff is Aligning Unstructured with Structured Data
It is certainly valuable to analj'ze BigData on its own. However, by connectinghighdensityBig
Data to the structureddatayouare
already
collecting can bringevengreaterclarity. For example,
there is a differencein distinguisliing allsentiment from that of only your besl customers.
Whetheryouare capturingcustomer, product, equipment,or environmentalBigData, an
appropriate goalis to add more relevantdata points to yourcore master and analytical summaries
and leadyourself to better conclusions. For these reasons, many see BigData as an integral
extensionof yourexistingbusinessintelligence and datawarehousing platform.
Keep inmind that the BigData analytical processes andmodels can be humanand machine
based. Tlie BigData analytical capabilities includestatistics, spatial, semantics, interactive
discovery, and visualization. Tlieyenableyour knowledge workers and new analytical models to
correlate different tj'pes and sources of data, to make associations, and to makemeaningful
discoveries. But all in all considerBigData both a pre-processor andpost-processor of related
transactional data,and leverage your prior investments in infrastructure,platform,BI andDW.
'
McKinsey
Global Institute,
May 2011
The challenge—and opportunitj'—of 'bigdata',
https://www.mckinseyquarterly.com/The
challenge
and
opportunity
of big data
2806
2
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 23/25
An OracleWhite Paper in EnterpriseArchitecture—InformationArchitecture: An Architect sGuide to Big Data
5 : Plan Your
andbox
For
Performance
Discovering meaning in your data is not
always
straightforward. Sometimes,we don't even know
whatwe are lookingfor
initially.
That's completely expected. Managementand IT needs to
support this lack
of
direction or lack of clearrequirement. So, to accommodate the
interactiveexploration
of data and the experimentationof statistical algorithmsweneed high
performancework
areas.
Be sure that 'sandbox' environmentshavethe power theyneed andare
properly governed.
6: Align with the Cloud Operating Model
BigData processes and users require
access
to broad array of resources for both iterative
experimentation and runningproduction jobs. Data
across
thedata
realms
(transactions, master
data, reference, summarized ispart of a BigData solution. Analytical sandboxes shouldbe
created on-demandand resourcemanagementneeds to have a control of the entiredata
flow
frompre-processing, integration, in-database summarization, post-processing, and analytical
modeling.
A
well
planned private and
public
cloudprovisioning and security
strategy plays
an
integralrole in supporting tJiese changingrequirements.
Summary
BigDataishere. Analysts and research organizations have madeit
clear
thatmining machine
generated datais
essential
to future success. Embracing new technologies and
techniques
are
always
challenging but as
arcliitects
youare
expected
to provide a fast reliable pathto business
adoption.
Asyou
explore
the Vhat's new'across the spectrum of
Big
Data capabilities we
suggest
that you
thinkabout theirintegration intoyour existing infrastructure andBI investments. As
examples
align newoperational andmanagement capabilities withstandard IT,build for enterprise
scale
and
resilience
unifj'your database and developmentparadigms asyouembraceOpen Source,
and sharemetadata whereverpossiblefor both integration and
analytics.
Lastbut not least expandyour IT governance to includea BigData centerof excellence to
ensurebusinessalignment, growyour
skills
manageOpen Sourcetoolsand technolgoies, share
knowledge, establish standards, and to
manage
best practices.
Formoreinformation about
Oracle
and
Big
Data,visit
www.oracle.com/bigdata.
You can listen to Helen Sun discussthe topicsin thiswhite paper at thiswebcast. Selectsession
6,Conquering BigData. Clickhere.
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 24/25
An Oracle White Paper InEnterprise Architecture—Information Architecture; An Architect s Guide to Big Data
Enterprise
Architecture
and
Oracle
Oracle hascreated a streamlined and repeatable processto
facilitate
the development of your big
data
architecture vision
Gove rnance
us ines s
rchitecture
Enterprise
rchi tec ture
Repository
Roadmap
Future
Sta t e
rchi tecture
,
Vision
,
Current
Sta te
The
Oracle
Architecture Development Process
divides
the development of architecture into the
phaseslistedabove. OracleEnterpriseArchitects and InformationArchitects use this
methodology to propose solutions and to implementsolutions. This process leverages many
planningassetsand reference architectures to ensure ever} implementation follows Oracle s best
experiences and practices.
For
additional
white
papers
on the Oracle Architecture Development Process OADP). the
associated OracleEnterpriseArchitecture Framework OEAF). read aboutOracle s experiences
in enterprise
architecture projects,
and to participate in a
community
of enterprise
architects,
visit
the www.oracle.com/goto/EA
To understandmore about Oracle s enterprise architecture and informationarchitecture
consulting
services, please visit,
www.oracle.com/goto/EA-Services.
Watchour webcaston BigData, by clicking here.
3
7/23/2019 Big Data for Oracle Architects
http://slidepdf.com/reader/full/big-data-for-oracle-architects 25/25
O
A
i
Em
O
A
A
A
O
W
9
R
WW
P
F
t
C
CT
wMwWm
pWmm
b
O
b
XC
AO
Hw