big data modeling
BIG DATA MODELING
Hans Hultgren
RMDC Fall 2016
Welcome
AGENDA
1. Big Data
2. Data Modeling
3. Big Data Modeling
Session Objectives
• Big Data Fundamentals
– Components of Big Data
– Structure & Schemas
– Tools & Architecture
• Data Modeling
– Integration & History
– Data Warehousing & BI
– Conceptual to Physical
• Big Data Modeling
– Focus on Meaning
• Ensemble Modeling
– The Blended Architecture
BIG DATA
Big Data
“Huge” Data Volumes
n-Structured & Very Complex
Streaming & Shape-Shifting
[Figure: Typical Data vs. Big Data, with labels A, B, C]
Big Data
• Volume: Huge Volumes of Data
• Velocity: Drinking from a Fire Hose
• Variety: n-Structured Data
• Veracity: Quality, Accuracy, Reliability, Trustworthiness
• Value: Business Value and Value Potential
Big Data Architecture
• To deal with the features of Big Data, supporting architectural components are based on:
– Data distribution, and
– Late Binding of Schemas
KVP
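As a rough illustration of the data distribution idea, here is a minimal sketch that spreads key-value pairs (KVP) across storage nodes by hashing the key. The function names, the node count, and the sample keys are assumptions made for this example; the slides do not prescribe a partitioning method. Values are kept as raw text, so no schema is bound at load time.

```python
import hashlib
from collections import defaultdict


def node_for(key: str, node_count: int) -> int:
    """Pick a node by hashing the key (stable across runs, unlike built-in hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def distribute(pairs, node_count):
    """Spread (key, value) pairs across node_count hypothetical storage nodes."""
    nodes = defaultdict(list)
    for key, value in pairs:
        nodes[node_for(key, node_count)].append((key, value))
    return dict(nodes)


# Hypothetical sample data: values stay as raw text, no schema applied on load.
pairs = [
    ("customer:1001", '{"name": "Ada", "region": "West"}'),
    ("customer:1002", '{"name": "Grace"}'),
    ("sale:9001", '{"customer": "1001", "amount": "19.99"}'),
]
placement = distribute(pairs, node_count=3)
```

The same key always lands on the same node, which is what lets a later read find the pair without a central index.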
Modeling and Understanding
• Schema on Write
• Schema on Read
• Dismantled Schema on Write
• Schema on Focus
• Schema on Leverage
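The first two items in the list above can be contrasted with a small sketch. It covers only Schema on Write and Schema on Read; the other three terms are not defined in the slides and are not modeled here. The column names and sample records are hypothetical.

```python
import json

CUSTOMER_COLUMNS = ("customer_id", "name", "region")  # hypothetical fixed schema


def write_with_schema(record: dict) -> tuple:
    """Schema on write: check the record against fixed columns before storing it."""
    unknown = set(record) - set(CUSTOMER_COLUMNS)
    if unknown:
        raise ValueError(f"unexpected attributes: {sorted(unknown)}")
    return tuple(record.get(col) for col in CUSTOMER_COLUMNS)


def read_with_schema(raw: str, wanted: tuple) -> dict:
    """Schema on read: keep the raw text as loaded, bind fields only when reading."""
    doc = json.loads(raw)
    return {field: doc.get(field) for field in wanted}


# Schema on write: the shape is fixed before the load.
row = write_with_schema({"customer_id": "1001", "name": "Ada", "region": "West"})

# Schema on read: an attribute nobody planned for is still available later.
raw = '{"customer_id": "1001", "name": "Ada", "loyalty_tier": "gold"}'
view = read_with_schema(raw, ("customer_id", "loyalty_tier"))
```

With schema on write the unplanned `loyalty_tier` attribute would be rejected at load time; with schema on read it is simply stored and interpreted when someone asks for it.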
[Diagram labels: LOAD, MODEL, APPLY, EXPLORE]
Modeling and Understanding
• Big Data Possibilities
[Diagram labels: LOAD, MODEL, APPLY, EXPLORE]
Inconvenient Truth about BIG DATA
http://community.embarcadero.com/blogs/entry/the-hidden-elephant-in-big-data-modeling
DATA MODELING
Data Modeling
Man's Search for Meaning…
• Conceptual Modeling
• Logical Modeling
• Information Modeling
• Physical Data Modeling
Ensemble Modeling™
All the parts of a thing taken together, so that each part is considered only in relation to the whole.
• The constellation of component parts acts as a whole.
• With Ensemble Modeling the Core Business Concepts that we define and model are represented as a whole – an ensemble – including all of the component parts. An Ensemble is typically based on all things defining a Core Business Concept that can be uniquely and specifically said for one instance of that Concept.
EMF
Forms of Modeling & Ensemble
[Diagram labels: ERP, Acctg, Sales; EDW; DataMart (x3); 3NF, Dimensional, Ensemble]
Ensemble forms: Anchor, Focal Point, Data Vault, DV2.0, 2G, Hyper Agility, Temporal, 6NF, Matter, etc.
The Data Vault Ensemble
• The Data Vault Ensemble conforms to a single key – embodied in the Hub construct.
• The component parts for the Data Vault Ensemble include:
– Hub: The Natural Business Key
– Link: The Natural Business Relationships
– Satellite: All Context, Descriptive Data and History
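The three component parts listed above could be sketched as plain data structures. This is a simplified illustration, not a physical Data Vault design: the field names are assumptions, surrogate or hash keys and record sources are left out, and Satellites are attached only to Hubs here.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Hub:
    """The Natural Business Key of one Core Business Concept."""
    concept: str
    business_key: str


@dataclass(frozen=True)
class Link:
    """A Natural Business Relationship between Hubs."""
    name: str
    hubs: tuple  # the Hubs taking part in the relationship


@dataclass
class Satellite:
    """Context and descriptive data for a Hub, kept as history over time."""
    parent: Hub
    loaded_at: datetime
    context: dict = field(default_factory=dict)


# Hypothetical instances for illustration.
customer = Hub("Customer", "CUST-1001")
store = Hub("Store", "STORE-07")
shops_at = Link("CustomerStore", (customer, store))

# History is kept by adding Satellite rows, never by updating the Hub.
history = [
    Satellite(customer, datetime(2016, 1, 1), {"name": "Ada", "city": "Denver"}),
    Satellite(customer, datetime(2016, 6, 1), {"name": "Ada", "city": "Boulder"}),
]
current = max(history, key=lambda s: s.loaded_at)
```

The Hub carries nothing but the key, so a change of city adds a Satellite row and leaves the Hub and Link untouched.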
Ensemble means thinking differently
[Diagram: Customer]
• The minimal construct then for an “entity” such as “Customer” is now (in data vault) a Hub with a set of Satellites
Applying data vault modeling pattern
Data Vault Ensemble Modeling Process
1) Identify and Model the Core Business Concepts
• Business interviews are at the heart of this step
What do you do? What are the main things you work with?
• Find the best/target Natural Business Key
Data Vault Ensemble Modeling Process
2) Identify and Model the Natural Business Relationships
• Specific Unique Relationships
• Be mindful of the Unit of Work and Grain
Data Vault Ensemble Modeling Process
3) Analyze and Design the Context Satellites
• Consider Rate of Change, Type of Data and also the Sources
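One way to picture this design step is to group a concept's attributes into Satellites by source and rate of change. The sketch below is an assumption-laden illustration: the attribute profile, the "slow"/"fast" labels, and the naming pattern are all hypothetical, and Type of Data is left out to keep the example short.

```python
from collections import defaultdict

# Hypothetical attribute profile for a Customer concept.
ATTRIBUTES = [
    {"name": "birth_date", "source": "CRM", "rate": "slow"},
    {"name": "legal_name", "source": "CRM", "rate": "slow"},
    {"name": "loyalty_points", "source": "POS", "rate": "fast"},
    {"name": "last_visit", "source": "POS", "rate": "fast"},
    {"name": "email", "source": "CRM", "rate": "fast"},
]


def design_satellites(hub_name, attributes):
    """Group attributes into one Satellite per (source, rate of change) pair."""
    groups = defaultdict(list)
    for attr in attributes:
        sat_name = f"sat_{hub_name}_{attr['source']}_{attr['rate']}".lower()
        groups[sat_name].append(attr["name"])
    return dict(groups)


satellites = design_satellites("Customer", ATTRIBUTES)
```

Here the five attributes fall into three Satellites, so a fast-changing attribute such as `loyalty_points` does not force new history rows for `birth_date`.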
BIG DATA MODELING
Logical business model
• Leveraged for all logical model needs including the data warehouse, big data lake, master data management (MDM) and operational integration initiatives
• Closely aligned to DV physical model
Ensemble Logical Form (ELF)
[Diagram entities: Customer, Region, Store, Sale, Sale LI, Product, Vendor, Employee]
Ensemble Logical Form
[Diagram entities: Customer, Region, Store, Sale, Sale LI, Product, Vendor, Employee]
ELF Modeling maintained in:
* Metadata
* Logical Data Model
* Data Modeling Tools
* Virtual Schemas
* Other Tools or Artifacts
Map to Context Data stored in:
* JSON Docs
* XML (w/ XSD or Not)
* Blobs (Free Form Text)
* Big Data Platforms
* Hadoop
* In the Cloud
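A rough sketch of what mapping the logical model to context data might look like for JSON docs: the model is held as metadata that records where each concept's business key sits inside a document, and the document itself is left as stored. The `ELF_MAP` structure, the paths, and the sample document are hypothetical; the slides do not specify a metadata format.

```python
import json

# Hypothetical ELF metadata: where each concept's business key lives in a document.
ELF_MAP = {
    "Customer": ["customer", "id"],
    "Store": ["store", "code"],
}


def dig(doc, path):
    """Follow a list of keys into a nested dict; return None if any key is missing."""
    for key in path:
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def business_keys(raw_json: str) -> dict:
    """Return the business keys found in one JSON doc, by concept name."""
    doc = json.loads(raw_json)
    found = {}
    for concept, path in ELF_MAP.items():
        value = dig(doc, path)
        if value is not None:
            found[concept] = value
    return found


raw = (
    '{"customer": {"id": "CUST-1001", "name": "Ada"}, '
    '"store": {"code": "STORE-07"}, "note": "gift wrap"}'
)
keys = business_keys(raw)
```

Everything else in the document, such as the free-form `note`, stays where it is and can be read later as context for the concepts whose keys were found.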
Three Paths for Modeling
Structured / Known
• CBC
• NBR
• Attribution
• Columns
Results in a backbone model with attributes in defined columns
N-Structured / NVP
• CBC
• NBR
• Attribution
Results in a backbone model with known/expected attribute names/tags
N-Structured / KVP
• CBC
• NBR
Results in a backbone model with capacity to capture unknown attribution either named/tagged or not
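The three paths above can be pictured with one possible reading, in which the same backbone (reading CBC as Core Business Concept and NBR as Natural Business Relationship) is kept in every case and only the treatment of attribution differs. The column names, expected tags, and sample payload are assumptions for illustration, and relationships are omitted to keep the sketch short.

```python
KNOWN_COLUMNS = ("name", "region")            # path 1: attributes in defined columns
EXPECTED_TAGS = {"loyalty_tier", "segment"}   # path 2: known/expected names or tags


def model_record(business_key: str, payload: dict) -> dict:
    """Keep the backbone fixed and route each attribute to one of three forms."""
    columns = {col: payload.get(col) for col in KNOWN_COLUMNS}
    named = {k: v for k, v in payload.items() if k in EXPECTED_TAGS}
    leftover = {
        k: v
        for k, v in payload.items()
        if k not in KNOWN_COLUMNS and k not in EXPECTED_TAGS
    }
    return {
        "backbone": {"concept": "Customer", "business_key": business_key},
        "columns": columns,  # structured / known
        "nvp": named,        # n-structured with expected names
        "kvp": leftover,     # n-structured with unknown attribution (path 3)
    }


result = model_record(
    "CUST-1001",
    {"name": "Ada", "loyalty_tier": "gold", "fav_color": "teal"},
)
```

In this reading an attribute nobody anticipated, such as `fav_color`, is still captured against the Customer backbone even though no column or expected tag exists for it.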
APPLYING THE ENSEMBLE
Integration across Platforms
Expanded Applications
[Diagram entities: Customer, Region, Store, Sale, Sale LI, Product, Vendor, Employee]
Summary
Ensemble in the Big Data World
• Conceptual Modeling
• Logical Modeling
• Information Modeling
• Physical Data Modeling
• Integration Platform
Links and Information
CDVDM Training & Certification
www.GeneseeAcademy.com
gohansgo
HansHultgren.WordPress.com
HansHultgren
Online, On-Demand Video Lessons
DataVaultAcademy.com
DataVaultAcademy
e-Book: Modeling the Agile Data Warehouse with Data Vault
Book: Modeling the Agile Data Warehouse with Data Vault