23 June 2017 © COPYRIGHT MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Becoming a Document Modeling Guru
Mike Bowers
by Michael Bowers, 2017-05-01, v. 5.4
Becoming a Data Modeling Guru · 2017 MarkLogic World · mike@cssDesignPatterns.com
Abstract
• We know how to create great relational database models, but how do we create document/graph models?
• How do we optimize a document/graph model to work great in and out of MarkLogic?
• Do we have to unlearn everything relational?
• Do we need joins?
• Do we need schemas?
• What do we normalize, denormalize, orthogonalize, and generalize?
This session will liberate you from flat tables and limited relationships. You'll learn why it is most natural to represent business entities as hierarchical documents, why graphs are the best way to relate any business entity to any other business entity, and how MarkLogic's unique indexes and query APIs make it easy to query within and join across hierarchical documents.
Why Document Graph?
Relational modeling was revolutionary fifty years ago. We are in a new revolution.
Relational modeling has two major flaws:
1. It forces you to shred business entities into multiple tables.
2. It limits you to a few fixed relationships with implied meaning.
Document graph modeling improves on relational:
1. It enables you to model business entities as single documents.
2. It frees you to connect business entities in any way, with precise semantic meaning.
About the Author: Michael Bowers
• Principal Database Architect
• Using NoSQL professionally for 9 years
• Author:
  – Pro CSS and HTML Design Patterns
  – Pro HTML5 and CSS3 Design Patterns
• mike@cssDesignPatterns.com
Church of Jesus Christ of Latter-day Saints
• Hundreds of MarkLogic servers running 190+ websites and applications with billions of page views annually
• 15.6 million members (30,016 congregations worldwide)
• Humanitarian assistance in 186 countries
• Thousands of documents in 188 published languages
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Six Data Paradigms
[Figure: six data paradigms positioned from low-latency operational velocity (microseconds to hours; 500 tps to 100K tps) to high-bandwidth analytical volume (GB to PB), and from fixed structure (code has more dependence on DB structures) to flexible structure (code has less dependence on DB structures). Each paradigm is drawn with the same surgeon, operation, hospital, and drug-dose example and labeled with vendors numbered as in the slide's "Top Ten Databases Overall" legend.]
• Key Value: predefined data structures (key/value, hash, list, set). Simple key: 5 AWS DynamoDB, 10 Redis.
• Wide Column: fixed dense tables with fixed queries and no joins. Complex key: 9 DataStax Cassandra.
• Relational: fixed dense tables with flexible queries and joins. SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB. newSQL: 2 Oracle x10.
• Dimensional: Kimball data warehousing. Data warehouse: 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA. Live analytics: 2 Oracle Exalytics, 3 SAP HANA.
• Document: sparse variable data structures. JSON document: 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB. Document warehouse (XML): 6 MarkLogic.
• Graph: unlimited relationships. RDF graph: 6 MarkLogic.
Do we combine multiple models?
[Figure: the same six-paradigm chart, with MarkLogic labeled as a single multi-model enterprise NoSQL database covering the graph (RDF), document (JSON and XML document warehouse), key-value, and wide-column positions for operational workloads.]
Power of Combining XML Document and RDF Graph
Data + Narrative = Contextual Information
+ Relationships (Semantic & Structural) = Meaningful Knowledge
[Figure: an excerpt of E. F. Codd's 1970 paper annotated with an RDF graph of typed entities (T Topic, P Person, L Location, P Publication, R Reference, O Organization, E Event) connected by relationships such as geo-located in, printed on, related topic, author of, published article in journal, publisher of, purpose, problem, and solution. The excerpt reads:]
A Relational Model of Data for Large Shared Data Banks
E. F. Codd, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION: This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS: …Tables… represent a major advance toward the goal of data independence…
1.2.1 Ordering Dependence: …Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence: …Can application programs… remain invariant as indices come and go?…
1.2.3 Access Path Dependence: Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
WAIT A MINUTE: Aren't JSON, XML, and RDF databases hierarchical and graph databases?
Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?
1. They break programs when the data structure's hierarchy changes.
   – New APIs and indexes flatten out the data structure so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent because it is easy to fail to update redundant data.
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
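Point 1 above says indexes can flatten a document so that a query does not depend on its hierarchy. The following is a minimal Python sketch of that idea, not MarkLogic's actual index: `flatten` and `values_of` are hypothetical helpers, and the `treatment` wrapper in the second document is invented to simulate a hierarchy change.

```python
from typing import Any, Iterator, List, Tuple


def flatten(node: Any, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], Any]]:
    """Yield (path, value) for every scalar leaf in a nested JSON-like structure."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, path + (key,))
    elif isinstance(node, list):
        # Array positions are left out of the path, so a query does not
        # depend on where an item sits in an array.
        for child in node:
            yield from flatten(child, path)
    else:
        yield path, node


def values_of(doc: Any, prop: str) -> List[Any]:
    """Return every value stored under property `prop`, at any depth."""
    return [value for path, value in flatten(doc) if path and path[-1] == prop]


drugs = [{"drugName": "Minocycline"}, {"drugName": "Minomycin"}]
v1 = {"operation": {"administeredDrugs": drugs}}
# v2: the same data after a hypothetical restructuring adds one more level.
v2 = {"operation": {"treatment": {"administeredDrugs": drugs}}}

print(values_of(v1, "drugName"))  # ['Minocycline', 'Minomycin']
print(values_of(v2, "drugName"))  # ['Minocycline', 'Minomycin']
```

The query names only the property, so the same call keeps working after the document gains a level; a path-based lookup such as `doc["operation"]["administeredDrugs"]` would break.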
RDF Graph and XML enable us to turn content into meaningful knowledge.
What can RDF Graph and JSON do for data?
Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity
  – A document is like a row in a table, and more
  – A document contains all the related tables a relational model requires to represent one business entity
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries
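To make "relationships are data, added at run time" concrete, here is a small in-memory triple list in Python. It is an illustrative sketch, not MarkLogic's triple store; the entity IDs and the predicate names `placedOrder`, `orderedProduct`, and `soldBy` are hypothetical (`likesProduct` appears in the person document later in this deck).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical IDs and predicate names loosely based on the customer/order/
# product/vendor example; they are illustrative, not from a real system.
triples: List[Tuple[str, str, str]] = [
    ("customer:11", "placedOrder", "order:1"),
    ("order:1", "orderedProduct", "product:1111"),
    ("product:1111", "soldBy", "vendor:555"),
]


def add(subject: str, predicate: str, obj: str) -> None:
    """Relationships are data: a new kind of relationship is just another triple."""
    triples.append((subject, predicate, obj))


def related(entity: str) -> Dict[str, List[str]]:
    """Every relationship touching `entity`, outgoing and incoming."""
    out: Dict[str, List[str]] = defaultdict(list)
    for s, p, o in triples:
        if s == entity:
            out[p].append(o)
        elif o == entity:
            out["^" + p].append(s)  # '^' marks an incoming (inverse) edge
    return dict(out)


# Added at run time with no schema change, although no "likes" relationship
# existed when the data was first loaded.
add("customer:11", "likesProduct", "product:1111")

print(related("product:1111"))
# {'^orderedProduct': ['order:1'], 'soldBy': ['vendor:555'], '^likesProduct': ['customer:11']}
```

In a relational model the new `likesProduct` link would need a new join table (a schema change); here it is one more row of data.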
[Figure: Customer Order. JSON Customers, Orders, Products, and Vendors documents and XML Documentation, connected by graph relationships: product that was ordered; vendor who sold the product; customer liked product; vendor received RMA on product; and links to documentation such as shipping instructions, customer invoices and annual reports, customer order documents and invoices, and product manuals and vendor order forms.]
RDF = Standard Meaningful Graphs
Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies
TIP: Search for ontologies at Linked Open Vocabularies (LOV).
• Dublin Core • FOAF • TrackBack • MetaVocab • Basic Geo Vocabulary • BIO • RSS 1.0 • VCard RDF • Creative Commons metadata • WOT
• SIOC • GoodRelations • DOAP • Programmes Ontology • Music Ontology • OpenGUID • Provenance Vocabulary • Pedagogical diagnosis • DILIGENT Argumentation
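As a sketch of reusing existing vocabularies, the Python below writes three N-Triples lines whose predicates come from FOAF and Dublin Core. The two namespace IRIs and the terms `foaf:name`, `foaf:knows`, and `dcterms:creator` are the published ones; the `example.org` resource IRIs are hypothetical, and the hand-rolled serializer (minimal escaping only) stands in for whatever RDF API you actually load data with.

```python
# Published namespace IRIs for two of the vocabularies listed above.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    # Minimal N-Triples escaping: backslash and double quote only.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def ntriple(s: str, p: str, o: str) -> str:
    return f"{s} {p} {o} ."


# Hypothetical resource IRIs for illustration.
person = iri("http://example.org/person/11")
friend = iri("http://example.org/person/22")
manual = iri("http://example.org/doc/manual-1111")

lines = [
    ntriple(person, iri(FOAF + "name"), literal("Mike Bowers")),
    ntriple(person, iri(FOAF + "knows"), friend),
    ntriple(manual, iri(DCTERMS + "creator"), person),
]
print("\n".join(lines))
```

Any consumer that knows FOAF or Dublin Core can interpret these predicates without documentation from you, which is the time saving the slide describes.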
New Data Structures for Variety and Variability
Relational database: fixed and dense structures (mostly)
NoSQL database: sparse and variable structures (mostly)

Field and property types
• Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
• NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.
Row and document structure
• Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
• NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.
Tables and collections
• Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
• NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.
Relationships
• Relational: Each table may have zero to a few fixed relationships to other tables. Constraints are not data.
• NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.
Schemas and databases
• Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
• NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.
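A relational row always has every column, so a value that is not there can only be NULL. In a JSON document a property can be absent altogether or present with a null value, and documents of different types can share one collection. This Python sketch shows both points; the collection contents, the `note` document type, and the `MISSING` sentinel are hypothetical.

```python
from collections import Counter

# A hypothetical "transplants" collection. Documents of different types and
# shapes share it, and nothing was declared before loading them.
collection = [
    {"_type": "operation", "operationNumber": 13, "surgeonName": "Dorothy Oz"},
    {"_type": "operation", "operationNumber": 14},                        # property absent
    {"_type": "operation", "operationNumber": 15, "surgeonName": None},  # explicit null
    {"_type": "note", "text": "Follow-up scheduled"},                     # another document type
]

MISSING = object()  # sentinel that distinguishes "absent" from "null"


def surgeon_state(doc: dict) -> str:
    value = doc.get("surgeonName", MISSING)
    if value is MISSING:
        return "absent"
    if value is None:
        return "null"
    return "present"


type_counts = Counter(doc["_type"] for doc in collection)
states = [surgeon_state(d) for d in collection if d["_type"] == "operation"]
print(type_counts)  # Counter({'operation': 3, 'note': 1})
print(states)       # ['present', 'absent', 'null']
```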
Why is relational fixed and dense?
Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Operation ID | Drug ID | Drug Dose | Drug Dose UOM
13 | 100 | 200 mg | NULL
13 | 101 | NULL | NULL
13 | 100 | 150 mg | NULL
(Sparse: the NULL values are nullable fields.)

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

Callouts: fixed field types; fixed table structure; fixed table relationships; each row has the same structure; fixed set of tables.
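To see how many fixed, dense tables and joins it takes to reassemble one operation, here is a sketch using Python's built-in `sqlite3` as a stand-in for any relational database. The table and column names are adapted to SQL, and the dose is stored as a number with its unit in a separate column (as in the JSON version of the same data); both choices are assumptions of this sketch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE surgeon   (surgeon_id INTEGER PRIMARY KEY, surgeon_name TEXT, surgeon_specialty TEXT);
CREATE TABLE operation (operation_id INTEGER PRIMARY KEY,
                        surgeon_id INTEGER REFERENCES surgeon, hospital_id INTEGER);
CREATE TABLE drug      (drug_id INTEGER PRIMARY KEY, drug_name TEXT);
CREATE TABLE operation_drug (            -- one row per administered dose
    operation_id  INTEGER REFERENCES operation,
    drug_id       INTEGER REFERENCES drug,
    drug_dose     INTEGER,               -- nullable: sparse values become NULL
    drug_dose_uom TEXT
);
INSERT INTO surgeon VALUES (1, 'Dorothy Oz', 'Cardiothoracic'), (2, 'Martin Fields', 'Neurosurgery');
INSERT INTO operation VALUES (13, 1, 7);
INSERT INTO drug VALUES (100, 'Minocycline'), (101, 'Minomycin');
INSERT INTO operation_drug VALUES (13, 100, 200, 'mg'), (13, 101, NULL, NULL), (13, 100, 150, 'mg');
""")

# Three joins across four tables to rebuild what is one business entity.
rows = con.execute("""
    SELECT o.operation_id, s.surgeon_name, d.drug_name, od.drug_dose, od.drug_dose_uom
    FROM operation o
    JOIN surgeon s         ON s.surgeon_id = o.surgeon_id
    JOIN operation_drug od ON od.operation_id = o.operation_id
    JOIN drug d            ON d.drug_id = od.drug_id
    ORDER BY od.rowid
""").fetchall()
for row in rows:
    print(row)
# (13, 'Dorothy Oz', 'Minocycline', 200, 'mg')
# (13, 'Dorothy Oz', 'Minomycin', None, None)
# (13, 'Dorothy Oz', 'Minocycline', 150, 'mg')
```

The multivalued "administered drugs" needed their own table, and the missing dose can only be expressed as NULL in a column every row must carry.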
How does NoSQL support variety and variability?
NoSQL documents can have any variety of rapidly changing structures because structure is data.

{
  "_id": 1,
  "_type": "operation",
  "collections": ["operation", "transplants"],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

The Minomycin entry omits drugDoseSize and drugDoseUOM: a sparse property.
Callouts: variable data types; variable document types; variable relationships; variable and sparse collections; variable document structures; sparse properties; sparse denormalized properties.
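The operation document above keeps sparse properties and property-bearing relationships in one place. This plain-Python sketch (not MarkLogic's query APIs) reads an abridged copy of it: `.get()` tolerates the missing dose, and the `opDrug` relationships are filtered and aggregated like any other data. The output format is an arbitrary choice for illustration.

```python
import json
from statistics import mean

# An abridged copy of the operation document above.
doc = json.loads("""
{
  "_id": 1,
  "_type": "operation",
  "operation": {
    "operationNumber": 13,
    "administeredDrugs": [
      {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg"},
      {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
      {"drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg"}
    ]
  },
  "relations": {"values": [
    {"subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87},
    {"subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1},
    {"subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3},
    {"subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1}
  ]}
}
""")

# Sparse properties: read with .get() so a missing dose is not an error.
doses = [
    f'{d["drugName"]}: {d.get("drugDoseSize", "?")} {d.get("drugDoseUOM", "")}'.strip()
    for d in doc["operation"]["administeredDrugs"]
]

# Relationships are data, and each one carries its own properties.
drug_edges = [r for r in doc["relations"]["values"] if r["predicate"] == "opDrug"]
total_recalls = sum(r["drugRecalls"] for r in drug_edges)
avg_efficacy = round(mean(r["drugEfficacy"] for r in drug_edges), 2)

print(doses)          # ['Minocycline: 200 mg', 'Minomycin: ?', 'Minocycline: 150 mg']
print(total_recalls)  # 5
print(avg_efficacy)   # 0.67
```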
Choosing Between XML and JSON

<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
  <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
    freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph>
</section>

XML
1. Document type
2. Namespaces for nested objects; comments; attributes for metadata; CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
      { "type": "text", "value": "Text exists only in objects" },
      { "type": "text", "value": "Data exists in separate objects" },
      { "type": "date", "value": "2017-04-02Z" },
      { "type": "break", "value": null },
      { "type": "text", "value": "JSON is designed for objects to…" },
      { "type": "text", "value": "be sprinkled with strings" }
    ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
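The practical difference shows up when you read the two samples back. This sketch uses Python's standard `xml.etree.ElementTree` and `json` modules on shortened copies of the samples above: in XML the typed date sits inside running text and the surrounding text can be read back in order, while in JSON the date is its own object and text lives only in string values.

```python
import json
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Shortened versions of the two samples above.
xml_text = """<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text</paragraph>
</section>"""

json_text = """{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects"},
  {"type": "date", "value": "2017-04-02Z"}
]}"""

# XML: the date is an element embedded in running text (mixed content).
para = ET.fromstring(xml_text).find("paragraph")
date_el = para.find("date")
xml_dates = [date_el.text]
xml_type = date_el.get(XSI_TYPE)        # the type annotation travels with the data
full_text = "".join(para.itertext())    # text and tagged data, read back in order

# JSON: the date is a separate object; there is no text "around" it.
json_dates = [o["value"] for o in json.loads(json_text)["paragraph"] if o["type"] == "date"]

print(xml_dates, xml_type)  # ['2017-04-02Z'] xsd:date
print(json_dates)           # ['2017-04-02Z']
print(full_text)
```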
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Customer Model: Person ERD (table: columns)
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
• Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
• Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
• Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Websites: Website ID, URL
• Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
• Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
• Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
• Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
• Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
• Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Company tables also shown:
• Companies: Company ID
• Companies Addresses: Company ID, Address ID
• Companies Websites: Company ID, Website ID
The person ERD is 13 tables because each multivalued property requires a separate table in a relational model. All of this should be represented as one JSON document because it is one business entity.
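The claim above, that each multivalued property forces a separate table, can be demonstrated by "shredding" a document. This Python sketch uses a hypothetical, abridged person document and a hypothetical `shred` helper: scalar properties stay in the parent row, and every list property becomes a child table whose rows carry the parent key.

```python
from typing import Dict, List

# A hypothetical, abridged person document in the style used in this deck.
person = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
    ],
    "personEmails": [
        {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"},
    ],
}


def shred(doc: dict, table: str = "persons", key: str = "personId") -> Dict[str, List[dict]]:
    """Split one document into 'tables': scalars stay in the parent row, and
    each multivalued (list) property becomes a child table keyed by the parent ID.
    A strict relational design would also split the nested purpose lists
    (e.g. into an Is Primary column or another table); this sketch keeps them."""
    parent = {k: v for k, v in doc.items() if not isinstance(v, list)}
    tables: Dict[str, List[dict]] = {table: [parent]}
    for prop, items in doc.items():
        if isinstance(items, list):
            tables[prop] = [{key: doc[key], **item} for item in items]
    return tables


tables = shred(person)
print(list(tables))                # ['persons', 'personPhones', 'personEmails']
print(len(tables["personPhones"]))  # 2
```

Every list added to the document adds a table to the relational design, which is how one person grows to 13 tables.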
One Person JSON Doc = 13 Tables

{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                           "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                           "nameOnBankAccount": "Michael Bowers",
                           "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                   "personCompanyRelationship": ["employeeOf", "accountRepFor"],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                   "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
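Two things the relational model needed extra tables or columns for become simple lookups inside the one person document: family relationships are embedded triples, and the "Is Primary" column becomes a `primary` purpose tag. This plain-Python sketch works on an abridged copy of the person document above; the exact nesting, and the helper names `objects` and `primary_email`, are assumptions for illustration.

```python
from typing import List, Optional

# Abridged from the person document above; the exact nesting is an assumption.
person_doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
    ],
    "person": {
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"}},
            {"email": {"emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com"}},
        ]
    },
}


def objects(doc: dict, predicate: str) -> List[int]:
    """Objects of every embedded triple with the given predicate."""
    return [t["triple"]["object"] for t in doc["triples"] if t["triple"]["predicate"] == predicate]


def primary_email(doc: dict) -> Optional[str]:
    """The relational 'Is Primary' column becomes a 'primary' purpose tag."""
    for entry in doc["person"]["personEmails"]:
        if "primary" in entry["email"]["emailPurpose"]:
            return entry["email"]["emailAddress"]
    return None


print(objects(person_doc, "hasChild"))  # [33, 44]
print(primary_email(person_doc))        # mike@somework.com
```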
More Realistic Customer Order Model
[Figure: the 13-table person ERD, then the same tables extended with Order, Order Items, Fact, and Lookup Lists tables and many more placeholder tables, repeated to suggest the size of a realistic customer order model.]
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
[Figure: Customer Order. JSON Customers, Orders, Products, and Vendors documents, connected by the relationships "product that was ordered" and "vendor who sold the product".]
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": ""
    },
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States"
    },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", …
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111orderStatus [NewInvoicedShippedClosed
]productOrderStatus [
None AllocatedInvoiced Shipped On Order No Stock
]
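Each of the five documents above is a self-contained business entity. The documents point to one another by ID (customerId, productIdFk, sellerId). Below is a minimal Python sketch that follows one order to its customer, product, and seller. It makes two assumptions. First, a plain in-memory dict keyed by hypothetical URIs such as /order/1.json stands in for the database. Second, each document is trimmed to the fields the traversal needs.

```python
# Minimal sketch: one document per business entity, linked by ID references.
# The URIs and the in-memory "db" dict are hypothetical stand-ins for a database.
person = {"personId": 11, "person": {"personName": "Mike Bowers"}}
order = {
    "orderId": 1,
    "order": {
        "orderNumber": "111-11-111",
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderQty": 3,
                         "seller": {"sellerId": 555, "sellerName": "Nabisco"}}},
        ],
    },
}
product = {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}}
company = {"companyId": 555, "company": {"companyName": "Nabisco"}}

db = {
    "/person/11.json": person,
    "/order/1.json": order,
    "/product/1111.json": product,
    "/company/555.json": company,
}


def get(kind: str, key: int) -> dict:
    """Fetch a whole entity with a single lookup (one document, one read)."""
    return db[f"/{kind}/{key}.json"]


def describe_order(order_id: int) -> list[str]:
    """Follow an order's ID references to its customer, products, and sellers."""
    o = get("order", order_id)["order"]
    customer = get("person", o["customer"]["customerId"])["person"]["personName"]
    lines = []
    for item in o["orderedProducts"]:
        p = item["product"]
        name = get("product", p["productIdFk"])["product"]["productName"]
        seller = get("company", p["seller"]["sellerId"])["company"]["companyName"]
        lines.append(f"{customer} ordered {p['productOrderQty']} x {name} from {seller}")
    return lines


print(describe_order(1))
# ['Mike Bowers ordered 3 x Oreo Thins Sandwich Cookies from Nabisco']
```

Each hop is a single whole-document lookup. Nothing has to be reassembled from rows.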
Relational Modeling: Normalize
1. Normalize
• Make each attribute single valued
  - Create one column per attribute
  - Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
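The relational steps above are sketched here with Python's built-in sqlite3 module. The sketch shreds a trimmed, hypothetical version of the person entity. The single-valued attributes become one row, and the multivalued phones become a one-to-many child table. Table and column names are illustrative assumptions. Each phone's purpose is simplified to one string; a full normalization would need yet another table for the purpose lists.

```python
import sqlite3

# Hypothetical, trimmed person entity with one multivalued attribute (phones).
person = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": "mobile", "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": "work", "phoneNumber": "+1 (111) 111-1112"},
    ],
}

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person_phone (
    person_id INTEGER REFERENCES person(person_id),
    purpose   TEXT,
    number    TEXT,
    PRIMARY KEY (person_id, number)
);
""")

# Single-valued attributes become the columns of one row...
con.execute("INSERT INTO person VALUES (?, ?)",
            (person["personId"], person["personName"]))
# ...and the multivalued attribute becomes rows in a one-to-many child table.
con.executemany(
    "INSERT INTO person_phone VALUES (?, ?, ?)",
    [(person["personId"], p["phonePurpose"], p["phoneNumber"])
     for p in person["personPhones"]],
)

# Reassembling the entity now requires a join.
rows = con.execute("""
SELECT p.name, ph.purpose, ph.number
FROM person p JOIN person_phone ph ON ph.person_id = p.person_id
ORDER BY ph.number
""").fetchall()
print(rows)
```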
[Figure: one-to-one, one-to-many, and many-to-many relationships, plus reference tables]
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": {
        "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
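The person document above keeps its graph links in an embedded triples array. The plain-Python sketch below roughly illustrates what projecting those links into a triple index makes possible. It is not TDE, and the function names are hypothetical. It turns the array into (subject, predicate, object) tuples and answers a one-hop question.

```python
# Trimmed copy of the person document's embedded "triples" array.
doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666,
                    "tripleStatus": "active"}},
    ],
}


def extract_triples(document: dict) -> list[tuple]:
    """Project each embedded triple into a (subject, predicate, object) tuple."""
    return [
        (t["triple"]["subject"], t["triple"]["predicate"], t["triple"]["object"])
        for t in document.get("triples", [])
    ]


def objects(triples: list[tuple], subject, predicate) -> list:
    """All objects linked from `subject` by `predicate` (a one-hop lookup)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


index = extract_triples(doc)
print(objects(index, 11, "hasChild"))  # [33, 44]
```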
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing: the data is projected into indexes rather than copied between documents
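Joining linked data at query time, instead of copying it, can be emulated in a few lines. This is an in-memory Python sketch, not the Optic API. `company_view` stands in for rows projected from company documents, and `triples` stands in for links projected from person documents. All names are hypothetical.

```python
# Hypothetical rows projected from company documents.
company_view = [
    {"companyId": 555, "companyName": "Nabisco"},
    {"companyId": 666, "companyName": "Goliath Dairies"},
]
# Hypothetical triples projected from person documents.
triples = [
    (11, "hasEmployer", 666),
    (11, "likesProduct", 1111),
]


def employers(person_id: int) -> list[str]:
    """Join triples to the company rows at query time; no names are copied."""
    names = {row["companyId"]: row["companyName"] for row in company_view}
    return [names[o] for s, p, o in triples
            if s == person_id and p == "hasEmployer" and o in names]


print(employers(11))  # ['Goliath Dairies']

# A hypothetical rename touches only the company's own data...
company_view[1]["companyName"] = "Goliath Dairies Inc."
# ...and every query that joins to it sees the new value.
print(employers(11))  # ['Goliath Dairies Inc.']
```

The employer's name lives only with the company, so the rename appears in every result without editing any person document.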
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Figure: business tables, transaction tables, and reference tables]
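Here is a minimal sqlite3 sketch of the three kinds of tables, using hypothetical table and column names. `person` and `product` are business tables. `order_item` is a transaction table that combines them into one context. `order_status` is a reference table that standardizes the allowed states.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
PRAGMA foreign_keys = ON;
-- Business tables: stand on their own, independent of any context.
CREATE TABLE person  (person_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
-- Reference table: standardizes allowed states.
CREATE TABLE order_status (status TEXT PRIMARY KEY);
-- Transaction table: joins business tables into one context.
CREATE TABLE order_item (
    order_id   INTEGER,
    person_id  INTEGER REFERENCES person(person_id),
    product_id INTEGER REFERENCES product(product_id),
    qty        INTEGER,
    status     TEXT REFERENCES order_status(status)
);
INSERT INTO person VALUES (11, 'Mike Bowers');
INSERT INTO product VALUES (1111, 'Oreo Thins Sandwich Cookies'),
                           (2222, 'Rice Dream Rice Drink - Vanilla 64 Fl Oz');
INSERT INTO order_status VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');
INSERT INTO order_item VALUES (1, 11, 1111, 3, 'Shipped'), (1, 11, 2222, 1, 'Shipped');
""")

# The order context exists only by joining the tables back together.
rows = con.execute("""
SELECT pe.name, pr.name, oi.qty
FROM order_item oi
JOIN person  pe ON pe.person_id  = oi.person_id
JOIN product pr ON pr.product_id = oi.product_id
WHERE oi.order_id = 1
ORDER BY pr.product_id
""").fetchall()
print(rows)
```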
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
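When several lookup lists live in one document, validating a transaction against them takes a single read. The minimal Python sketch below makes two assumptions. It uses the lookup document shown earlier. The second line item of the order carries a hypothetical invalid status, "Backordered". The function name is also hypothetical.

```python
# The lookup document: several lookup lists combined in one doc.
lookup = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}

# Trimmed order; "Backordered" is a hypothetical invalid value.
order = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},
        ],
    },
}


def invalid_statuses(order_doc: dict, lookup_doc: dict) -> list[str]:
    """Describe every status in the order that the lookup lists do not allow."""
    o = order_doc["order"]
    problems = []
    if o["orderStatus"] not in lookup_doc["orderStatus"]:
        problems.append(f"orderStatus {o['orderStatus']!r}")
    for item in o["orderedProducts"]:
        status = item["product"]["productOrderStatus"]
        if status not in lookup_doc["productOrderStatus"]:
            problems.append(f"product {item['product']['productIdFk']}: {status!r}")
    return problems


print(invalid_statuses(order, lookup))
```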
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational, because it hides the purpose of the model
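Here is a minimal sqlite3 sketch of relational generalization, using hypothetical table names. `person` is the general table. `customer` and `account_rep` are subclass tables that share the person's primary key in a one-to-one relationship. Person 22 ("Jane Doe") is a made-up row with no subclass.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- General table, reused in every context.
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
-- Subclass tables share the person's primary key (one-to-one).
CREATE TABLE customer (
    person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
    status    TEXT,
    join_date TEXT
);
CREATE TABLE account_rep (
    person_id  INTEGER PRIMARY KEY REFERENCES person(person_id),
    company_id INTEGER
);
INSERT INTO person VALUES (11, 'Mike Bowers'), (22, 'Jane Doe');
INSERT INTO customer VALUES (11, 'active', '2001-01-01');
INSERT INTO account_rep VALUES (11, 555);
""")

# Reading a customer means joining the general table to its subclass table.
customers = con.execute("""
SELECT p.person_id, p.name, c.status
FROM person p JOIN customer c ON c.person_id = p.person_id
""").fetchall()
print(customers)  # [(11, 'Mike Bowers', 'active')]
```

Reading any subclass requires a join back to `person`.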
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because JSON objects nest naturally, so each subclass can be its own nested object
• This example shows a person being subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See the customer and companyRelationships subclasses in the person document
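In the person document, each subclass is another named section (`customer`, `companyRelationships`), and the `schemas` array declares which subclasses are present. The minimal Python sketch below reads the subclasses and the person's company roles without any join. It uses a trimmed copy of that document, and the helper names are hypothetical.

```python
# Trimmed copy of the person document.
person_doc = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person", "schemaVersion": 1.0}},
        {"schema": {"type": "customer", "schemaVersion": 1.0}},
        {"schema": {"type": "companyRelationships", "schemaVersion": 1.0}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def subclasses(doc: dict) -> list[str]:
    """Declared schema types that actually have a section in the document."""
    declared = [s["schema"]["type"] for s in doc.get("schemas", [])]
    return [t for t in declared if t in doc]


def company_roles(doc: dict) -> dict[int, list[str]]:
    """Map each related company ID to the person's roles for that company."""
    return {
        rel["company"]["companyIdFk"]: rel["company"]["personCompanyRelationship"]
        for rel in doc.get("companyRelationships", [])
    }


print(subclasses(person_doc))     # ['person', 'customer', 'companyRelationships']
print(company_roles(person_doc))
```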
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Move sparse data out of a table and into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables, reducing the need for joins, which slow performance
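Here is a minimal sqlite3 sketch of the denormalization trade-off, using hypothetical names. `customer_name` is copied into `orders`, much as the order document copies customerName, so reads skip the join. The cost is that every name change must now update both places. The rename to "Michael Bowers" is illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
-- Tuned: customer_name is copied into orders so order lists skip the join.
CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    person_id     INTEGER REFERENCES person(person_id),
    customer_name TEXT
);
INSERT INTO person VALUES (11, 'Mike Bowers');
INSERT INTO orders VALUES (1, 11, 'Mike Bowers');
""")

# Read path: no join needed.
print(con.execute("SELECT order_id, customer_name FROM orders").fetchall())

# Write path: the copy must be kept in sync, ideally in the same transaction.
with con:
    con.execute("UPDATE person SET name = ? WHERE person_id = ?", ("Michael Bowers", 11))
    con.execute("UPDATE orders SET customer_name = ? WHERE person_id = ?",
                ("Michael Bowers", 11))

print(con.execute("SELECT order_id, customer_name FROM orders").fetchall())
```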
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
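Equality indexing is keyed on the property name rather than its path. A name reused in different contexts therefore matches values from all of them. The small Python sketch below flags property names that appear at more than one path across a set of documents. Its helpers are hypothetical and are not a MarkLogic API, and the two sample documents are made up.

```python
from collections import defaultdict


def collect_paths(node, path: str, out: dict) -> None:
    """Record every path at which each property name occurs."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            out[key].add(child)
            collect_paths(value, child, out)
    elif isinstance(node, list):
        for item in node:
            collect_paths(item, path, out)  # array items share the array's path


def ambiguous_names(docs: list[dict]) -> list[str]:
    """Property names used at more than one path across all documents."""
    out = defaultdict(set)
    for doc in docs:
        collect_paths(doc, "", out)
    return sorted(name for name, paths in out.items() if len(paths) > 1)


# Hypothetical documents: a generic "name" versus context-specific names.
generic = [{"person": {"name": "Mike Bowers"}, "employer": {"name": "Nabisco"}}]
specific = [{"person": {"personName": "Mike Bowers"},
             "employer": {"companyName": "Nabisco"}}]
print(ambiguous_names(generic))   # ['name']
print(ambiguous_names(specific))  # []
```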
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• The five JSON documents shown earlier (person, order, product, company, and lookup) hold the same data as more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
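Under full normalization, every multivalued (array) property needs its own child table, which is one way to see how a handful of documents can stand in for dozens of tables. Below is a rough Python heuristic that assumes exactly that rule: one table for the entity plus one per distinct array path. It ignores lookup tables and many-to-many tables. It is applied to a trimmed, hypothetical person document, and the function name is hypothetical.

```python
def table_count(doc: dict) -> int:
    """Rough heuristic: 1 table for the entity + 1 per distinct array path."""
    array_paths = set()

    def walk(node, path: tuple) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, path + (key,))
        elif isinstance(node, list):
            array_paths.add(path)  # a multivalued attribute -> child table
            for item in node:
                walk(item, path)   # every item shares the same table

    walk(doc, ())
    return 1 + len(array_paths)


# Trimmed, hypothetical person document.
doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}},
        ],
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"],
                       "emailAddress": "mike@somework.com"}},
        ],
    },
}
print(table_count(doc))  # 5: person, phones, phone purposes, emails, email purposes
```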
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs × 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
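The I/O arithmetic on the Performance slide can be checked with a few lines of Python. This is only a sketch of the cost model the slide states (3-to-4 index I/Os plus 1-to-2 row I/Os per join, and one I/O per document); the function names are illustrative and not part of any database API.

```python
# Back-of-the-envelope I/O estimate from the slide: one document read versus
# reassembling the same business entity from joined relational tables.

def relational_io_range(tables: int, index_ios=(3, 4), row_ios=(1, 2)) -> tuple:
    """Return (min, max) I/Os to join `tables` tables, where each join costs
    index_ios I/Os to walk the index plus row_ios I/Os to fetch the row."""
    per_join_min = index_ios[0] + row_ios[0]   # 3 + 1 = 4
    per_join_max = index_ios[1] + row_ios[1]   # 4 + 2 = 6
    return tables * per_join_min, tables * per_join_max

def document_ios() -> int:
    """One JSON document is read with a single I/O (the slide's assumption
    for MarkLogic, whose indexes are held in RAM)."""
    return 1

lo, hi = relational_io_range(13)
print(f"relational: {lo}-{hi} I/Os, document: {document_ios()} I/O")
```

For the 13-table person model this prints a range of 52-78 I/Os against a single I/O for the document; changing the tuples lets you plug in other index depths.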
Document PROs: Multi-Valued and Multi-Typed Properties
• JSON documents can contain multi-valued and multi-typed properties.
• JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction.
Document PROs: Multi-Value Properties
• Any JSON data can be multi-valued.
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]

Relational equivalent (three tables):
Addresses: Address ID | Street | Locality | Region | Postal Code | Country
Persons Addresses: Person ID | Address ID | Address Purpose | Is Primary
Persons: Person ID | Status | Shipping Preference | Name | Online Name | Birth Date | Gender | Ethnicity | Customer Status | Customer Join Date | Customer Reviewer Rank
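To make the contrast concrete, here is a minimal Python sketch that shreds the multi-valued personAddresses property into rows for the Addresses and Persons Addresses tables. It assumes the document shape shown above; the shred_addresses helper and the numbering of Address IDs are illustrative, not part of MarkLogic, and the Is Primary column is left out.

```python
# Sketch: shred one multi-valued JSON property into the junction-table rows a
# relational model would need. Table/column names follow the slide; the
# surrogate address IDs are hypothetical.
import json

person_doc = json.loads("""
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"],
          "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ]
  }
}
""")

def shred_addresses(doc):
    """Return (addresses, persons_addresses) row lists for one person document."""
    addresses, persons_addresses = [], []
    for address_id, entry in enumerate(doc["person"]["personAddresses"], start=1):
        a = entry["address"]
        # Multi-line streets are collapsed into one Street column.
        addresses.append((address_id, ", ".join(a["addressStreet"]), a["addressLocality"],
                          a["addressRegion"], a["addressPostalCode"], a["addressCountry"]))
        # A multi-valued purpose becomes one junction row per value.
        for purpose in a["addressPurpose"]:
            persons_addresses.append((doc["personId"], address_id, purpose))
    return addresses, persons_addresses

addresses, persons_addresses = shred_addresses(person_doc)
print(addresses)          # one Addresses row
print(persons_addresses)  # two Persons Addresses rows, one per purpose
```

The single nested array in the document becomes rows in two tables, and reading it back requires a join.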
Document PROs: Multi-Type Arrays
• Any JSON data can be multi-typed.
• Storing different types of records in the same array is natural in programming but very difficult in relational databases.

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
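A minimal Python sketch of consuming such a multi-typed array: the code branches on paymentMethodType and reads only the fields each record type defines. The describe helper and the "anything that is not Checking is a card" rule are assumptions made for this illustration.

```python
# Sketch: one array, two record shapes. Code dispatches on a type discriminator
# ("paymentMethodType") instead of needing one table per record type.
payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS",
                       "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers",
                       "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]

def describe(method: dict) -> str:
    """Summarize a payment method, reading only the fields its type defines."""
    pm = method["paymentMethod"]
    if pm["paymentMethodType"] == "Checking":
        return f"Checking account ending {pm['bankAccountNumber'][-4:]} ({pm['nameOnBankAccount']})"
    # Treat any other type as a card (an assumption for this sketch).
    return f"{pm['paymentMethodType']} card ending {pm['creditCardNumber'][-4:]}"

for m in payment_methods:
    print(describe(m))
```

In a relational model, the same two shapes would typically need either separate card and bank-account tables or one wide table full of NULLs.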
Document PROs: Everything is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query.
• Structure can be queried:
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values

{ "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ] } }
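Because keys and nesting are ordinary data, plain code can both query and change them. The sketch below works on a Python copy of the document above; phones_with_key, rename_key, and the new key name phoneIsPrivate are hypothetical and only illustrate the idea, not a MarkLogic API.

```python
# Sketch: because keys and nesting are data, code can query for them and
# change them like any other value. The document mirrors the example above;
# the helper names and "phoneIsPrivate" are hypothetical.
person = {
    "personId": 11,
    "person": {
        "personName": "Some very long name with many UTF-8 international characters…",
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

def phones_with_key(doc: dict, key: str) -> list:
    """Query structure: return the phone objects that contain `key`."""
    return [p["phone"] for p in doc["person"]["personPhones"] if key in p["phone"]]

def rename_key(doc: dict, old: str, new: str) -> None:
    """Change structure by updating data: rename a key in every phone object."""
    for p in doc["person"]["personPhones"]:
        if old in p["phone"]:
            p["phone"][new] = p["phone"].pop(old)

print(len(phones_with_key(person, "hidePhoneNumber")))   # 1
rename_key(person, "hidePhoneNumber", "phoneIsPrivate")
print(len(phones_with_key(person, "phoneIsPrivate")))    # 1
```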
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters.
– Strings have unlimited size and can contain any internationalized UTF-8 characters.
– Numbers contain integers or decimals.
– Booleans are true or false.
– Arrays can have values of any type and can mix types.
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
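As a quick illustration of how few types JSON has, the sketch below parses a small document with Python's standard json module and reports the Python type each value maps to. The sample keys and values are made up for the example.

```python
# Sketch: how the JSON data types listed above map onto Python's built-in
# types when parsed with the standard-library json module.
import json

text = ('{"any key: with spaces & ünïcödé": "Zoë", "count": 3, "ratio": 0.87, '
        '"verified": true, "mixed": [1, "two", null, {"nested": {}}]}')
doc = json.loads(text)

# Keys may contain any characters; integers and decimals stay distinct;
# one array can mix numbers, strings, null, and objects.
for key, value in doc.items():
    print(f"{key!r}: {type(value).__name__}")
```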
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables… represent a major advance toward the goal of data independence…
1.2.1 Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence
…Can application programs… remain invariant as indices come and go?…
1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary… The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure: the excerpt annotated as a graph of Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E), linked by relationships such as "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "solution", and "purpose"]
[Figure: JSON Customers, Orders, Products, and Vendors documents and XML Documentation (for example, a hospital operation record listing the drugs administered) connected by graph relationships: customer order, product that was ordered, vendor who sold the product, customer liked product, and vendor received RMA on product, plus links to documentation such as shipping instructions, customer invoices and annual reports, customer order documents, and the product manual]
Inside and Out
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | Johns Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
Church of Jesus Christ of Latter-day Saints
• Hundreds of MarkLogic servers running 190+ websites and applications with billions of page views annually
• 15.6 million members (30,016 congregations worldwide)
• Humanitarian assistance in 186 countries
• Thousands of documents in 188 published languages
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Six Data Paradigms
• Dimensional: Kimball data warehousing
• Wide Column: fixed, dense tables with fixed queries; no joins
• Document: sparse, variable data structures
• Relational: fixed, dense tables with flexible queries & joins
• Graph: unlimited relationships
• Key Value: predefined data structures (a single value, a hash of key/value pairs, a list, a set)
[Figure axes: operational velocity (low latency, from hours down to microseconds; 500 tps to 100K tps) and analytical volume (high bandwidth, GB to PB); the paradigms run from fixed structure, where code has more dependence on DB structures, to flexible structure, where code has less dependence on DB structures]
Top Ten Databases Overall
(ranking numbers as shown on the slide)
• Graph (RDF): 6 MarkLogic
• Dimensional, data warehouse: 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
• Dimensional, live analytics: 2 Oracle Exalytics, 3 SAP HANA
• Relational, newSQL: 2 Oracle x10
• Relational, SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
• Document, XML doc warehouse: 6 MarkLogic
• Document, JSON: 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
• Key value (simple key): 5 AWS DynamoDB, 10 Redis
• Wide column (complex key): 9 DataStax Cassandra
[Each paradigm is illustrated with the same surgery example: a graph of surgeons, operations, persons, and hospitals ("performed on", "works at", "friend of", "likes", "has poor success at", "fails at"); a star schema of Hospital, Surgeon, Operation, and Drug dimensions around Drug Dose facts; Surgeon, Operation, Operation Codes, and Hospital tables; and an XML operation record]
Do we combine multiple models?
[Figure: the same six paradigms (graph, dimensional, relational, document, wide column, key value) and their surgery-example models, with MarkLogic, labeled "1 Multi-model NoSQL Database", shown spanning several of them]
Multi-Model Enterprise NoSQL Databases
Power of Combining XML Document and RDF Graph
[Figure: the XML text of E. F. Codd's "A Relational Model of Data for Large Shared Data Banks" (June 1970) annotated with an RDF graph of Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E), linked by relationships such as "geo-located in", "printed on", "related topic", "author of", "published article in journal", "publisher of", "problem", "solution", and "purpose"]
Data + Narrative + Relationships = Contextual Information
(Semantic & Structural) = Meaningful Knowledge
WAIT A MINUTE
Aren't JSON, XML, and RDF databases hierarchical and graph?
Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?
1. They break programs when the data structure's hierarchy changes.
   – New APIs and indexes flatten out the data structure, so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent, because it is easy to fail to update redundant data.
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
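Point 1 can be illustrated with a toy index in Python: every (key, value) pair is indexed no matter how deeply it is nested, so a lookup by key still finds the document after its hierarchy changes. This is a simplified in-memory sketch under that assumption, not a description of how MarkLogic's indexes are actually built.

```python
# Sketch of the idea behind point 1: index every (key, value) pair regardless of
# where it sits in the hierarchy, so a query by key keeps working when nesting
# changes. This is a toy in-memory index, not MarkLogic's universal index.
from collections import defaultdict

def leaf_pairs(node, key=None):
    """Yield (key, value) for every scalar in a nested dict/list structure."""
    if isinstance(node, dict):
        for k, v in node.items():
            yield from leaf_pairs(v, k)
    elif isinstance(node, list):
        for item in node:
            yield from leaf_pairs(item, key)   # array items inherit the key
    else:
        yield key, node

def build_index(docs: dict) -> dict:
    """Map (key, value) -> set of document IDs containing that pair anywhere."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for pair in leaf_pairs(doc):
            index[pair].add(doc_id)
    return index

# The same fact stored at two different depths.
docs = {
    "v1": {"person": {"personName": "Mike Bowers"}},
    "v2": {"person": {"profile": {"names": [{"personName": "Mike Bowers"}]}}},
}
index = build_index(docs)
print(sorted(index[("personName", "Mike Bowers")]))   # ['v1', 'v2']
```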
RDF Graph and XML enable us to turn content into meaningful knowledge.
What can RDF Graph and JSON do for data?
Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity.
  – A document is like a row in a table, and more.
  – A document contains all the related tables a relational model requires to represent one business entity.
• Meaningful graph relationships connect any business entity to any other entity (or any content), in any way and in as many ways as you want, at run time, across all table boundaries.
[Figure: JSON Customers, Orders, Products, and Vendors documents and XML Documentation (for example, a hospital operation record) connected by graph relationships: customer order, product that was ordered, vendor who sold the product, customer liked product, and vendor received RMA on product, plus links to documentation such as shipping instructions, customer invoices and annual reports, customer order documents and invoices, the product manual, and vendor order forms and invoices]
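A minimal Python sketch of "documents + graph": each business entity stays a whole JSON document, and (subject, predicate, object) triples connect them. The document URIs and predicate names (placedOrder, orderedProduct, liked) are hypothetical, chosen only to echo the edge labels in the figure; a real deployment would store triples in the database's RDF store.

```python
# Sketch: JSON business entities kept as whole documents, connected by
# RDF-style (subject, predicate, object) triples. Document URIs and predicate
# names below are hypothetical, chosen to echo the slide's edge labels.
customers = {"/customer/11.json": {"customerName": "Mike Bowers"}}
orders = {"/order/1.json": {"orderNumber": "111-11-111"}}
products = {"/product/1111.json": {"productName": "Oreo Thins Sandwich Cookies"},
            "/product/2222.json": {"productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz"}}

triples = [
    ("/customer/11.json", "placedOrder", "/order/1.json"),
    ("/order/1.json", "orderedProduct", "/product/1111.json"),
    ("/order/1.json", "orderedProduct", "/product/2222.json"),
    ("/customer/11.json", "liked", "/product/1111.json"),
]

def objects(subject: str, predicate: str) -> list:
    """Follow every edge with the given predicate out of `subject`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# New relationship types are just new triples; no table or schema change is needed.
liked = [products[uri]["productName"] for uri in objects("/customer/11.json", "liked")]
ordered = [products[uri]["productName"]
           for order in objects("/customer/11.json", "placedOrder")
           for uri in objects(order, "orderedProduct")]
print(customers["/customer/11.json"]["customerName"], "liked:", liked)
print("order", orders["/order/1.json"]["orderNumber"], "contained:", ordered)
```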
RDF = Standard Meaningful Graphs
Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies.
• TIP: Search for ontologies at Linked Open Vocabularies (LOV).
• Dublin Core • FOAF • TrackBack • MetaVocab • Basic Geo Vocabulary • BIO • RSS 1.0 • VCard RDF • Creative Commons metadata • WOT
• SIOC • GoodRelations • DOAP • Programmes Ontology • Music Ontology • OpenGUID • Provenance Vocabulary • Pedagogical diagnosis • DILIGENT Argumentation
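For example, instead of inventing private predicate names, triples can reuse FOAF and Dublin Core terms. The sketch below writes a few such triples in N-Triples syntax; the FOAF and Dublin Core namespace IRIs are the published ones, while the example.org subject and object IRIs are hypothetical placeholders.

```python
# Sketch: reuse predicate IRIs from existing vocabularies (FOAF, Dublin Core
# terms) instead of inventing private ones, and serialize as N-Triples.
# The subject/object IRIs under example.org are hypothetical.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
EX = "http://example.org/"

triples = [
    (EX + "person/11", FOAF + "name", '"Mike Bowers"'),
    (EX + "person/11", FOAF + "knows", EX + "person/12"),
    (EX + "book/css-design-patterns", DCTERMS + "creator", EX + "person/11"),
]

def to_ntriples(s: str, p: str, o: str) -> str:
    """Format one triple; objects already wrapped in quotes are literals."""
    obj = o if o.startswith('"') else f"<{o}>"
    return f"<{s}> <{p}> {obj} ."

for t in triples:
    print(to_ntriples(*t))
```

Anyone who already knows FOAF can read foaf:knows or dcterms:creator without consulting a private data dictionary, which is the time saving the slide refers to.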
New Data Structures for Variety and Variability
New Data Structures for Variety and Variability
Relational Database | NoSQL Database
Fixed and dense structures (mostly) | Sparse and variable structures (mostly)
Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable). | Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.
Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure. | Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.
Each table contains a sparse number of rows and requires each row to have the same fixed structure. | Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.
Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data. | Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.
Each schema must have a fixed number of predefined tables and constraints before data can be loaded. | Each database may contain documents without first defining structures, document types, relationships, collections, etc.
Why is relational fixed and dense?

Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Operation ID | Drug ID | Drug Dose | Drug Dose UOM | Sparse
13 | 100 | 200 | mg | NULL
13 | 101 | NULL | NULL |
13 | 100 | 150 | mg | NULL

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

Callouts: fixed field types; fixed table structure; fixed table relationships; each row has the same structure; fixed set of tables.
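The same tables can be built in SQLite through Python's standard sqlite3 module to show both costs the slide describes: a missing dose still occupies a column as NULL, and reassembling one operation takes several joins. The table and column names are assumptions for this sketch, and the slide's "Sparse" column is omitted.

```python
# Sketch: the slide's fixed, dense tables in SQLite (Python's stdlib sqlite3).
# Table names are assumptions; the columns and rows follow the slide.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE surgeon (surgeon_id INTEGER PRIMARY KEY, surgeon_name TEXT, surgeon_specialty TEXT);
CREATE TABLE operation (operation_id INTEGER PRIMARY KEY, surgeon_id INTEGER REFERENCES surgeon, hospital_id INTEGER);
CREATE TABLE drug (drug_id INTEGER PRIMARY KEY, drug_name TEXT);
CREATE TABLE operation_drug (operation_id INTEGER REFERENCES operation, drug_id INTEGER REFERENCES drug,
                             drug_dose INTEGER, drug_dose_uom TEXT);
INSERT INTO surgeon VALUES (1, 'Dorothy Oz', 'Cardiothoracic'), (2, 'Martin Fields', 'Neurosurgery');
INSERT INTO operation VALUES (13, 1, 7);
INSERT INTO drug VALUES (100, 'Minocycline'), (101, 'Minomycin');
-- Every row has every column, so the missing dose must be stored as NULL.
INSERT INTO operation_drug VALUES (13, 100, 200, 'mg'), (13, 101, NULL, NULL), (13, 100, 150, 'mg');
""")

# Reassembling one operation takes joins across four tables.
rows = con.execute("""
SELECT s.surgeon_name, d.drug_name, od.drug_dose, od.drug_dose_uom
FROM operation o
JOIN surgeon s ON s.surgeon_id = o.surgeon_id
JOIN operation_drug od ON od.operation_id = o.operation_id
JOIN drug d ON d.drug_id = od.drug_id
WHERE o.operation_id = 13
ORDER BY od.rowid
""").fetchall()
for row in rows:
    print(row)
```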
How does NoSQL support variety and variability?
NoSQL documents can have any variety of rapidly changing structures, because structure is data.

{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

Callouts: variable data types; variable document types; variable relationships; variable and sparse collections; variable document structures; sparse properties (the Minomycin entry has no dose); sparse denormalized properties.
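Querying a sparse, variable document needs no NULL handling: absent properties are simply absent, and relations are data that can be filtered like any other property. The sketch below uses a trimmed Python copy of the operation document above; the particular queries (undosed drugs, total milligrams, drugs with more than one recall) are illustrative choices, not from the slide.

```python
# Sketch: querying the variable, sparse operation document above with plain
# Python. Missing keys are simply absent, so code uses membership tests and
# .get() rather than NULL checks; relations are filtered like any property.
operation_doc = {
    "_id": 1,
    "_type": "operation",
    "collections": ["operation", "transplants"],
    "operation": {
        "operationNumber": 13,
        "administeredDrugs": [
            {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg"},
            {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
            {"drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg"},
        ],
    },
    "relations": {"values": [
        {"subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87},
        {"subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1},
        {"subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3},
        {"subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1},
    ]},
}

drugs = operation_doc["operation"]["administeredDrugs"]
undosed = [d["drugName"] for d in drugs if "drugDoseSize" not in d]       # sparse property
total_mg = sum(d.get("drugDoseSize", 0) for d in drugs if d.get("drugDoseUOM") == "mg")
recalled = [r["object"] for r in operation_doc["relations"]["values"]
            if r["predicate"] == "opDrug" and r.get("drugRecalls", 0) > 1]
in_transplants = "transplants" in operation_doc["collections"]
print(undosed, total_mg, recalled, in_transplants)
```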
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph></section>
{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
      { "type": "text", "value": "Text exists only in objects" },
      { "type": "text", "value": "Data exists in separate objects" },
      { "type": "date", "value": "2017-04-02Z" },
      { "type": "break", "value": null },
      { "type": "text", "value": "JSON is designed for objects to…" },
      { "type": "text", "value": "be sprinkled with strings" }
    ] }
  ]
}
JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with the data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on its variable structure.
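The contrast above can be seen by reading the same sentence from both formats. This minimal Python sketch uses a trimmed copy of the date paragraph (namespaces and the `xsi:type` attribute dropped for brevity); the variable names are illustrative. XML prose reads straight through its tags, while the JSON version has to be reassembled by relying on its predictable object structure.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed XML paragraph: text with a data tag sprinkled in.
xml_text = (
    "<paragraph>XML allows data such as this date "
    "<date>2017-04-02Z</date> to be freely mixed into the text</paragraph>"
)

# The same content in the JSON style: text poured into separate objects.
json_text = """
{"paragraph": [
  {"type": "text", "value": "XML allows data such as this date "},
  {"type": "date", "value": "2017-04-02Z"},
  {"type": "text", "value": " to be freely mixed into the text"}
]}
"""

# XML: itertext() walks every text node in document order, ignoring the tags.
paragraph = ET.fromstring(xml_text)
xml_plain = "".join(paragraph.itertext())
xml_date = paragraph.find("date").text

# JSON: the code must know that prose lives in the "value" of each object.
parts = json.loads(json_text)["paragraph"]
json_plain = "".join(part["value"] for part in parts)
json_date = next(part["value"] for part in parts if part["type"] == "date")

print(xml_plain)
print(xml_plain == json_plain, xml_date == json_date)
```

Both paths recover the same text and date; the difference is how much each one depends on knowing the structure in advance.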
Choosing Between XML and JSON
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Customer Model
The person ERD is 13 tables because each multivalued property requires a separate table in relational modeling. All of this should be represented as one JSON document because it is one business entity.
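As a rough illustration of reversing that shredding, the Python sketch below folds rows from three of the person tables (Persons, Person Phones, Person Emails) back into one person document. The snake_case column names and the helper functions are hypothetical simplifications of the ERD; the output property names follow the person JSON document.

```python
import json
from collections import defaultdict

# A few rows from three of the person tables.
persons = [{"person_id": 11, "name": "Mike Bowers", "birth_date": "1981-01-01"}]
person_phones = [
    {"phone_id": 1, "person_id": 11, "purpose": "mobile", "number": "+1 (111) 111-1111"},
    {"phone_id": 2, "person_id": 11, "purpose": "work", "number": "+1 (111) 111-1112"},
]
person_emails = [
    {"email_id": 1, "person_id": 11, "purpose": "work", "address": "mike@somework.com"},
]


def group_by_person(rows):
    """Index child-table rows by their Person ID foreign key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["person_id"]].append(row)
    return grouped


def build_person_docs(persons, phones, emails):
    """Fold each person's rows from every table back into a single document."""
    phones_by_person = group_by_person(phones)
    emails_by_person = group_by_person(emails)
    docs = []
    for p in persons:
        pid = p["person_id"]
        docs.append({
            "personId": pid,
            "person": {
                "personName": p["name"],
                "personBirthDate": p["birth_date"],
                # Each multivalued property becomes an embedded array instead of a table.
                "personPhones": [
                    {"phone": {"phonePurpose": [r["purpose"]], "phoneNumber": r["number"]}}
                    for r in phones_by_person[pid]
                ],
                "personEmails": [
                    {"email": {"emailPurpose": [r["purpose"]], "emailAddress": r["address"]}}
                    for r in emails_by_person[pid]
                ],
            },
        })
    return docs


docs = build_person_docs(persons, person_phones, person_emails)
print(json.dumps(docs[0], indent=2))
```

Each join that a relational query would need at read time happens once, when the document is built.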
Person ERD
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
[Figure: relational ERD of a more realistic customer order model, repeating the person and company tables above and adding order, order item, lookup list, fact, and other tables with placeholder names]
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
Diagram: JSON document collections Customers, Orders, Products, and Vendors, connected by the graph relationships "Customer Order", "Product that was ordered", and "Vendor who sold the product".
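A minimal Python sketch of the idea that documents hold entities while triples hold the relationships between them. The document keys (`person/11` and so on) and the predicate names `placedOrder`, `orderedProduct`, and `soldBy` are hypothetical stand-ins for the diagram's "Customer Order", "Product that was ordered", and "Vendor who sold the product"; the documents are heavily trimmed from the examples that follow.

```python
# Four business entities, one document each (trimmed).
docs = {
    "person/11": {"personId": 11, "person": {"personName": "Mike Bowers"}},
    "order/1": {"orderId": 1, "order": {"orderNumber": "111-11-111"}},
    "product/1111": {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    "company/555": {"companyId": 555, "company": {"companyName": "Nabisco"}},
}

# Relationships stored as (subject, predicate, object) triples; predicate names are illustrative.
triples = [
    ("person/11", "placedOrder", "order/1"),
    ("order/1", "orderedProduct", "product/1111"),
    ("product/1111", "soldBy", "company/555"),
]


def follow(subject, predicate):
    """Return the objects linked to a subject by a given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Walk customer -> order -> product -> vendor, one relationship at a time.
path = ["person/11"]
for predicate in ("placedOrder", "orderedProduct", "soldBy"):
    path.append(follow(path[-1], predicate)[0])

print(" -> ".join(path))
print(docs[path[-1]]["company"]["companyName"])
```

Because each relationship is a named triple rather than an implied foreign key, new kinds of links can be added without changing any document's structure.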
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
        "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}

{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": 11,
      "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
        "productIdFk": 1111,
        "productName": "Oreo Thins Sandwich Cookies",
        "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
        "productWeightOz": 11,
        "productOrderStatus": "Shipped",
        "productShippedDate": "2001-01-01+01:01",
        "seller": { "sellerId": 555, "sellerName": "Nabisco" },
        "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
        "productIdFk": 2222,
        "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
        "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New" …

{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
        "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
        "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": {
        "productMeasurePurpose": "productPackage", "productWeight": 10.1,
        "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
        "productMeasurePurpose": "shippingPackage", "productWeight": 1,
        "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    …

{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales" …
      { "phone": { "phonePurpose": "support" …
      { "phone": { "phonePurpose": "shipping" …
    "companyAddresses": [
      { "address": {
        "addressPurpose": [ "shipping" …
        "addressLocality": "East Hanove…
        "addressPostalCode": "07936" …
    "companyEmails": [
      { "email": { "emailPurpose": "sales" …
    "companyWebsites": [
      { "website": { "websitePurpose": "home" …
    "companyPaymentMethods": [
      { "paymentMethod": {
        "paymentMethodType": "Che…
        "paymentMethodAccountNumber": 555…
        "paymentMethodVerified": true …
  "supplier": { "supplierJoinDate": "2005-05…
  "seller": { "sellerJoinDate": "2005-05…

{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [
    "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"
  ]
}
Relational Modeling: Normalize
1. Normalize
• Make each attribute single valued
  - Create one column per attribute
  - Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
Diagram labels: One-to-one, One-to-many, Many-to-many, Reference Tables
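The first normalization rule can be sketched with Python's built-in `sqlite3` module: the multivalued "phones" attribute of a person is flattened into a one-to-many child table, and reading the person back takes a join that returns one row per phone. The table and column names are illustrative simplifications of the Persons and Person Phones tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT);
    -- One-to-many: the multivalued phones attribute moves into its own table.
    CREATE TABLE person_phones (
        phone_id  INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES persons(person_id),
        purpose   TEXT,
        number    TEXT
    );
""")
conn.execute("INSERT INTO persons VALUES (11, 'Mike Bowers')")
conn.executemany(
    "INSERT INTO person_phones (person_id, purpose, number) VALUES (?, ?, ?)",
    [
        (11, "mobile", "+1 (111) 111-1111"),
        (11, "work", "+1 (111) 111-1112"),
        (11, "home", "+1 (111) 111-1113"),
    ],
)

# Reassembling the person requires a join, which repeats the person once per phone.
rows = conn.execute("""
    SELECT p.name, ph.purpose, ph.number
    FROM persons p JOIN person_phones ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
""").fetchall()
for row in rows:
    print(row)
```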
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it
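Conceptually, an extraction template projects each normalized embedded structure into index rows without changing the document. The Python sketch below imitates that idea for `personPhones`; it is not the MarkLogic TDE template format or API, and the `extract_phone_rows` helper is hypothetical.

```python
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        ],
    },
}


def extract_phone_rows(doc):
    """Yield one (personId, purpose, phoneNumber) row per purpose of each embedded phone.

    This imitates a row projection whose context is each embedded phone object;
    the document itself stays a single nested entity.
    """
    for item in doc["person"]["personPhones"]:
        phone = item["phone"]
        for purpose in phone["phonePurpose"]:
            yield (doc["personId"], purpose, phone["phoneNumber"])


phone_rows = list(extract_phone_rows(person_doc))
for row in phone_rows:
    print(row)
```

The rows look like the normalized Person Phones table, yet they are derived from the document rather than maintained separately.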
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing: the data appears joined at query time, but no document stores a copy of it
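A minimal Python sketch of "denormalizing without denormalizing": the order document is trimmed to hold only the customer's ID, person facts live in a stand-in triple index, and the join happens at query time. The `index_triples` and `orders_with_contact` helpers are hypothetical and only imitate what the triple index and Optic API provide; they are not MarkLogic calls.

```python
# Documents stay normalized: the (trimmed) order stores only the customer's ID.
person_docs = [
    {"personId": 11, "person": {
        "personName": "Mike Bowers",
        "personEmails": [{"email": {"emailAddress": "mike@somework.com"}}],
    }},
]
order_docs = [
    {"orderId": 1, "order": {"orderNumber": "111-11-111", "customer": {"customerId": 11}}},
]


def index_triples(docs):
    """Stand-in for a triple index populated from person documents."""
    triples = set()
    for doc in docs:
        pid = doc["personId"]
        triples.add((pid, "personName", doc["person"]["personName"]))
        for item in doc["person"]["personEmails"]:
            triples.add((pid, "emailAddress", item["email"]["emailAddress"]))
    return triples


def orders_with_contact(orders, triples):
    """Join each order to indexed person facts at query time; no order is modified."""
    for doc in orders:
        cid = doc["order"]["customer"]["customerId"]
        # One value per predicate is enough for this sketch.
        facts = {p: o for s, p, o in triples if s == cid}
        yield {"orderNumber": doc["order"]["orderNumber"], **facts}


triples = index_triples(person_docs)
results = list(orders_with_contact(order_docs, triples))
print(results)
```

If the person's name changes, only the person document is updated; every query result reflects the change without touching any order.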
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
Diagram labels: Business Tables, Transaction Tables, Reference Tables
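The three table roles can be sketched with Python's built-in `sqlite3` module. The schema is a hypothetical simplification: `persons` and `products` are business tables, `order_statuses` is a reference table (its values come from the order lookup list), and `orders` is a transaction table that joins them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Business tables: stand alone, independent of any context.
    CREATE TABLE persons  (person_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
    -- Reference table: standardizes the allowed order states.
    CREATE TABLE order_statuses (status TEXT PRIMARY KEY);
    -- Transaction table: joins business tables together and points at reference data.
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        person_id  INTEGER REFERENCES persons(person_id),
        product_id INTEGER REFERENCES products(product_id),
        status     TEXT REFERENCES order_statuses(status)
    );
    INSERT INTO persons VALUES (11, 'Mike Bowers');
    INSERT INTO products VALUES (1111, 'Oreo Thins Sandwich Cookies');
    INSERT INTO order_statuses VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');
    INSERT INTO orders VALUES (1, 11, 1111, 'Shipped');
""")

# The business tables are recombined into an "order" context with joins.
row = conn.execute("""
    SELECT pe.name, pr.name, o.status
    FROM orders o
    JOIN persons pe ON pe.person_id = o.person_id
    JOIN products pr ON pr.product_id = o.product_id
""").fetchone()
print(row)
```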
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
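Combining lookup lists in one document means a single read can validate a whole transaction. In this minimal Python sketch the lookup lists come from the `orderLookupId` document shown earlier; the second product's status `"Backordered"` is a hypothetical invalid value added to show a failure, and `invalid_statuses` is an illustrative helper.

```python
# One lookup document holding several lookup lists.
order_lookups = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

# A trimmed order transaction; the second status is deliberately invalid.
order_doc = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},
        ],
    },
}


def invalid_statuses(order, lookups):
    """List every status value in the order that is missing from its lookup list."""
    problems = []
    if order["order"]["orderStatus"] not in lookups["orderStatus"]:
        problems.append(("orderStatus", order["order"]["orderStatus"]))
    for item in order["order"]["orderedProducts"]:
        status = item["product"]["productOrderStatus"]
        if status not in lookups["productOrderStatus"]:
            problems.append((item["product"]["productIdFk"], status))
    return problems


print(invalid_statuses(order_doc, order_lookups))
```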
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational modeling, because it hides the purpose of the model
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because one JSON object can hold a general object (person) and its subclass objects (customer, company relationships) side by side
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See subclassing below
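A minimal Python sketch of reading such a generalized document, assuming (as the person example suggests) that each subclass is a sibling object named in the `schemas` array. The document is trimmed, and the `types_of` and `roles_of` helpers are hypothetical.

```python
# Trimmed person document: "person" is the general object; "customer" and
# "companyRelationships" are subclass objects declared in "schemas".
person_doc = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person", "schemaVersion": 1.0}},
        {"schema": {"type": "customer", "schemaVersion": 1.0}},
        {"schema": {"type": "companyRelationships", "schemaVersion": 1.0}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}},
    ],
}


def types_of(doc):
    """Read the declared types from the schemas array."""
    return [s["schema"]["type"] for s in doc["schemas"]]


def roles_of(doc):
    """Collect every role the person plays, in the order they appear."""
    roles = []
    if "customer" in doc:
        roles.append("customer")
    for rel in doc.get("companyRelationships", []):
        roles.extend(rel["company"]["personCompanyRelationship"])
    return roles


print(types_of(person_doc))
print(roles_of(person_doc))
```

Every role stays visible by name in the document, which is why generalization here does not hide the model's purpose.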
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active",
      "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06",
      "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"],
          "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton",
          "addressRegion": "UT",
          "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa",
          "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234",
          "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS",
          "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking",
          "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111",
          "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555,
        "companyName": "Nabisco",
        "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": {
        "companyIdFk": 666,
        "companyName": "Goliath Dairies",
        "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri",
        "performanceNotes": "He is always late" } }
  ]
}
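A minimal Python sketch of the idea, assuming documents shaped like the person example (trimmed here to a few properties). The helper names `declared_types`, `company_roles`, and `is_customer` are hypothetical; they only illustrate that each role is an ordinary sub-object that code can test for and read directly, with no join or subclass table.

```python
# Trimmed copy of the person document; each role ("subclass") is a sibling sub-object.
person_doc = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person", "schemaVersion": 1.0}},
        {"schema": {"type": "customer", "schemaVersion": 1.0}},
        {"schema": {"type": "companyRelationships", "schemaVersion": 1.0}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def declared_types(doc):
    """Return the types the document declares in its schemas array."""
    return [entry["schema"]["type"] for entry in doc.get("schemas", [])]


def company_roles(doc):
    """Map each related company name to the roles this person plays for it."""
    return {
        rel["company"]["companyName"]: rel["company"]["personCompanyRelationship"]
        for rel in doc.get("companyRelationships", [])
    }


def is_customer(doc):
    """A person plays the customer role if the document has a 'customer' sub-object."""
    return "customer" in doc


print(declared_types(person_doc))
print(company_roles(person_doc))
print(is_customer(person_doc))
```

In this sketch, giving a person a new role means adding one sub-object (and a schemas entry); documents without that role simply lack the key.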
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application.
• Optimize sparse data out of a table and put it into one-to-one related tables.
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance.
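To make the denormalization trade-off concrete, here is a small sketch using Python's built-in `sqlite3` module. The table and column names are illustrative assumptions, not part of the original model: the `orders` table carries a copy of `customer_name` next to its foreign key, so the common read no longer needs a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT)")
# Denormalized: orders stores a copy of customer_name alongside the foreign key.
cur.execute("""CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    customer_id   INTEGER REFERENCES customers(customer_id),
    customer_name TEXT,
    order_status  TEXT)""")
cur.execute("INSERT INTO customers VALUES (11, 'Mike Bowers')")
cur.execute("INSERT INTO orders VALUES (1, 11, 'Mike Bowers', 'Shipped')")

# Normalized read: a join is needed to get the customer's name.
joined = cur.execute("""SELECT o.order_id, c.customer_name
    FROM orders o JOIN customers c ON c.customer_id = o.customer_id""").fetchall()

# Denormalized read: the copied column answers the same question without a join.
flat = cur.execute("SELECT order_id, customer_name FROM orders").fetchall()

print(joined)
print(flat)
```

The cost of the copy is maintenance: every copy of `customer_name` must be updated when the customer's name changes.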
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic.
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy.
• You create range indexes (needed for inequality comparisons such as < and >) manually, on either a property name or a hierarchical path.
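The sketch below is a rough Python model of why unique names matter; it is not MarkLogic's actual index implementation. It indexes every (property name, value) pair regardless of depth, as the bullets above describe. With a generic name such as `status` (a hypothetical alternative to `personStatus` and `customerStatus`), a lookup cannot tell a person's status from a customer's.

```python
from collections import defaultdict


def build_name_index(docs):
    """Map (property name, scalar value) -> set of document ids, ignoring the
    property's position in the hierarchy (a simplified model, not MarkLogic's)."""
    index = defaultdict(set)

    def walk(node, doc_id, key=None):
        if isinstance(node, dict):
            for k, v in node.items():
                walk(v, doc_id, k)
        elif isinstance(node, list):
            for item in node:
                walk(item, doc_id, key)  # array items are indexed under the enclosing name
        elif key is not None:
            index[(key, node)].add(doc_id)

    for doc_id, doc in docs.items():
        walk(doc, doc_id)
    return index


# Hypothetical generic names: "status" appears under both person and customer.
generic = {
    "A": {"person": {"status": "active"}, "customer": {"status": "inactive"}},
    "B": {"person": {"status": "inactive"}, "customer": {"status": "active"}},
}
# Unique names, as in the example document.
unique = {
    "A": {"person": {"personStatus": "active"}, "customer": {"customerStatus": "inactive"}},
    "B": {"person": {"personStatus": "inactive"}, "customer": {"customerStatus": "active"}},
}

print(sorted(build_name_index(generic)[("status", "active")]))      # ['A', 'B']: ambiguous
print(sorted(build_name_index(unique)[("personStatus", "active")]))  # ['A']: only the active person
```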
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}

{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": …
    },
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States"
    },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+0101",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New" … } }
    ]
  }
}

{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": ["cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": ["cookie", "chocolate", "snack", "treat"],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu… } }
    ],
    "productMeasures": [
      { "productMeasure": { "productMeasurePurpose": "productPackage", "productWeight": 101,
          "productWidth": 45, "productHeight": 17, "productLength": 67 } },
      { "productMeasure": { "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } }
    ],
    "productWebsites": [ …
  }
}

{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales" … } },
      { "phone": { "phonePurpose": "support" … } },
      { "phone": { "phonePurpose": "shipping" … } }
    ],
    "companyAddresses": [
      { "address": { "addressPurpose": ["shipping" …],
          "addressLocality": "East Hanove…", "addressPostalCode": "07936" … } }
    ],
    "companyEmails": [ { "email": { "emailPurpose": "sales" … } } ],
    "companyWebsites": [ { "website": { "websitePurpose": "home" … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": 555…,
          "paymentMethodVerified": true } }
    ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}

{
  "orderLookupId": 111111111,
  "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
  "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"]
}
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model.
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.
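A minimal Python sketch of how the orthogonal documents relate: the order (a transaction) references the person and the product (business entities) by ID and is checked against the lookup lists. The documents are trimmed copies of the five examples, and `check_order` is a hypothetical helper written for illustration, not a MarkLogic API.

```python
person = {"personId": 11, "person": {"personName": "Mike Bowers"}}
product = {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}}
order = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
        ],
    },
}
lookups = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}


def check_order(order, lookups, persons, products):
    """Return a list of problems found when checking an order transaction
    against the lookup lists and the entity documents it references."""
    problems = []
    o = order["order"]
    if o["orderStatus"] not in lookups["orderStatus"]:
        problems.append(f"unknown orderStatus {o['orderStatus']!r}")
    cust_id = o["customer"]["customerId"]
    if cust_id not in persons:
        problems.append(f"missing person {cust_id}")
    elif persons[cust_id]["person"]["personName"] != o["customer"]["customerName"]:
        # customerName is a denormalized copy; flag it if it has drifted from the source.
        problems.append(f"stale customerName for {cust_id}")
    for line in o["orderedProducts"]:
        p = line["product"]
        if p["productIdFk"] not in products:
            problems.append(f"missing product {p['productIdFk']}")
        if p["productOrderStatus"] not in lookups["productOrderStatus"]:
            problems.append(f"unknown productOrderStatus {p['productOrderStatus']!r}")
    return problems


persons = {person["personId"]: person}
products = {product["productId"]: product}
print(check_order(order, lookups, persons, products))  # []
```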
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data.
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database.
• You have to join 13 tables to retrieve all of a person's data.
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row.
• 4-to-6 IOs x 13 joins = 52-to-78 IOs.
• MarkLogic indexes are in RAM, so getting a document takes 1 IO.
• MongoDB indexes are B-tree indexes and may require 4 IOs.
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables.
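The IO estimate above is simple multiplication; this Python sketch reproduces it. The per-join range (4 to 6 IOs) and the single-IO document read are the slide's figures, not measurements, and, like the slide, the sketch counts 13 joins for 13 tables. The function name `relational_ios` is hypothetical.

```python
def relational_ios(joins, ios_per_join=(4, 6)):
    """Return the (low, high) IO estimate for reassembling one entity with
    `joins` joins, each costing between ios_per_join[0] and ios_per_join[1] IOs."""
    low, high = ios_per_join
    return joins * low, joins * high


# Slide's figure: MarkLogic indexes are in RAM, so fetching the document is one IO.
DOCUMENT_IOS = 1

low, high = relational_ios(13)
print(f"relational: {low}-{high} IOs, document: {DOCUMENT_IOS} IO")
```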
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
MarkLogic can query multi-valued and multi-typed properties using its new Optic API and Template Driven Extraction (TDE).
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.
"personAddresses": [
  { "address": {
      "addressPurpose": ["shipping", "mailing"],
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States" } }
]
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
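As a rough, plain-Python analogue of projecting a multi-valued property into rows (the kind of relational view Template Driven Extraction produces; this is not the TDE API), the sketch below turns each address purpose into one row shaped like the Persons Addresses table. The `addressId` surrogate key and the `address_rows` helper are hypothetical additions; the document itself needs no such key.

```python
person_doc = {
    "personId": 11,
    "person": {
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton", "addressRegion": "UT",
                "addressPostalCode": "84015", "addressCountry": "United States",
            }}
        ]
    },
}


def address_rows(doc):
    """Project the multi-valued personAddresses array into flat rows,
    one per (address, purpose) pair, like a Persons Addresses junction table."""
    rows = []
    for address_id, entry in enumerate(doc["person"]["personAddresses"], start=1):
        addr = entry["address"]
        for purpose in addr["addressPurpose"]:
            rows.append({
                "personId": doc["personId"],
                "addressId": address_id,  # hypothetical surrogate key, needed only by the table form
                "addressPurpose": purpose,
                "locality": addr["addressLocality"],
            })
    return rows


for row in address_rows(person_doc):
    print(row)
```

One nested array in one document becomes two rows here, and three tables in the relational version above.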
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases.
"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
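Code that consumes a multi-typed array typically dispatches on which properties each element carries. A minimal Python sketch using the two payment methods above; `describe` is a hypothetical helper, and checking for `creditCardNumber` versus `bankAccountNumber` is one assumed way to tell the shapes apart (checking `paymentMethodType` would also work).

```python
payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111"}},
]


def describe(method):
    """Summarize one payment method, dispatching on the keys it carries."""
    pm = method["paymentMethod"]
    if "creditCardNumber" in pm:
        return f"{pm['paymentMethodType']} card ending {pm['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in pm:
        return f"{pm['paymentMethodType']} account ending {pm['bankAccountNumber'][-4:]}"
    return f"{pm['paymentMethodType']} (unrecognized shape)"


print([describe(m) for m in payment_methods])
```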
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query.
• Structure can be queried:
– for the presence of keys
– for nested relationships
– for any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                   "hidePhoneNumber": true } }
    ]
  }
}
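A small Python sketch of "structure is data", using the document above: one function queries on the presence of the `hidePhoneNumber` key, and another changes structure by updating data (renaming a key in place). The helper names and the new key name `phoneIsPrivate` are hypothetical.

```python
import copy

person_doc = {
    "personId": 11,
    "person": {
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ]
    },
}


def visible_phones(doc):
    """Query on structure: keep phones that lack a truthy hidePhoneNumber key."""
    return [p["phone"]["phoneNumber"] for p in doc["person"]["personPhones"]
            if not p["phone"].get("hidePhoneNumber", False)]


def rename_key(node, old, new):
    """Change structure by updating data: rename a key everywhere in the tree, in place."""
    if isinstance(node, dict):
        if old in node:
            node[new] = node.pop(old)
        for value in node.values():
            rename_key(value, old, new)
    elif isinstance(node, list):
        for item in node:
            rename_key(item, old, new)


print(visible_phones(person_doc))  # ['+1 (222) 222-2222']

renamed = copy.deepcopy(person_doc)
rename_key(renamed, "hidePhoneNumber", "phoneIsPrivate")
print(visible_phones(renamed))  # both numbers: the query keyed on the old name no longer matches
```

The second result shows the flip side of structure-as-data: queries that depend on a key must change when the key does.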
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters.
– Strings have unlimited size and can contain any internationalized UTF-8 characters.
– Numbers contain integers or decimals.
– Booleans are true or false.
– Arrays can have values of any type and can mix types.
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
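For reference, this is how Python's standard `json` module maps these types when parsing. JSON itself has a single number type; the integer/decimal split shown here is Python's (`int` vs. `float`). The property names and values are made up for illustration.

```python
import json

# Illustrative document; property names and values are made up.
text = """{
  "name with spaces and UTF-8: café": "any UTF-8 text, e.g. Grüße",
  "count": 11,
  "heightInFeet": 5.75,
  "hidePhoneNumber": true,
  "mixed": [1, "two", false, null, {"nested": [3.5]}],
  "sparse": {}
}"""

doc = json.loads(text)

# JSON type -> Python type produced by json.loads
for key, value in doc.items():
    print(f"{key!r}: {type(value).__name__}")

# Arrays may mix types freely.
print([type(item).__name__ for item in doc["mixed"]])
```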
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. Codd
IBM Research Laboratory, San Jose, California
Information Retrieval, Communications of the ACM, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION: This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS: …Tables… represent a major advance toward the goal of data independence…
1.2.1 Ordering Dependence: …Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence: …Can application programs… remain invariant as indices come and go?…
1.2.3 Access Path Dependence: Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure: a knowledge graph of Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E), connected by relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of", with topics linked to one another as problem, solution, and purpose.]
[Figure: Inside and Out: a document graph connecting JSON Customers, Orders, Products, and Vendors documents, an XML hospital operation record, and Documentation (customer invoices, annual reports, customer order documents, product manuals, shipping instructions), via relationships such as "customer order", "product that was ordered", "vendor who sold the product", "customer liked product", and "vendor received RMA on product".]
Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
Six Data Paradigms
[Figure: the six paradigms plotted from low-latency operational velocity to high-bandwidth analytical volume (axis labels: microseconds to hours, 500 to 100K tps, GB to PB) and from fixed structure (code has more dependence on DB structures) to flexible structure (code has less dependence on DB structures).]
• Dimensional: Kimball data warehousing
• Relational: fixed dense tables with flexible queries and joins
• Wide Column: fixed dense tables with fixed queries, no joins
• Key Value: predefined data structures (key/value, hash, list, set)
• Document: sparse, variable data structures
• Graph: unlimited relationships
Top Ten Databases Overall
[Figure: the same six paradigms, each illustrated with a hospital/surgeon/operation/drug example (a star schema for dimensional, an ER diagram for relational, an XML document for document, and an RDF graph for graph). Products are placed in each paradigm with the number shown beside them on the slide.]
• Graph (RDF): 6 MarkLogic
• Dimensional, data warehouse: 1 SQL Server, 2 Oracle Exadata, 2 Teradata, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
• Dimensional, live analytics: 2 Oracle Exalytics, 3 SAP HANA
• Relational, newSQL: 2 Oracle x10
• Relational, SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
• Document (JSON document, XML doc warehouse): 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
• Key value (simple key): 5 AWS DynamoDB, 10 Redis
• Wide column (complex key): 9 DataStax Cassandra
Do we combine multiple models?
Multi-Model Enterprise NoSQL Databases
[Figure: the six paradigms on the same axes as before (dimensional data warehouse and live analytics; relational SQL and newSQL; wide column; key value; document, both JSON and XML doc warehouse; graph/RDF). MarkLogic, labeled "#1 Multi-model NoSQL Database", spans the operational paradigms: graph, document, key value, and wide column.]
Power of Combining XML Document and RDF Graph
A Relational Model of Data for Large Shared Data Banks
E. F. CODD, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
"Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency. …"
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
Data + Narrative + Relationships = Contextual Information
1. Relational Model and Normal Form
1.1. INTRODUCTION: "This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of"
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS: "… Tables … represent a major advance toward the goal of data independence. …"
1.2.1. Ordering Dependence: "… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one."
1.2.2. Indexing Dependence: "… Can application programs … remain invariant as indices come and go? …"
1.2.3. Access Path Dependence: "Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths."
[Figure: a knowledge graph of Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E), connected by relationships such as "geo-located in", "printed on", "related topic", "author of", "published article in journal", and "publisher of", plus problem, solution, and purpose links between topics. Caption: Relationships (Semantic & Structural) = Meaningful Knowledge.]
WAIT A MINUTE! Aren't JSON, XML, and RDF databases hierarchical and graph databases?
Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?
1. They break programs when the data structure's hierarchy changes.
   – New APIs and indexes flatten out the data structure so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent because it is easy to fail to update redundant data.
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
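The first answer says new indexes let queries work regardless of hierarchy. As a plain-Python illustration of that idea only (not MarkLogic's index or query API), the sketch below matches a property by name at any depth. `find_property_values` and `has_drug` are hypothetical helpers, and the two sample documents are assumptions built from the operation example.

```python
from typing import Any, Iterator


def find_property_values(doc: Any, name: str) -> Iterator[Any]:
    """Yield every value stored under the property `name`, at any depth."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            if key == name:
                yield value
            # Keep descending so the nesting level never matters.
            yield from find_property_values(value, name)
    elif isinstance(doc, list):
        for item in doc:
            yield from find_property_values(item, name)


def has_drug(doc: Any, drug: str) -> bool:
    """True if any drugName property anywhere in the document equals `drug`."""
    return drug in find_property_values(doc, "drugName")


# The same fact stored at two different depths (hypothetical documents).
flat = {"drugName": "Minocycline"}
nested = {"operation": {"administeredDrugs": [{"drugName": "Minocycline"}]}}

print(has_drug(flat, "Minocycline"), has_drug(nested, "Minocycline"))
```

Because the query names a property rather than a path, moving `drugName` deeper in the hierarchy does not break it.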
RDF Graph and XML enable us to turn content into meaningful knowledge.
What can RDF Graph and JSON do for data?
Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity.
  – A document is like a row in a table, and more.
  – A document contains all the related tables a relational model requires to represent one business entity.
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries.
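A minimal sketch of "graph relationships connect any entity to any other entity", assuming each document is identified by a string ID and each relationship is a subject/predicate/object triple. The IDs and the predicates `placedOrder`, `orderedProduct`, and `soldProduct` are hypothetical; `likesProduct` comes from the person document later in this deck. This is an in-memory illustration, not a triple store.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Hypothetical document IDs used as graph nodes.
triples = [
    Triple("customer/11", "placedOrder", "order/1"),
    Triple("order/1", "orderedProduct", "product/1111"),
    Triple("vendor/555", "soldProduct", "product/1111"),
    Triple("customer/11", "likesProduct", "product/1111"),
]


def find_triples(s: Optional[str] = None, p: Optional[str] = None,
                 o: Optional[str] = None) -> list[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.object == o)]


# Who is connected to product 1111, and how?
links = {(t.subject, t.predicate) for t in find_triples(o="product/1111")}
print(sorted(links))
```

Adding a new kind of relationship is just another triple; no table or foreign-key column has to change.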
[Figure: JSON documents for Customers, Orders, Products, and Vendors, plus an XML Documentation document (the hospital operation example), connected by graph relationships: customer order, product that was ordered, vendor who sold the product, customer liked product, and vendor received RMA on product. Related content hangs off these entities: shipping instructions; customer invoices and annual reports; customer order documents and invoices; product manuals and vendor order forms.]
RDF = Standard Meaningful Graphs
Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies.
TIP: Search for ontologies at Linked Open Vocabularies (LOV).
• Examples: Dublin Core, FOAF, TrackBack, MetaVocab, Basic Geo Vocabulary, BIO, RSS 1.0, VCard RDF, Creative Commons metadata, WOT, SIOC, GoodRelations, DOAP, Programmes Ontology, Music Ontology, OpenGUID, Provenance Vocabulary, Pedagogical diagnosis, DILIGENT Argumentation
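A small sketch of reusing existing vocabularies for predicates. The namespace IRIs for FOAF and Dublin Core terms are the published ones; the mapping from local names (such as `friendOf`) to those terms is a hypothetical example, and `to_standard` is an illustrative helper, not part of any RDF library.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical mapping from local predicate names to terms in existing vocabularies.
PREDICATES = {
    "friendOf": FOAF + "knows",          # person -> person
    "name": FOAF + "name",               # person -> name literal
    "creator": DCTERMS + "creator",      # publication -> its author
    "publisher": DCTERMS + "publisher",  # publication -> publishing organization
}


def to_standard(triple: tuple[str, str, str]) -> tuple[str, str, str]:
    """Rewrite a triple's predicate to a shared vocabulary IRI when one is mapped."""
    s, p, o = triple
    return (s, PREDICATES.get(p, p), o)


print(to_standard(("person/11", "friendOf", "person/22")))
```

Unmapped predicates pass through unchanged, so a model can adopt standard terms incrementally.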
New Data Structures for Variety and Variability
Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

• Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
  NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.
• Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
  NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.
• Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
  NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.
• Relational: Each table may have zero to a few fixed relationships to other tables. Constraints are not data.
  NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.
• Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
  NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.
Why is relational fixed and dense?
Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Operation ID | Drug ID | Drug Dose | Drug Dose UOM (sparse)
13 | 100 | 200 mg | NULL
13 | 101 | NULL | NULL
13 | 100 | 150 mg | NULL

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

Callouts: fixed field types; fixed table structure; fixed table relationships; each row has the same structure; fixed set of tables.
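To see what the fixed tables above cost at read time, here is a sketch that joins those rows back into one nested operation document. The row values are copied from the tables; the Python structures, property names such as `drugDose`, and `build_operation_doc` are hypothetical. A NULL dose becomes an absent property instead of a stored null.

```python
import json

# Rows copied from the tables above.
surgeons = {1: {"surgeonName": "Dorothy Oz", "surgeonSpecialty": "Cardiothoracic"},
            2: {"surgeonName": "Martin Fields", "surgeonSpecialty": "Neurosurgery"}}
operations = [{"operationId": 13, "surgeonId": 1, "hospitalId": 7}]
drugs = {100: "Minocycline", 101: "Minomycin"}
doses = [(13, 100, "200 mg"), (13, 101, None), (13, 100, "150 mg")]


def build_operation_doc(op: dict) -> dict:
    """Join the fixed tables into one nested document for a single operation."""
    administered = []
    for op_id, drug_id, dose in doses:
        if op_id != op["operationId"]:
            continue
        drug = {"drugName": drugs[drug_id]}
        if dose is not None:  # omit the property instead of storing NULL
            drug["drugDose"] = dose
        administered.append(drug)
    return {"operationNumber": op["operationId"],
            "surgeonName": surgeons[op["surgeonId"]]["surgeonName"],
            "hospitalId": op["hospitalId"],
            "administeredDrugs": administered}


doc = build_operation_doc(operations[0])
print(json.dumps(doc, indent=2))
```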
How does NoSQL support variety and variability? NoSQL documents can have any variety of rapidly changing structures because structure is data.
{
  "_id": 1, "_type": "operation", "collections": ["operation", "transplants"],
  "operation": {
    "hospitalName": "Johns Hopkins", "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz", "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}
The Minomycin entry has sparse properties: it omits drugDoseSize and drugDoseUOM.
Callouts: variable data types; variable document types; variable relationships; variable and sparse collections; variable document structures; sparse properties; sparse denormalized properties.
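Because relationships are data inside the document, application code can read them like any other property. A minimal sketch, assuming the `relations.values` layout shown above (trimmed to the surgeon and drug entries); `related` is a hypothetical helper, not a MarkLogic API.

```python
from statistics import mean

# A trimmed copy of the operation document above; only "relations" is used here.
operation_doc = {
    "_id": 1,
    "relations": {"values": [
        {"subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87},
        {"subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1},
        {"subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3},
        {"subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1},
    ]},
}


def related(doc: dict, predicate: str) -> list[dict]:
    """Return the embedded relationship objects that use `predicate`."""
    return [r for r in doc["relations"]["values"] if r["predicate"] == predicate]


drug_links = related(operation_doc, "opDrug")
avg_efficacy = mean(r["drugEfficacy"] for r in drug_links)
most_recalled = max(drug_links, key=lambda r: r["drugRecalls"])["object"]
print(round(avg_efficacy, 3), most_recalled)
```

Each relationship carries its own attributes (efficacy, recalls), which a plain foreign-key column could not hold.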
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
  <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
  freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph>
</section>
{ "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" } ] } ] }
JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects; comments; attributes for metadata; CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
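The contrast above shows up directly in parsing code. A minimal sketch using Python's standard `xml.etree.ElementTree` and `json` modules on shortened versions of the two examples (the namespace attributes are dropped for brevity): in XML the date is a tag inside running text, while in JSON it is a separate object.

```python
import json
import xml.etree.ElementTree as ET

xml_text = """<paragraph>XML allows data such as this date
<date>2017-04-02Z</date> to be freely mixed into the text</paragraph>"""

json_text = """{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects"},
  {"type": "date", "value": "2017-04-02Z"}]}"""

# XML: the date sits inside running text (mixed content).
para = ET.fromstring(xml_text)
xml_date = para.find("date").text
xml_plain_text = "".join(para.itertext())  # tags removed, text kept in order

# JSON: the date is its own object; text and data never share a string.
parts = json.loads(json_text)["paragraph"]
json_date = next(p["value"] for p in parts if p["type"] == "date")

print(xml_date, json_date)
```

Recovering readable text is trivial in XML (`itertext`), while picking out typed values is trivial in JSON, which mirrors the content-versus-data split.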
Choosing Between XML and JSON
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Customer Model
• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
• Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
• Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Websites: Website ID, URL
• Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
• Companies Websites: Company ID, Website ID
• Companies Addresses: Company ID, Address ID
• Companies: Company ID
• Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
• Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
• Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
• Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
• Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
• Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Person ERD
The person ERD is 13 tables because each multivalued property requires a separate table in a relational model. All of this should be represented as one JSON document because it is one business entity.
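Why does each multivalued property cost a table? This sketch shreds a cut-down person document (values from the example that follows, without its wrapper objects) into relational-style rows: scalars stay in the parent row, and every array becomes a child table keyed by `personId`. `shred` and the table names are hypothetical.

```python
from collections import defaultdict

# A cut-down person document; wrapper objects such as "phone" are omitted.
person_doc = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
    ],
    "personEmails": [
        {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"},
    ],
}


def shred(doc: dict) -> dict[str, list[dict]]:
    """Flatten one document into tables.

    Scalar properties stay in the parent row; each array becomes its own
    child table whose rows carry the parent key as a foreign key. Nested
    arrays such as phonePurpose are left as-is here; strict first normal
    form would need yet another table for them.
    """
    tables: dict[str, list[dict]] = defaultdict(list)
    tables["persons"].append({k: v for k, v in doc.items() if not isinstance(v, list)})
    for key, items in doc.items():
        if isinstance(items, list):
            for item in items:
                tables[key].append({"personId": doc["personId"], **item})
    return dict(tables)


tables = shred(person_doc)
print({name: len(rows) for name, rows in tables.items()})
```

Two arrays already produce three tables; the full document below, with its name variations, addresses, websites, payment methods, and company links, is what drives the ERD to 13.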
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } ],
  "person": {
    "personStatus": "active", "personName": "Mike Bowers", "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01", "personGender": "male", "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT", "addressPostalCode": "84015",
          "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers", "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } } ] },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ]
}
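In the ERD, "Is Primary" is a separate column in several tables. In the document above it is just another value in a purpose array, so one lookup handles every purpose. A minimal sketch; `first_with_purpose` is a hypothetical helper and the data is copied from the `personPhones` array.

```python
from typing import Optional

person = {
    "personPhones": [
        {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
        {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113"}},
    ]
}


def first_with_purpose(entries: list[dict], wrapper: str, purpose_key: str,
                       purpose: str) -> Optional[dict]:
    """Return the first wrapped entry whose purpose array contains `purpose`."""
    for entry in entries:
        item = entry[wrapper]
        if purpose in item.get(purpose_key, []):
            return item
    return None


primary = first_with_purpose(person["personPhones"], "phone", "phonePurpose", "primary")
work = first_with_purpose(person["personPhones"], "phone", "phonePurpose", "work")
print(primary["phoneNumber"], work["phoneNumber"])
```

The same helper works for `personEmails`, `personAddresses`, and `personWebsites` by changing the wrapper and purpose key.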
[Figure: a much larger ERD that adds order, order item, fact, and lookup-list tables to the customer tables.]
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity.
• Meaningful graph relationships connect business entities.
[Figure: JSON documents for Customers, Orders, Products, and Vendors connected by graph relationships: customer order, product that was ordered, and vendor who sold the product.]
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New" …
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": ["cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": ["cookie", "chocolate", "snack", "treat"],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu… } } ],
    "productMeasures": [
      { "productMeasure": { "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": { "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    …
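The order document keeps a denormalized copy of `productName` for display, but it also keeps `productIdFk`, so the full product document can be fetched when needed. A minimal sketch of that cross-document lookup, assuming documents are held in a dict keyed by ID; `resolve_products` is hypothetical, and product 2222 is left out of `products` on purpose (its document is not shown here) to show how a dangling reference surfaces.

```python
from typing import Optional

# Minimal stand-ins for the product and order documents above, keyed by ID.
products = {
    1111: {"productId": 1111,
           "product": {"productName": "Oreo Thins Sandwich Cookies", "productListPrice": 4.50}},
}
order = {
    "orderId": 1,
    "order": {"orderedProducts": [
        {"product": {"productIdFk": 1111, "productName": "Oreo Thins Sandwich Cookies",
                     "productOrderQty": 3}},
        {"product": {"productIdFk": 2222, "productOrderQty": 1}},
    ]},
}


def resolve_products(order_doc: dict,
                     product_docs: dict) -> list[tuple[dict, Optional[dict]]]:
    """Pair each ordered line item with the product document its foreign key points to."""
    pairs = []
    for line in order_doc["order"]["orderedProducts"]:
        item = line["product"]
        pairs.append((item, product_docs.get(item["productIdFk"])))
    return pairs


pairs = resolve_products(order, products)
missing = [item["productIdFk"] for item, doc in pairs if doc is None]
print(missing)
```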
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": { "addressPurpose": ["shipping", …], …
          "addressLocality": "East Hanove…", "addressPostalCode": "07936", … } } ],
    "companyEmails": [ { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [ { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": 555,
          "paymentMethodVerified": true } } ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
{
  "orderLookupId": 111111111,
  "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
  "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"]
}
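The lookup document plays the role of a relational reference table. As noted earlier, constraint-style checks also work for NoSQL; this sketch shows one done in application code. `invalid_statuses` and the `good`/`bad` orders are hypothetical, and a real system might run such a check in a trigger or validation step instead.

```python
lookup = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}


def invalid_statuses(order: dict, lookup_doc: dict) -> list[str]:
    """List status values in an order that the lookup document does not allow."""
    problems = []
    if order["orderStatus"] not in lookup_doc["orderStatus"]:
        problems.append(f"orderStatus={order['orderStatus']}")
    for item in order["orderedProducts"]:
        status = item.get("productOrderStatus")
        if status is not None and status not in lookup_doc["productOrderStatus"]:
            problems.append(f"productOrderStatus={status}")
    return problems


good = {"orderStatus": "Shipped", "orderedProducts": [{"productOrderStatus": "Shipped"}]}
bad = {"orderStatus": "Lost", "orderedProducts": [{"productOrderStatus": "Backordered"}]}
print(invalid_statuses(good, lookup), invalid_statuses(bad, lookup))
```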
Relational Modeling Normalize1 Normalizebull Make each attribute
single valued ndash Create one column per
attribute
ndash Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
bull Group attributes into tables ensuring each table has one coherent context
bull Assign one primary key to each table
bull Eliminate duplicate attributes across tables
32
[Diagram: one-to-one, one-to-many, and many-to-many table relationships, plus reference tables]
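The normalization steps above can be sketched with Python's built-in sqlite3 module. The table and column names (person, person_phone, phone_purpose) are hypothetical, and the sketch simplifies the slide's multi-valued phonePurpose to one purpose per phone; a fully normalized model would move purposes into yet another table.

```python
import sqlite3

# Hypothetical schema: phones are multivalued, so they move out of the
# person table into a child table joined one-to-many on person_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,
        person_name TEXT NOT NULL
    );
    CREATE TABLE person_phone (
        phone_id      INTEGER PRIMARY KEY,
        person_id     INTEGER NOT NULL REFERENCES person(person_id),
        phone_purpose TEXT NOT NULL,
        phone_number  TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO person VALUES (11, 'Mike Bowers')")
conn.executemany(
    "INSERT INTO person_phone (person_id, phone_purpose, phone_number) VALUES (?, ?, ?)",
    [
        (11, "mobile", "+1 (111) 111-1111"),
        (11, "work", "+1 (111) 111-1112"),
        (11, "home", "+1 (111) 111-1113"),
    ],
)

# Reassembling one person's phones now requires a join.
rows = conn.execute("""
    SELECT p.person_name, ph.phone_purpose, ph.phone_number
    FROM person AS p
    JOIN person_phone AS ph ON ph.person_id = p.person_id
    WHERE p.person_id = 11
    ORDER BY ph.phone_id
""").fetchall()

for row in rows:
    print(row)
```

Every additional multivalued attribute (emails, addresses, payment methods) adds another child table and another join in the same way.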
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
personId 11
schemas [ schema type person schemaVersion 1.0 schema type customer schemaVersion 1.0 schema type companyRelationships schemaVersion 1.0 ]
triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111
triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free
personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]
personEmails [ email emailPurpose [ work primary ] emailAddress mike@somework.com email emailPurpose [ personal ] emailAddress mike@somewhere.com ]
personWebsites [ website websitePurpose [ work ] websiteUrl http://www.mike.com website websitePurpose [ linkedin primary ] websiteUrl https://linked.com/mike ]
personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]
customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111
companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
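A minimal Python sketch of the idea that relationships embedded in the person document can feed a separate triple index. The dictionary layout (a "triple" object holding subject, predicate, and object) is an assumption, since the slide text lost its JSON punctuation, and index_triples and objects are hypothetical helpers, not MarkLogic APIs. The phones array shows an embedded, multi-valued structure kept inside the document instead of in a separate table.

```python
# Assumed layout of the person document; only a few properties are shown.
person_doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}},
    ],
    "person": {
        "personName": "Mike Bowers",
        # Embedded, multi-valued structure: each phone is one normalized object.
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        ],
    },
}


def index_triples(doc, triple_index):
    """Copy every embedded triple in doc into triple_index as (s, p, o)."""
    for entry in doc.get("triples", []):
        t = entry["triple"]
        triple_index.add((t["subject"], t["predicate"], t["object"]))


def objects(triple_index, subject, predicate):
    """Return the sorted objects linked to subject by predicate."""
    return sorted(o for s, p, o in triple_index if s == subject and p == predicate)


triple_index = set()
index_triples(person_doc, triple_index)
print(objects(triple_index, 11, "hasChild"))  # [33, 44]
```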
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing: queries see the data as if it had been copied into the linked documents, but the documents themselves are never changed
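"Denormalizing without denormalizing" can be sketched in plain Python. Here extract_person_row stands in for a TDE template and orders_with_contact stands in for an Optic join; both are hypothetical helpers, not MarkLogic APIs, and the document layout is assumed. The point is that person data is extracted into an index and joined to orders at query time, while the order documents stay unchanged.

```python
# Assumed, trimmed-down documents.
person_docs = [
    {"personId": 11, "person": {
        "personName": "Mike Bowers",
        "personEmails": [{"email": {"emailPurpose": ["work", "primary"],
                                    "emailAddress": "mike@somework.com"}}]}},
]
order_docs = [
    {"orderId": 1, "order": {"orderNumber": "111-11-111",
                             "customer": {"customerId": 11}}},
]


def extract_person_row(doc):
    """Pull the person's name and primary email into a flat index row."""
    person = doc["person"]
    primary = next(
        (e["email"]["emailAddress"] for e in person.get("personEmails", [])
         if "primary" in e["email"]["emailPurpose"]),
        None,
    )
    return {"personId": doc["personId"], "personName": person["personName"],
            "primaryEmail": primary}


# Build the index once; TDE would keep such an index current as documents change.
person_index = {row["personId"]: row for row in map(extract_person_row, person_docs)}


def orders_with_contact(orders, index):
    """Join each order to the indexed person data at query time."""
    result = []
    for doc in orders:
        contact = index.get(doc["order"]["customer"]["customerId"], {})
        result.append({"orderNumber": doc["order"]["orderNumber"],
                       "customerName": contact.get("personName"),
                       "customerEmail": contact.get("primaryEmail")})
    return result


print(orders_with_contact(order_docs, person_index))
```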
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Diagram: business tables, transaction tables, and reference tables]
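The three kinds of tables above can be sketched with Python's sqlite3 module. The schema is hypothetical: person and product are business tables, order_status is a reference table, and customer_order plus order_item are transaction tables that join the business tables into an order context. SQLite does not enforce the REFERENCES clauses unless foreign keys are switched on, so here they only document intent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (person_id  INTEGER PRIMARY KEY, person_name  TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE order_status (status_code TEXT PRIMARY KEY);
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        person_id   INTEGER REFERENCES person(person_id),
        status_code TEXT REFERENCES order_status(status_code)
    );
    CREATE TABLE order_item (
        order_id   INTEGER REFERENCES customer_order(order_id),
        product_id INTEGER REFERENCES product(product_id),
        qty        INTEGER,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO person  VALUES (11, 'Mike Bowers');
    INSERT INTO product VALUES (1111, 'Oreo Thins Sandwich Cookies'),
                               (2222, 'Rice Dream Rice Drink - Vanilla 64 Fl Oz');
    INSERT INTO order_status VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');
    INSERT INTO customer_order VALUES (1, 11, 'Shipped');
    INSERT INTO order_item VALUES (1, 1111, 3), (1, 2222, 1);
""")

# The order context exists only by joining the independent tables.
rows = conn.execute("""
    SELECT o.order_id, pe.person_name, pr.product_name, oi.qty, o.status_code
    FROM customer_order AS o
    JOIN person     AS pe ON pe.person_id  = o.person_id
    JOIN order_item AS oi ON oi.order_id   = o.order_id
    JOIN product    AS pr ON pr.product_id = oi.product_id
    ORDER BY pr.product_id
""").fetchall()

for row in rows:
    print(row)
```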
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mike@somework.com ]
personWebsites [
website websitePurpose [ work ] websiteUrl http://www.mike.com ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T09:00:00Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 5.87 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 3.50 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2.50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
productAvailabilityDate 2001-01-01
productListPrice 4.50
productDiscountPrice 3.50
productStandardCost 2.50
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 2.50 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
productWebsites [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
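A sketch of one lookup document standing in for several relational reference tables. The lists come from the slide's lookup document; the order document layout is assumed, invalid_statuses is a hypothetical helper, and the "Backordered" value is invented purely to show a failed check.

```python
# Two reference lists that would be separate tables in relational form.
lookup_doc = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

# One transaction document containing everything about the order.
order_doc = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},
        ],
    },
}


def invalid_statuses(order, lookups):
    """Return (where, value) pairs whose status is missing from the lookup lists."""
    problems = []
    status = order["order"]["orderStatus"]
    if status not in lookups["orderStatus"]:
        problems.append(("orderStatus", status))
    for item in order["order"]["orderedProducts"]:
        product = item["product"]
        if product["productOrderStatus"] not in lookups["productOrderStatus"]:
            problems.append((f"product {product['productIdFk']}", product["productOrderStatus"]))
    return problems


print(invalid_statuses(order_doc, lookup_doc))  # [('product 2222', 'Backordered')]
```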
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational, because it hides the purpose of the model
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because a JSON object can nest a subobject for each role, which works like subclassing
• This example shows a person being subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See the person, customer, and companyRelationships subobjects in the person document shown for the Normalize step
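A minimal sketch of subclassing by subobject: one generalized person document gains roles by adding customer and companyRelationships subobjects. The layout is assumed from the slide text, and roles is a hypothetical helper.

```python
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers", "personStatus": "active"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def roles(doc):
    """List the roles a person plays, derived from which subobjects are present."""
    found = ["person"] if "person" in doc else []
    if "customer" in doc:
        found.append("customer")
    for rel in doc.get("companyRelationships", []):
        company = rel["company"]
        for kind in company["personCompanyRelationship"]:
            found.append(f"{kind} {company['companyName']}")
    return found


print(roles(person_doc))
```

Because each role lives under its own named subobject, the purpose of every property stays visible, which is the contrast the slide draws with generalized relational tables.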
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
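The denormalization trade-off above can be sketched with sqlite3. The schema is hypothetical: customer_order carries a copy of the person's name so order listings skip the join, and the cost appears on the write path, where every change must also update the copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT);
    CREATE TABLE customer_order (
        order_id      INTEGER PRIMARY KEY,
        person_id     INTEGER REFERENCES person(person_id),
        customer_name TEXT  -- denormalized copy of person.person_name
    );
    INSERT INTO person VALUES (11, 'Mike Bowers');
    INSERT INTO customer_order VALUES (1, 11, 'Mike Bowers'), (2, 11, 'Mike Bowers');
""")

# Read path: the listing needs no join.
listing = conn.execute(
    "SELECT order_id, customer_name FROM customer_order ORDER BY order_id"
).fetchall()


def rename_person(conn, person_id, new_name):
    """Update the source row and every copy in one transaction."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute("UPDATE person SET person_name = ? WHERE person_id = ?",
                     (new_name, person_id))
        conn.execute("UPDATE customer_order SET customer_name = ? WHERE person_id = ?",
                     (new_name, person_id))


rename_person(conn, 11, "Michael Bowers")
print(listing)
print(conn.execute("SELECT DISTINCT customer_name FROM customer_order").fetchall())
```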
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy; if two properties in different contexts share a name (such as a generic status), a query on that name matches both
• You manually create range indexes for inequality queries, based on property name or hierarchical path
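Why unique property names matter can be shown with a toy name-keyed index. This is an assumed model for illustration, not MarkLogic's actual index structure: each (property name, value) pair maps to the documents containing it, wherever the property is nested. The documents and their status values are hypothetical.

```python
from collections import defaultdict


def build_name_index(docs):
    """Map (property name, scalar value) -> set of doc ids, ignoring nesting."""
    index = defaultdict(set)

    def walk(doc_id, name, node):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(doc_id, key, value)
        elif isinstance(node, list):
            for item in node:
                walk(doc_id, name, item)  # array items keep the property's name
        elif name is not None:
            index[(name, node)].add(doc_id)

    for doc_id, doc in docs.items():
        walk(doc_id, None, doc)
    return index


generic_docs = {
    "person/11": {"person": {"status": "active"},
                  "companyRelationships": [{"company": {"status": "inactive"}}]},
    "person/22": {"person": {"status": "inactive"}},
}
unique_docs = {
    "person/11": {"person": {"personStatus": "active"},
                  "companyRelationships": [{"company": {"personCompanyStatus": "inactive"}}]},
    "person/22": {"person": {"personStatus": "inactive"}},
}

generic = build_name_index(generic_docs)
unique = build_name_index(unique_docs)

# "Find inactive persons": with a shared name, person/11 matches by mistake,
# because its company relationship, not the person, is inactive.
print(sorted(generic[("status", "inactive")]))       # ['person/11', 'person/22']
print(sorted(unique[("personStatus", "inactive")]))  # ['person/22']
```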
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• The five JSON documents shown earlier (person, order, product, company, and lookup lists) equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
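The IO estimate above is simple arithmetic, checked here in Python. The function name is hypothetical, and it follows the slide's rough model of one indexed lookup per table at 4 to 6 IOs each; it is not a general database cost model.

```python
def relational_read_ios(tables, ios_per_lookup=(4, 6)):
    """Estimate IOs to reassemble one entity spread across `tables` tables.

    Rough model from the slide: each table lookup costs 3-4 index IOs
    plus 1-2 row IOs, so 4-6 IOs per table.
    """
    low, high = ios_per_lookup
    return tables * low, tables * high


low, high = relational_read_ios(13)
print(f"Relational: {low}-to-{high} IOs; one JSON document: 1 IO")
# prints: Relational: 52-to-78 IOs; one JSON document: 1 IO
```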
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties
using MarkLogic's new Optic API
and Template Driven Extraction (TDE).
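Inside MarkLogic, queries like these run through the Optic API and TDE views; the matching rule they rely on can be modeled in a few lines of plain Python. This is a minimal sketch, not MarkLogic code: the helper `matches` is hypothetical, and it assumes documents are plain dicts shaped like the person sample. A multi-valued property matches when any of its values equals the target, and requiring the same type keeps a boolean in a multi-typed array from matching an integer (in Python, `True == 1`).

```python
from typing import Any


def matches(obj: dict, key: str, target: Any) -> bool:
    """Return True if obj[key] equals target, or contains it when obj[key] is an array.

    A value must also have exactly the same type as target, so in a multi-typed
    array such as [1, "1", True] the boolean does not match the integer 1
    (plain Python equality would say True == 1).
    """
    if key not in obj:
        return False
    value = obj[key]
    values = value if isinstance(value, list) else [value]
    return any(type(v) is type(target) and v == target for v in values)


# Phone entries shaped like the talk's sample person document.
phones = [
    {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
    {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
]

primary = [p["phoneNumber"] for p in phones if matches(p, "phonePurpose", "primary")]
print(primary)                                      # ['+1 (111) 111-1111']
print(matches({"codes": ["1", True]}, "codes", 1))  # False: neither value is the integer 1
```

The same test works whether a property holds one value or many, which is what lets a single query cover both shapes.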
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
"personAddresses": [
  { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
    "addressLocality": "Clinton", "addressRegion": "UT",
    "addressPostalCode": "84015", "addressCountry": "United States" } }
]
The equivalent relational model needs three tables:
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
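To make the comparison concrete, here is a rough Python sketch of shredding one person document into the three tables above. It is one possible mapping, not the talk's: the surrogate Address IDs come from `enumerate`, only a few columns are filled, `shred_addresses` is a hypothetical helper, and each value of the multi-valued `addressPurpose` becomes its own Persons Addresses row.

```python
# Sample values from the talk; table names follow the slide's ERD.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }}
        ],
    },
}


def shred_addresses(doc: dict) -> tuple:
    """Split one person document into Persons, Addresses, and Persons Addresses rows."""
    person_id = doc["personId"]
    persons = [(person_id, doc["person"]["personName"])]
    addresses: list = []
    persons_addresses: list = []
    for address_id, entry in enumerate(doc["person"].get("personAddresses", []), start=1):
        a = entry["address"]
        addresses.append((address_id, ", ".join(a["addressStreet"]), a["addressLocality"],
                          a["addressRegion"], a["addressPostalCode"], a["addressCountry"]))
        # A flat table cannot hold an array, so each purpose becomes a join row.
        for purpose in a["addressPurpose"]:
            persons_addresses.append((person_id, address_id, purpose))
    return persons, addresses, persons_addresses


persons, addresses, persons_addresses = shred_addresses(person_doc)
print(persons)            # [(11, 'Mike Bowers')]
print(addresses)          # [(1, '99 Smith Dr', 'Clinton', 'UT', '84015', 'United States')]
print(persons_addresses)  # [(11, 1, 'shipping'), (11, 1, 'mailing')]
```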
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
    "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
    "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
    "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
    "nameOnBankAccount": "Michael Bowers",
    "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
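A program typically handles a multi-type array by dispatching on a type field. This minimal Python sketch assumes the record type is carried in `paymentMethodType`, as in the sample; `describe` is a hypothetical helper, and the card and account numbers are kept as strings here.

```python
payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS",
                       "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers",
                       "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]


def describe(method: dict) -> str:
    """Summarize one payment method; each record type carries different fields."""
    kind = method["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending {method['creditCardNumber'][-4:]} ({method['nameOnCreditCard']})"
    if kind == "Checking":
        return (f"Checking account ending {method['bankAccountNumber'][-4:]}"
                f" ({method['nameOnBankAccount']})")
    return f"Unknown payment method type: {kind}"


for entry in payment_methods:
    print(describe(entry["paymentMethod"]))
```

In a relational schema the same data lands in separate Visa and Bank payment tables, as in the Person ERD; here one array holds both record types.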
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values
{ "personId": 11,
  "person": { "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02", "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ] } }
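Because keys and nesting are part of the data, structure itself can be queried. The Python sketch below only illustrates the idea (the helpers `objects` and `has_key` are hypothetical): it tests for the presence of a key anywhere in the sample document, and combines a key test with value tests on the same nested object.

```python
from typing import Any, Iterator

doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def objects(node: Any) -> Iterator[dict]:
    """Yield every object (dict) nested anywhere in a JSON value."""
    if isinstance(node, dict):
        yield node
        for child in node.values():
            yield from objects(child)
    elif isinstance(node, list):
        for item in node:
            yield from objects(item)


def has_key(node: Any, key: str) -> bool:
    """Structure query: does the key appear anywhere in the document?"""
    return any(key in obj for obj in objects(node))


# Combination query: a phone object that is a home phone AND hides its number.
hidden_home = [
    obj["phoneNumber"]
    for obj in objects(doc)
    if obj.get("hidePhoneNumber") is True and "home" in obj.get("phonePurpose", [])
]

print(has_key(doc, "hidePhoneNumber"))  # True
print(hidden_home)                      # ['+1 (222) 222-2224']
```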
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
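Python's standard `json` module is a convenient way to see these types in action. In this sketch the key and string values are made-up illustrations; note that JSON itself has a single number type, and Python's parser maps integer literals to `int` and the rest to `float`.

```python
import json

# Illustrative document: a key with spaces and non-ASCII characters,
# a UTF-8 string value, and an array that mixes types.
text = """{
  "person name (UTF-8 ✓)": "Müller 山田",
  "personHeightInFeet": 5.75,
  "personId": 11,
  "hidePhoneNumber": true,
  "personMaidenName": null,
  "mixed": [1, "two", 3.0, false, null, {"nested": []}]
}"""

doc = json.loads(text)
for key, value in doc.items():
    print(f"{key!r}: {type(value).__name__}")

print([type(v).__name__ for v in doc["mixed"]])
# ['int', 'str', 'float', 'bool', 'NoneType', 'dict']
```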
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. RELATIONAL MODEL AND NORMAL FORM
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables… represent a major advance toward the goal of data independence…
1.2.1. Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence
…Can application programs… remain invariant as indices come and go…
1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary… The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure: the paper's text linked into a graph of Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E) through relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of", with problem, solution, and purpose links between topics]
[Figure: a customer-order document graph linking JSON Customers, Orders, Products, and Vendors documents with XML Documentation]
Inside and Out
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling: Normalize
- DocGraph Modeling: Normalize
- DocGraph Modeling: Denormalize
- Relational Modeling: Orthogonalize
- Relational Modeling: Generalize
- DocGraph Modeling: Generalize
- Relational Modeling: Tune
- DocGraph Modeling: Tune
- Document PROs: Simple, Easy Data Types
Hospital Name | Johns Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
Church of Jesus Christ of Latter-day Saints
• Hundreds of MarkLogic servers running 190+ websites and applications with billions of page views annually
• 15.6 million members (30,016 congregations worldwide)
• Humanitarian assistance in 186 countries
• Thousands of documents in 188 published languages
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Six Data Paradigms
[Figure: six data paradigms placed on a spectrum from fixed structure (code has more dependence on DB structures) to flexible structure (code has less dependence on DB structures), and from low-latency operational velocity to high-bandwidth analytical volume, with scales from microseconds to hours, GB to PB, and 500 tps to 100K tps:
• Dimensional: Kimball data warehousing (Hospital, Surgeon, Operation, and Drug dimensions around Drug Dose facts)
• Relational: fixed dense tables with flexible queries & joins (Surgeon, Operation Codes, Operation, and Hospital tables)
• Wide Column: fixed dense tables with fixed queries, no joins
• Key Value: predefined data structures (key/value, hash, list, set)
• Document: sparse, variable data structures (an XML operation record)
• Graph: unlimited relationships (surgeon, operation, hospital, and person linked by "performed on", "works at", "friend of", "likes", "has poor success at", "fails at")]
Top Ten Databases Overall
[Figure: the same diagram labeled with products, numbered as on the slide:
• Graph (RDF): 6 MarkLogic
• Data Warehouse: 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
• Live Analytics: 2 Oracle Exalytics, 3 SAP HANA
• NewSQL: 2 Oracle x10
• SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
• Document (JSON): 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
• Key Value: 5 AWS DynamoDB, 10 Redis
• Wide Column: 9 DataStax Cassandra]
Do we combine multiple models?
Multi-Model Enterprise NoSQL Databases
[Figure: the six-paradigm diagram repeated, with MarkLogic labeled across the Graph (RDF), Document (XML and JSON), Key Value, and Wide Column paradigms and beside the Dimensional paradigm (labeled "Operational"), captioned "1 Multi-model NoSQL Database"]
Power of Combining XML Document and RDF Graph
[Figure: Codd's 1970 paper (the excerpt shown above) marked up as an XML document and linked by an RDF graph of topics, persons, locations, publications, references, organizations, and events through relationships such as "related topic", "author of", and "publisher of", captioned "Data + Narrative + Relationships (Semantic & Structural) = Contextual Information = Meaningful Knowledge"]
WAIT A MINUTE: Aren't JSON, XML, and RDF databases hierarchical and graph databases?
Didn't E. F. Codd prove that hierarchical and graph databases are inferior to relational?
1. They break programs when the hierarchy of the data structures changes
– New APIs and indexes flatten out the data structure so queries work regardless of hierarchy
2. They contain redundant data throughout the hierarchy that is hard to update
– Document DBs use foreign keys and/or graphs to connect documents, which are business entities
3. They easily become inconsistent because it is easy to miss updating redundant data
– Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL
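The answer to point 1 can be modeled in a few lines: if every value is reachable by its key name rather than by its full path, a query keeps working after the hierarchy changes. This Python sketch illustrates that idea only; it is not how MarkLogic builds its indexes, and `flatten`, `values_for_key`, and the restructured `v2` document are hypothetical.

```python
from typing import Any, Iterator


def flatten(node: Any, path: tuple = ()) -> Iterator[tuple]:
    """Yield (path, value) for every leaf value, however deeply it is nested."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, path + (key,))
    elif isinstance(node, list):
        for child in node:
            # Array positions are not part of the path, so multi-valued data flattens too.
            yield from flatten(child, path)
    else:
        yield path, node


def values_for_key(doc: dict, key: str) -> list:
    """Query by the last key name only, ignoring the hierarchy above it."""
    return [value for path, value in flatten(doc) if path and path[-1] == key]


v1 = {"operation": {"administeredDrugs": [{"drugName": "Minocycline"},
                                          {"drugName": "Minomycin"}]}}
# Hypothetical restructured version: the drugs moved under a new "treatment" level.
v2 = {"operation": {"treatment": {"drugs": [{"drugName": "Minocycline"},
                                            {"drugName": "Minomycin"}]}}}

print(values_for_key(v1, "drugName"))  # ['Minocycline', 'Minomycin']
print(values_for_key(v2, "drugName"))  # ['Minocycline', 'Minomycin']
```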
RDF Graph and XML enable us to turn content into meaningful knowledge.
What can RDF Graph and JSON do for data?
Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity
– A document is like a row in a table, and more
– A document contains all the related tables required by a relational model to represent one business entity
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries
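A minimal Python sketch of the idea, assuming documents are stored whole by ID and relationships are kept as separate subject, predicate, object triples. The IDs and the `likesProduct` and `hasEmployer` predicates come from the talk's sample person document; the `soldBy` predicate and the `objects_of` helper are hypothetical. Following a triple is a run-time join between any two entities, with no foreign-key column declared in advance.

```python
# Documents are whole business entities, keyed by ID (values from the talk's samples).
documents = {
    11: {"person": {"personName": "Mike Bowers"}},
    1111: {"product": {"productName": "Oreo Thins Sandwich Cookies"}},
    555: {"company": {"companyName": "Nabisco"}},
}

# Relationships are data: (subject, predicate, object) triples.
triples = [
    (11, "likesProduct", 1111),
    (11, "hasEmployer", 666),  # no document for 666 is loaded in this sketch
    (1111, "soldBy", 555),     # hypothetical predicate for illustration
]

# Index triples by (subject, predicate) so relationships can be followed quickly.
index: dict = {}
for subject, predicate, obj in triples:
    index.setdefault((subject, predicate), []).append(obj)


def objects_of(subject: int, predicate: str) -> list:
    """Follow one named relationship from a subject to the IDs it points at."""
    return index.get((subject, predicate), [])


for product_id in objects_of(11, "likesProduct"):
    product = documents[product_id]["product"]
    vendors = [documents[v]["company"]["companyName"] for v in objects_of(product_id, "soldBy")]
    print(product["productName"], "sold by", vendors)
# Oreo Thins Sandwich Cookies sold by ['Nabisco']
```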
[Figure: a customer-order document graph: JSON Customers, Orders, Products, and Vendors documents and XML Documentation (a sample operation record) connected by relationships such as: product that was ordered; vendor who sold the product; shipping instructions, etc.; customer invoices, annual reports, etc.; customer order documents, invoices, etc.; product manual; vendor order forms, invoices, etc.; customer liked product; vendor received RMA on product]
RDF = Standard Meaningful Graphs
Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies
TIP: Search for ontologies at Linked Open Vocabularies (LOV)
• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation
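As a sketch of reusing existing vocabularies, the Python below writes triples in N-Triples syntax with predicates from FOAF (`http://xmlns.com/foaf/0.1/`) and Dublin Core terms (`http://purl.org/dc/terms/`). The `http://example.org/` subject IRIs are placeholders, and the `literal` helper only escapes backslashes, quotes, and newlines, which is enough for this data but is not a complete serializer.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # placeholder namespace for our own entities


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal (escapes backslash, double quote, newline only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


paper, author = iri(EX + "paper/codd-1970"), iri(EX + "person/codd")
triples = [
    (paper, iri(DCTERMS + "title"), literal("A Relational Model of Data for Large Shared Data Banks")),
    (paper, iri(DCTERMS + "creator"), author),
    (author, iri(FOAF + "name"), literal("E. F. Codd")),
]

# One triple per line, each terminated by " ."
ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
print(ntriples)
```

Because the predicates are shared vocabulary terms, any consumer that knows Dublin Core or FOAF can interpret "title", "creator", and "name" without a custom schema.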
New Data Structures for Variety and Variability
Relational Database | NoSQL Database
Fixed and dense structures (mostly) | Sparse and variable structures (mostly)
Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable). | Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.
Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure. | Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.
Each table contains a sparse number of rows and requires each row to have the same fixed structure. | Each collection contains a sparse number of documents and has flexible requirements for document types: it may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.
Each table may have zero to a few fixed relationships to other tables. Constraints are not data. | Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.
Each schema must have a fixed number of predefined tables and constraints before data can be loaded. | Each database may contain documents without first defining structures, document types, relationships, collections, etc.
Why is relational fixed and dense?
Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Operation ID | Drug ID | Drug Dose | Drug Dose UOM | Sparse
13 | 100 | 200 | mg | NULL
13 | 101 | NULL | NULL |
13 | 100 | 150 | mg | NULL

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.
• Fixed field types
• Fixed table structure
• Fixed table relationships
• Each row has the same structure
• Fixed set of tables
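The fixed, dense structure is easy to see with Python's built-in `sqlite3` module. This sketch loads the slide's rows into an in-memory database; the snake_case table and column names are assumptions, and the slide's "Sparse" column is left out. Reassembling operation 13 takes a three-way join, the unrecorded Minomycin dose still occupies its columns as NULL, and a row carrying an extra field (a hypothetical route of administration) is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE surgeon (surgeon_id INTEGER PRIMARY KEY, surgeon_name TEXT, surgeon_specialty TEXT);
CREATE TABLE operation (operation_id INTEGER PRIMARY KEY,
                        surgeon_id INTEGER REFERENCES surgeon, hospital_id INTEGER);
CREATE TABLE drug (drug_id INTEGER PRIMARY KEY, drug_name TEXT);
-- Every row has the same fixed, dense set of columns;
-- a dose that was not recorded still occupies its columns as NULL.
CREATE TABLE operation_drug (operation_id INTEGER REFERENCES operation,
                             drug_id INTEGER REFERENCES drug,
                             drug_dose INTEGER, drug_dose_uom TEXT);
INSERT INTO surgeon VALUES (1, 'Dorothy Oz', 'Cardiothoracic'), (2, 'Martin Fields', 'Neurosurgery');
INSERT INTO operation VALUES (13, 1, 7);
INSERT INTO drug VALUES (100, 'Minocycline'), (101, 'Minomycin');
INSERT INTO operation_drug VALUES (13, 100, 200, 'mg'), (13, 101, NULL, NULL), (13, 100, 150, 'mg');
""")

# Reassembling one operation means joining across the fixed set of tables.
rows = conn.execute("""
    SELECT s.surgeon_name, d.drug_name, od.drug_dose, od.drug_dose_uom
    FROM operation o
    JOIN surgeon s ON s.surgeon_id = o.surgeon_id
    JOIN operation_drug od ON od.operation_id = o.operation_id
    JOIN drug d ON d.drug_id = od.drug_id
    WHERE o.operation_id = 13
    ORDER BY od.rowid
""").fetchall()
for row in rows:
    print(row)

# A row with a field the table does not define is refused until the schema changes.
rejected = False
try:
    conn.execute("INSERT INTO operation_drug VALUES (13, 100, 200, 'mg', 'IV')")
except sqlite3.Error as exc:
    rejected = True
    print("rejected:", exc)
```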
How does NoSQL support variety and variability?
NoSQL documents can have any variety of rapidly changing structures because structure is data.
{ "_id": 1, "_type": "operation", "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins", "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz", "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ] },
  "relations": { "values": [
    { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
    { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
    { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
    { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
    { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
    { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
  ] } }
• Variable Data Types
• Variable Document Types
• Variable Relationships
• Variable and Sparse Collections
• Variable Document Structures
• Sparse Properties (the Minomycin entry has no dose)
• Sparse Denormalized Properties
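By contrast, code that reads the document above simply checks whether a property exists. This Python sketch trims the operation document to the parts it uses; `doses` and `recalled` are illustrative names, and the "more than one recall" threshold is an arbitrary example.

```python
operation_doc = {
    "_id": 1,
    "_type": "operation",
    "collections": ["operation", "transplants"],
    "operation": {
        "surgeonName": "Dorothy Oz",
        "operationNumber": 13,
        "administeredDrugs": [
            {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
             "drugDoseSize": 200, "drugDoseUOM": "mg"},
            {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},  # sparse: no dose
            {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
             "drugDoseSize": 150, "drugDoseUOM": "mg"},
        ],
    },
    "relations": {"values": [
        {"subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87},
        {"subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1},
        {"subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3},
        {"subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1},
    ]},
}

# Absent properties are simply not there; no NULL placeholders are stored.
doses = [
    f"{d['drugName']}: {d['drugDoseSize']} {d['drugDoseUOM']}"
    if "drugDoseSize" in d
    else f"{d['drugName']}: dose not recorded"
    for d in operation_doc["operation"]["administeredDrugs"]
]

# Relationships are data, so they can be filtered like any other property.
recalled = [r["object"] for r in operation_doc["relations"]["values"]
            if r["predicate"] == "opDrug" and r.get("drugRecalls", 0) > 1]

print(doses)     # ['Minocycline: 200 mg', 'Minomycin: dose not recorded', 'Minocycline: 150 mg']
print(recalled)  # [20000]
```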
Choosing Between XML and JSON
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph></section>
{ "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [ { "type": "text", "value": "Text exists only in objects" },
      { "type": "text", "value": "Data exists in separate objects" },
      { "type": "date", "value": "2017-04-02Z" },
      { "type": "break", "value": null },
      { "type": "text", "value": "JSON is designed for objects to…" },
      { "type": "text", "value": "be sprinkled with strings" } ] }
  ] }
JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
XML
1. Document type
2. Namespaces for nested objects; comments; attributes for metadata; CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
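The difference shows up as soon as a program pulls the date back out. In this Python sketch, which uses only the standard library, the XML date is an element inside running text with its type in an `xsi:type` attribute, while the JSON date is a separate object whose type is an ordinary property. Both snippets are trimmed versions of the examples above.

```python
import json
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

xml_text = """<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"><paragraph>XML allows data such as this date
  <date xsi:type="xsd:date">2017-04-02Z</date> to be freely mixed into the text</paragraph></section>"""

json_text = """{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects"},
  {"type": "date", "value": "2017-04-02Z"}]}"""

# XML: the date is an element embedded in running text; its type is a namespaced attribute.
root = ET.fromstring(xml_text)
date_el = root.find(".//date")
xml_date = (date_el.text, date_el.get(f"{{{XSI}}}type"))
plain_text = "".join(root.itertext())  # the surrounding prose is still intact

# JSON: the date is its own object in an array; the text around it lives elsewhere.
json_date = [p["value"] for p in json.loads(json_text)["paragraph"] if p["type"] == "date"]

print(xml_date)   # ('2017-04-02Z', 'xsd:date')
print(json_date)  # ['2017-04-02Z']
```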
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Customer Model
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
The person ERD is 13 tables because each multivalued property requires a separate table in a relational model.
All of this should be represented as one JSON document because it is one business entity.
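Here is a rough way to see where the table count comes from. The Python sketch below walks a JSON document and lists one table for the entity itself, plus one table for every multivalued (array) property, recursively. This is a simplified heuristic assumed for illustration. It also counts arrays of scalars such as `phonePurpose` as tables, so it is stricter than the ERD above, which stores one purpose per row.

```python
def tables_needed(doc, name):
    """List the tables a strict relational shredding of `doc` would need:
    one for the entity itself plus one per multivalued (array) property, recursively."""
    tables = [name]

    def visit(node, owner):
        for key, value in node.items():
            if isinstance(value, list):
                child = f"{owner}.{key}"      # each array becomes a child table
                if child not in tables:
                    tables.append(child)
                for item in value:
                    if isinstance(item, dict):
                        visit(item, child)    # arrays nested in array items need tables too
            elif isinstance(value, dict):
                visit(value, owner)           # a single nested object can stay in its owner's table

    visit(doc, name)
    return tables

# A cut-down person document in the style of the model (values from the slides).
person = {
    "personName": "Mike Bowers",
    "personPhones": [
        {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
        {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
    ],
    "personEmails": [
        {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"}},
    ],
}

for table in tables_needed(person, "person"):
    print(table)
```

Even this small fragment shreds into five tables. Reassembling it in a relational database means joining all five of them back together.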
Person ERD
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers", "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": {
        "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
[Diagram: a larger relational ERD that repeats the person tables several times and adds tables such as Order, Order Items, Fact, and Lookup Lists]
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
[Diagram: four JSON document collections linked by graph relationships: Customers → Orders ("Customer Order"), Orders → Products ("Product that was ordered"), Products → Vendors ("Vendor who sold the product")]
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+0101",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", … } }
    ]
  }
}
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": ["cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": ["cookie", "chocolate", "snack", "treat"],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    …
  }
}
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": {
          "addressPurpose": ["shipping", …
          "addressLocality": "East Hanove…
          "addressPostalCode": "07936", … } } ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": "555",
          "paymentMethodVerified": true } } ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
{
  "orderLookupId": 111111111,
  "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
  "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"]
}
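The customer, order, product, and company documents above are connected by meaningful relationships rather than by join tables. Below is a minimal in-memory sketch in Python using plain dicts. The document URIs and the predicate names `placedOrder`, `orderedProduct`, and `soldByVendor` are hypothetical; they are not part of the model above. The sketch follows a chain of triples from a customer to the vendor who sold what that customer ordered.

```python
# Hypothetical stand-ins for four document collections (IDs and names from the slides).
docs = {
    "customer/11": {"personName": "Mike Bowers"},
    "order/1": {"orderNumber": "111-11-111"},
    "product/1111": {"productName": "Oreo Thins Sandwich Cookies"},
    "company/555": {"companyName": "Nabisco"},
}

# Graph edges between documents as (subject, predicate, object) triples.
# The predicate names are illustrative assumptions.
triples = [
    ("customer/11", "placedOrder", "order/1"),
    ("order/1", "orderedProduct", "product/1111"),
    ("product/1111", "soldByVendor", "company/555"),
]

def follow(start, *predicates):
    """Walk a chain of predicates from `start` and return every document URI reached at the end."""
    frontier = {start}
    for predicate in predicates:
        frontier = {o for s, p, o in triples if s in frontier and p == predicate}
    return sorted(frontier)

vendors = follow("customer/11", "placedOrder", "orderedProduct", "soldByVendor")
print([docs[uri]["companyName"] for uri in vendors])  # ['Nabisco']
```

Each hop is a named relationship, so the meaning of the connection travels with the data instead of being implied by a foreign-key column.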
Relational Modeling: Normalize
1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Diagram: One-to-one, One-to-many, and Many-to-many relationships, and Reference Tables]
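Below is a minimal sketch of the "flatten multivalued attributes" step. It uses Python's built-in `sqlite3` with an in-memory database; the table and column names are assumptions made for illustration. The person's phone list, simplified to one purpose per phone, moves into a one-to-many child table keyed back to the person, and reading the entity back then takes a join.

```python
import sqlite3

# A person with a multivalued phone attribute (simplified: one purpose per phone).
person = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [("mobile", "+1 (111) 111-1111"), ("work", "+1 (111) 111-1112")],
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT);
-- The multivalued attribute becomes a one-to-many child table keyed back to persons.
CREATE TABLE person_phones (
    phone_id      INTEGER PRIMARY KEY,
    person_id     INTEGER REFERENCES persons(person_id),
    phone_purpose TEXT,
    phone_number  TEXT
);
""")
conn.execute("INSERT INTO persons VALUES (?, ?)", (person["personId"], person["personName"]))
conn.executemany(
    "INSERT INTO person_phones (person_id, phone_purpose, phone_number) VALUES (?, ?, ?)",
    [(person["personId"], purpose, number) for purpose, number in person["personPhones"]],
)

# Reading the person back as one entity now requires a join.
rows = conn.execute("""
    SELECT p.name, ph.phone_purpose, ph.phone_number
    FROM persons p JOIN person_phones ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
""").fetchall()
print(rows)
```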
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
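MarkLogic's TDE (Template Driven Extraction) templates project rows or triples out of documents when they are indexed. The Python function below is only a rough analogy for that idea. It is not the TDE template format or any MarkLogic API, and `project_rows`, its path syntax, and its column specs are all hypothetical. Given a context path and a column mapping, it emits one row per embedded object, so a normalized embedded array can be queried like a table.

```python
def project_rows(doc, context, columns):
    """Emit one row per node reached by `context` (a list of keys; "*" steps into each array item).
    A column spec starting with "/" reads a top-level property of the document; otherwise it
    reads a property of the context node. Hypothetical helper, not MarkLogic's TDE syntax."""
    nodes = [doc]
    for step in context:
        if step == "*":
            nodes = [item for node in nodes for item in node]  # fan out over array items
        else:
            nodes = [node[step] for node in nodes]
    return [
        {col: (doc[spec[1:]] if spec.startswith("/") else node.get(spec)) for col, spec in columns.items()}
        for node in nodes
    ]

person_doc = {
    "personId": 11,
    "person": {"personPhones": [
        {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
        {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
    ]},
}

phone_rows = project_rows(person_doc, ["person", "personPhones", "*", "phone"],
                          {"personId": "/personId", "phoneNumber": "phoneNumber"})
print(phone_rows)
# [{'personId': 11, 'phoneNumber': '+1 (111) 111-1111'}, {'personId': 11, 'phoneNumber': '+1 (111) 111-1112'}]
```

The document itself stays whole. Only the projected rows look relational, and each row carries the parent key it needs for a join.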
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact information
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• In effect, this denormalizes the data without copying it into each document
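Below is a minimal sketch of "denormalizing without denormalizing" in plain Python. Three parts are assumptions made for illustration: the `triple_index` set stands in for a triple index populated from person documents, the trimmed order keeps only `customerId`, and `order_view` is a hypothetical helper. Customer facts are looked up at query time, so the stored order document never receives a copy.

```python
# Hypothetical triple index: facts projected from person documents (for example by a TDE template).
triple_index = {
    (11, "personName", "Mike Bowers"),
    (11, "personPhone", "+1 (111) 111-1111"),
    (22, "personName", "Person 22"),  # hypothetical second person, filtered out below
}

# A trimmed order document that stores only the customer's ID, not the customer's details.
order = {"orderId": 1, "order": {"orderNumber": "111-11-111", "customer": {"customerId": 11}}}

def order_view(order_doc, predicates):
    """Return the order joined at query time with facts about its customer from the triple index.
    Assumes at most one value per predicate; the stored order document is not modified."""
    customer_id = order_doc["order"]["customer"]["customerId"]
    facts = {p: o for s, p, o in triple_index if s == customer_id and p in predicates}
    return {"orderNumber": order_doc["order"]["orderNumber"], **facts}

print(order_view(order, {"personName", "personPhone"}))
```

If the person's name changes, only the person document and its projected triples change. Every order view picks up the new value without any order being rewritten.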
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Diagram: Business Tables, Transaction Tables, and Reference Tables]
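The three kinds of tables can be sketched with Python's built-in `sqlite3`. The schema below is a hypothetical, much-reduced version of the customer order model:
- `persons` and `products` are business tables.
- `orders` and `order_items` are transaction tables that join the business tables in one context.
- `order_status` is a reference table that standardizes order states.

Foreign keys are switched on because SQLite does not enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript("""
-- Reference table: standardizes the allowed states of an order.
CREATE TABLE order_status (status TEXT PRIMARY KEY);
INSERT INTO order_status VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');

-- Business tables: stand on their own, independent of any transaction.
CREATE TABLE persons  (person_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);

-- Transaction tables: join business tables together in one context (an order).
CREATE TABLE orders (
    order_id  INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES persons(person_id),
    status    TEXT REFERENCES order_status(status)
);
CREATE TABLE order_items (
    order_id   INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    qty        INTEGER
);

INSERT INTO persons     VALUES (11, 'Mike Bowers');
INSERT INTO products    VALUES (1111, 'Oreo Thins Sandwich Cookies');
INSERT INTO orders      VALUES (1, 11, 'Shipped');
INSERT INTO order_items VALUES (1, 1111, 3);
""")

# Rebuilding one order's context takes a join across all three kinds of tables.
rows = conn.execute("""
    SELECT p.name, pr.name, oi.qty, o.status
    FROM orders o
    JOIN persons p      ON p.person_id = o.person_id
    JOIN order_items oi ON oi.order_id = o.order_id
    JOIN products pr    ON pr.product_id = oi.product_id
""").fetchall()
print(rows)  # [('Mike Bowers', 'Oreo Thins Sandwich Cookies', 3, 'Shipped')]
```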
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one document
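One document can hold several lookup lists, like the order lookup document in this model. The minimal Python sketch below reuses that document's lists and shows how an application might check an order against it. The helper `invalid_statuses` and the `"Lost"` status in the sample order are hypothetical.

```python
# The order lookup document from the model: several lookup lists in one document.
order_lookups = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

def invalid_statuses(order_doc, lookups):
    """List every order or product status in `order_doc` that its lookup list does not allow."""
    problems = []
    order = order_doc["order"]
    if order["orderStatus"] not in lookups["orderStatus"]:
        problems.append(("orderStatus", order["orderStatus"]))
    for item in order["orderedProducts"]:
        status = item["product"].get("productOrderStatus")
        if status is not None and status not in lookups["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems

order = {"order": {"orderStatus": "Shipped", "orderedProducts": [
    {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
    {"product": {"productIdFk": 2222, "productOrderStatus": "Lost"}},  # hypothetical bad value
]}}
print(invalid_statuses(order, order_lookups))  # [('productOrderStatus', 'Lost')]
```

A single read fetches every list the order needs. In the relational model, each list would be its own reference table.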
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational, because it hides the purpose of the model
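One common relational way to subclass a general person table is to give each subclass its own table, sharing the person's primary key. Below is a minimal sketch using Python's built-in `sqlite3`; the table names and the second person (`'Person 22'`) are hypothetical. It also shows the cost: recovering a person's roles means outer-joining every subclass table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- General-purpose table: every person, whatever roles they play.
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Subclass tables share the person's key, so each is one-to-one with persons.
CREATE TABLE customers (
    person_id          INTEGER PRIMARY KEY REFERENCES persons(person_id),
    customer_join_date TEXT
);
CREATE TABLE account_reps (
    person_id  INTEGER PRIMARY KEY REFERENCES persons(person_id),
    company_id INTEGER
);

INSERT INTO persons VALUES (11, 'Mike Bowers'), (22, 'Person 22');
INSERT INTO customers VALUES (11, '2001-01-01');
INSERT INTO account_reps VALUES (11, 555);
""")

# Recovering each person's roles means outer-joining every subclass table.
rows = conn.execute("""
    SELECT p.person_id, p.name,
           c.person_id IS NOT NULL AS is_customer,
           a.person_id IS NOT NULL AS is_account_rep
    FROM persons p
    LEFT JOIN customers c    ON c.person_id = p.person_id
    LEFT JOIN account_reps a ON a.person_id = p.person_id
    ORDER BY p.person_id
""").fetchall()
print(rows)  # [(11, 'Mike Bowers', 1, 1), (22, 'Person 22', 0, 0)]
```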
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because a JSON document can carry subclass objects (such as customer) alongside the general person object
• This example shows a person subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON and, unlike in relational, it does not hide the purpose of the model
• For the subclassing, see the customer and companyRelationships objects in the person document above
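In the JSON model, a person's roles can be read straight from the subclass objects embedded in the document. The sketch below is plain Python over a cut-down copy of the person document; `roles` is a hypothetical helper. Compare it with the relational version, which needs an outer join per subclass table.

```python
def roles(doc):
    """Derive the roles a person plays from the subclass objects embedded in its document."""
    found = []
    if "customer" in doc:
        found.append("customer")
    for rel in doc.get("companyRelationships", []):
        found.extend(rel["company"]["personCompanyRelationship"])
    return found

# Cut-down copy of the person document (values from the slides).
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}},
    ],
}

print(roles(person_doc))
# ['customer', 'employeeOf', 'accountRepFor', 'independentContractorFor', 'deliveryPersonFor']
```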
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Move sparse data out of a table and into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables, reducing the joins that slow performance
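Below is a minimal sketch of the denormalization trade-off using Python's built-in `sqlite3`; the schema is a hypothetical reduction. Copying the customer's name into `orders` lets the order listing skip a join. The price is paid on writes: every copy must be updated together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT);
-- customer_name deliberately copies persons.name so order listings can skip the join.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, person_id INTEGER, customer_name TEXT);
INSERT INTO persons VALUES (11, 'Mike Bowers');
INSERT INTO orders VALUES (1, 11, 'Mike Bowers');
""")

# Read path: the listing needs no join.
listing = conn.execute("SELECT order_id, customer_name FROM orders").fetchall()

def rename_person(conn, person_id, new_name):
    """Write path cost: update the source row and every copy in one transaction."""
    with conn:  # commits both updates together, or rolls both back on error
        conn.execute("UPDATE persons SET name = ? WHERE person_id = ?", (new_name, person_id))
        conn.execute("UPDATE orders SET customer_name = ? WHERE person_id = ?", (new_name, person_id))

rename_person(conn, 11, "Michael Bowers")
print(listing)  # [(1, 'Mike Bowers')]
```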
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
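Why unique property names matter can be shown with a toy Python model of a name-based index: it maps each (property name, value) pair to the documents that contain it at any depth, ignoring the path. This is only an illustration of the idea described on this slide, not how MarkLogic's universal index is implemented; the helper names `key` and `index_by_property_name` and the two sample documents are hypothetical.

```python
from collections import defaultdict
from typing import Any, Optional


def key(name: str, value: Any) -> tuple:
    # Include the value's type so that, e.g., True and 1 do not share an entry.
    return (name, type(value).__name__, value)


def index_by_property_name(docs: dict) -> dict:
    """Map key(property name, scalar value) -> ids of the documents that
    contain that name/value pair anywhere, ignoring the path to it."""
    index = defaultdict(set)

    def visit(node: Any, doc_id: str, name: Optional[str]) -> None:
        if isinstance(node, dict):
            for prop, value in node.items():
                visit(value, doc_id, prop)
        elif isinstance(node, list):
            for item in node:
                visit(item, doc_id, name)  # array items are indexed under the array's name
        elif name is not None:
            index[key(name, node)].add(doc_id)

    for doc_id, doc in docs.items():
        visit(doc, doc_id, None)
    return index


docs = {
    "person-11": {
        "personId": 11,
        "companyRelationships": [
            {"company": {"companyIdFk": 555, "companyName": "Nabisco"}}
        ],
    },
    "company-555": {"companyId": 555, "company": {"companyName": "Nabisco"}},
}

index = index_by_property_name(docs)
print(sorted(index[key("companyName", "Nabisco")]))
```

Because the lookup ignores the path, `companyName` matches in both documents even though it sits at different depths. If both documents had used a generic name such as `name`, person names and company names would land in the same index entry, which is the reason for prefixing property names by entity.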
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": … },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New" …
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2 …
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1 …
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    "productWebsites": [ …
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales" …
      { "phone": { "phonePurpose": "support" …
      { "phone": { "phonePurpose": "shipping" …
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping" …
          "addressLocality": "East Hanover" …
          "addressPostalCode": "07936" …
    "companyEmails": [
      { "email": { "emailPurpose": "sales" …
    "companyWebsites": [
      { "website": { "websitePurpose": "home" …
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…
          "paymentMethodAccountNumber": 555 …
          "paymentMethodVerified": true …
  "supplier": { "supplierJoinDate": "2005-05…
  "seller": { "sellerJoinDate": "2005-05…
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
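To make the entity / transaction / lookup-list split concrete, here is a minimal Python sketch, assuming the documents are held in plain dicts keyed by their id property. It follows an order (a transaction) to the person and product documents (entities) through their ids and checks the order's status against the lookup-list document. The collection names and helper functions are hypothetical, not a MarkLogic API. Note that product 2222 has no product document among the five shown, so it resolves to `None`.

```python
# Hypothetical in-memory collections, keyed by each document's id property.
people = {11: {"personId": 11, "person": {"personName": "Mike Bowers"}}}
products = {1111: {"productId": 1111,
                   "product": {"productName": "Oreo Thins Sandwich Cookies"}}}
order_lookup = {"orderLookupId": 111111111,
                "orderStatus": ["New", "Invoiced", "Shipped", "Closed"]}

order = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderQty": 3}},
            {"product": {"productIdFk": 2222, "productOrderQty": 1}},
        ],
    },
}


def customer_doc(order_doc, person_docs):
    """Follow the order's customerId to the person (entity) document."""
    return person_docs.get(order_doc["order"]["customer"]["customerId"])


def ordered_product_docs(order_doc, product_docs):
    """Follow each productIdFk to its product document; None if absent."""
    return [product_docs.get(item["product"]["productIdFk"])
            for item in order_doc["order"]["orderedProducts"]]


def has_valid_status(order_doc, lookup_doc):
    """Check the order's status against the lookup-list document."""
    return order_doc["order"]["orderStatus"] in lookup_doc["orderStatus"]


print(customer_doc(order, people)["person"]["personName"])
print([p and p["productId"] for p in ordered_product_docs(order, products)])
print(has_valid_status(order, order_lookup))
```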
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs × 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables
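The IO estimate above is simple arithmetic. This short Python check reproduces it from the slide's own per-join figures, assuming every table costs one join of index IOs plus row IOs. It is a back-of-the-envelope model, not a benchmark, and the function name is hypothetical.

```python
def relational_read_ios(tables, index_ios=(3, 4), row_ios=(1, 2)):
    """(min, max) IOs to reassemble one entity spread over `tables` tables,
    assuming each table costs one join of index IOs plus row IOs."""
    per_join_min = index_ios[0] + row_ios[0]  # 3 + 1 = 4
    per_join_max = index_ios[1] + row_ios[1]  # 4 + 2 = 6
    return tables * per_join_min, tables * per_join_max


DOCUMENT_READ_IOS = 1  # the slide's figure for reading one document from MarkLogic

low, high = relational_read_ios(13)
print(f"relational: {low}-{high} IOs; document: {DOCUMENT_READ_IOS} IO")
```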
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
"personAddresses": [
  { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]
The same data in relational tables:
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
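The mapping from one multi-valued property to relational tables can be sketched in Python. This assumes the document shape on this slide and generates hypothetical surrogate `Address ID`s locally. It also treats a `"primary"` entry in `addressPurpose` as the `Is Primary` flag rather than as a purpose; that is an assumption based on how `"primary"` is used in the phone and email arrays. One address with two purposes becomes one Addresses row plus two Persons Addresses rows. The Persons row is omitted because it is a straight copy of scalar fields.

```python
def shred_addresses(doc):
    """Split one person document's multi-valued personAddresses into rows for
    an Addresses table and a Persons Addresses link table."""
    person_id = doc["personId"]
    addresses, links = [], []
    for address_id, item in enumerate(doc["person"]["personAddresses"], start=1):
        addr = item["address"]
        addresses.append({
            "Address ID": address_id,  # hypothetical surrogate key
            "Street": "; ".join(addr["addressStreet"]),  # multi-line street in one column
            "Locality": addr["addressLocality"],
            "Region": addr["addressRegion"],
            "Postal Code": addr["addressPostalCode"],
            "Country": addr["addressCountry"],
        })
        purposes = addr["addressPurpose"]
        is_primary = "primary" in purposes  # assumption: "primary" is a flag, not a purpose
        for purpose in purposes:
            if purpose != "primary":
                links.append({"Person ID": person_id, "Address ID": address_id,
                              "Address Purpose": purpose, "Is Primary": is_primary})
    return addresses, links


person_doc = {
    "personId": 11,
    "person": {"personAddresses": [{"address": {
        "addressPurpose": ["shipping", "mailing"],
        "addressStreet": ["99 Smith Dr"],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States",
    }}]},
}

addresses, links = shred_addresses(person_doc)
print(len(addresses), "address row;", len(links), "link rows")
```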
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Different types of records in the same array are natural in programming but very difficult to represent in relational databases
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
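In application code, a multi-typed array is usually handled by branching on which fields each entry carries. A minimal Python sketch, assuming the `personPaymentMethods` shape on this slide; `describe_payment_method` is a hypothetical helper. A relational design would instead need either one sparse table with every column or a separate table per payment type.

```python
def describe_payment_method(item: dict) -> str:
    """Return a display label for one personPaymentMethods entry, branching on
    which fields are present because each payment type carries different ones."""
    pm = item["paymentMethod"]
    kind = pm["paymentMethodType"]
    if "creditCardNumber" in pm:
        return f"{kind} card ending {pm['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in pm:
        return f"{kind} account ending {pm['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized fields)"


payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111"}},
]

for item in payment_methods:
    print(describe_payment_method(item))
```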
Document PROs: Everything Is Data
Everything is data in JSON
• Structure
• Keys
• Values
• Arrays
Because structure is data
• Structure can be changed by updating data: just write a query
• Structure can be queried
– For presence of keys
– For nested relationships
– For any combination of nesting, keys, and values
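Because keys and nesting are just data, a few lines of Python can query for the presence of a key at any depth and change structure by rewriting data. This is a minimal sketch over plain dicts, not a database query API. The helper names and the replacement key `phoneNumberHidden` are hypothetical; the sample document follows the deck's `hidePhoneNumber` example.

```python
from typing import Any, Iterator, Tuple


def walk(node: Any, path: Tuple = ()) -> Iterator[Tuple[Tuple, str, Any]]:
    """Yield (path to property, property name, value) for every object property."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield path + (key,), key, value
            yield from walk(value, path + (key,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from walk(item, path + (i,))


def paths_with_key(doc, key):
    """Query structure: where does a property with this name occur?"""
    return [p for p, k, _ in walk(doc) if k == key]


def rename_key(node, old, new):
    """Change structure: copy `node` with every property `old` renamed to `new`."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


doc = {
    "personId": 11,
    "person": {"personPhones": [
        {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
        {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                   "hidePhoneNumber": True}},
    ]},
}

print(paths_with_key(doc, "hidePhoneNumber"))
renamed = rename_key(doc, "hidePhoneNumber", "phoneNumberHidden")  # hypothetical new name
print(paths_with_key(renamed, "phoneNumberHidden"))
```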
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
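These types round-trip cleanly through a standard JSON library. A small Python sketch using the standard `json` module; every property name below other than `personName`, `personHeightInFeet`, `hidePhoneNumber`, and `personMaidenName` is hypothetical. JSON itself has a single number type; Python's parser returns `int` for integers and `float` for decimals. JSON `null` (Python `None`) is included too, although the slide does not list it.

```python
import json

doc = {
    "personName": "Zoë Núñez 山田",        # UTF-8 characters (hypothetical name)
    "personHeightInFeet": 5.75,          # decimal
    "personReviewCount": 42,             # integer (hypothetical property)
    "hidePhoneNumber": True,             # boolean
    "personMaidenName": None,            # null
    "k" * 1000: "long property names round-trip too",  # hypothetical long key
    "mixedValues": ["text", 7, 2.5, False, None, {"nested": ["deeper", 1]}],
}

text = json.dumps(doc, ensure_ascii=False)  # keep non-ASCII characters as-is
restored = json.loads(text)

print(restored == doc)
print(sorted({type(v).__name__ for v in restored["mixedValues"]}))
```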
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION: This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of …
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS: … Tables … represent a major advance toward the goal of data independence …
1.2.1 Ordering Dependence: … Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence: … Can application programs … remain invariant as indices come and go? …
1.2.3 Access Path Dependence: Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.
[Figure: a knowledge graph linking Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E) through relationships such as geo-located in, printed on, author of, published article in journal, publisher of, problem, solution, and purpose.]
[Figure: JSON Customers, Orders, Products, and Vendors documents and XML Documentation (an XML hospital operation record, shipping instructions, customer invoices, annual reports, customer order documents, product manuals), connected by relationships such as customer order, product that was ordered, vendor who sold the product, customer liked product, and vendor received RMA on product.]
Inside and Out
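The relationships in these figures are subject-predicate-object triples, the same form as the `triples` array embedded in the tuned person document earlier in this deck. Here is a minimal plain-Python sketch of storing and pattern-matching such triples. It is not MarkLogic's triple store or SPARQL; the `Triple` type and `match` helper are hypothetical, and the subjects and objects are the plain integer ids used in the person document.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: int
    predicate: str
    object: int


# Triples from the tuned person document (person 11).
triples = [
    Triple(11, "hasSpouse", 22),
    Triple(11, "hasChild", 33),
    Triple(11, "hasChild", 44),
    Triple(11, "likesProduct", 1111),
    Triple(11, "hasEmployer", 666),
]


def match(s: Optional[int] = None, p: Optional[str] = None, o: Optional[int] = None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.object == o)]


children = [t.object for t in match(s=11, p="hasChild")]
likers_of_1111 = [t.subject for t in match(p="likesProduct", o=1111)]
print(children, likers_of_1111)
```

Each predicate names the relationship explicitly, so a new kind of relationship is just a new triple rather than a new foreign-key column or join table.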
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling: Normalize
- DocGraph Modeling: Normalize
- DocGraph Modeling: Denormalize
- Relational Modeling: Orthogonalize
- Relational Modeling: Generalize
- DocGraph Modeling: Generalize
- Relational Modeling: Tune
- DocGraph Modeling: Tune
- Document PROs: Simple, Easy Data Types
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
Church of Jesus Christ of Latter-day Saints
• Hundreds of MarkLogic servers running 190+ websites and applications with billions of page views annually
• 15.6 million members (30,016 congregations worldwide)
• Humanitarian assistance in 186 countries
• Thousands of documents in 188 published languages
Six Data Paradigms
• Key Value: predefined data structures (key/value, hash, list, set)
• Wide Column: fixed dense tables with fixed queries, no joins
• Relational: fixed dense tables with flexible queries & joins
• Dimensional: Kimball data warehousing
• Document: sparse, variable data structures
• Graph: unlimited relationships
[Figure: the six paradigms positioned between low-latency operational velocity and high-bandwidth analytical volume (response times from microseconds to hours, 500 to 100K tps, GB to PB), and between fixed structure, where code has more dependence on DB structures, and flexible structure, where code has less dependence on DB structures. Each paradigm is illustrated with the same hospital, surgeon, operation, and drug-dose example.]
Top Ten Databases Overall
• Relational, SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
• Relational, newSQL: 2 Oracle x10
• Dimensional, Data Warehouse: 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
• Dimensional, Live Analytics: 2 Oracle Exalytics, 3 SAP HANA
• Document, XML Doc Warehouse: 6 MarkLogic
• Document, JSON: 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
• Graph, RDF: 6 MarkLogic
• Key Value (simple key): 5 AWS DynamoDB, 10 Redis
• Wide Column (complex key): 9 DataStax Cassandra
Do we combine multiple models?
[Figure: the six paradigms from the Six Data Paradigms slide (dimensional, relational, wide column, key value, document, and graph), labeled with MarkLogic as one multi-model NoSQL database serving several of them.]
Multi-Model Enterprise NoSQL Databases
Power of Combining XML Document and RDF Graph
Data + Narrative + Relationships = Contextual Information
1. Relational Model and Normal Form
1.1 Introduction: "This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … [The] relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of …"
1.2 Data Dependencies in Present Systems: "… Tables … represent a major advance toward the goal of data independence. …"
1.2.1 Ordering Dependence: "… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one."
1.2.2 Indexing Dependence: "… Can application programs … remain invariant as indices come and go? …"
1.2.3 Access Path Dependence: "Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths."
[Figure: a graph of Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E). Semantic relationships (geo-located in, printed on, related topic, author of, published article in journal, publisher of) and structural relationships among topics (problem, solution, purpose) connect them. Caption: Relationships (Semantic & Structural) = Meaningful Knowledge.]
WAIT A MINUTE! Aren't JSON, XML, and RDF databases hierarchical and graph databases?
Didn't E. F. Codd prove that hierarchical and graph databases are inferior to relational ones?
1. They break programs when the data structure's hierarchy changes.
– New APIs and indexes flatten out the data structure, so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
– Document DBs use foreign keys and/or graphs to connect documents, which represent business entities.
3. They easily become inconsistent because it is easy to fail to update redundant data.
– Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
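Point 1 can be illustrated with a query that matches a property name wherever it occurs, so it keeps working when the hierarchy is reorganized. This is a minimal Python sketch, assuming plain dicts and lists stand in for JSON documents; `find_property` and `performed_by` are hypothetical helpers that mimic the idea behind path-independent property indexes, not MarkLogic's API.

```python
from typing import Any, Iterator


def find_property(node: Any, name: str) -> Iterator[Any]:
    """Yield every value stored under the key `name`, at any depth of a JSON-like tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == name:
                yield value
            # Keep descending: the same property may also occur deeper inside this value.
            yield from find_property(value, name)
    elif isinstance(node, list):
        for item in node:
            yield from find_property(item, name)


def performed_by(doc: dict, surgeon: str) -> bool:
    """A 'query' that never mentions the path to surgeonName."""
    return surgeon in find_property(doc, "surgeonName")


# Two versions of the same operation record; v2 has a deeper, reorganized hierarchy.
v1 = {"operation": {"operationNumber": 13, "surgeonName": "Dorothy Oz"}}
v2 = {"operation": {"operationNumber": 13,
                    "team": {"lead": {"surgeonName": "Dorothy Oz"}}}}

print(performed_by(v1, "Dorothy Oz"), performed_by(v2, "Dorothy Oz"))  # True True
```

The program that asks "was this operation performed by Dorothy Oz?" is the same for both structures, which is the access-path independence Codd asked for.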
RDF graphs and XML enable us to turn content into meaningful knowledge.
What can RDF graphs and JSON do for data?
Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity.
  – A document is like a row in a table, and more.
  – A document contains all the related tables a relational model requires to represent one business entity.
• Meaningful graph relationships connect any business entity to any other entity (or any content), in any way and in as many ways as you want, at run time, across all table boundaries.
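The "relationships are data" idea can be sketched with entity documents plus a list of subject-predicate-object triples. This is an illustrative Python sketch, not MarkLogic code: the IDs echo the later slides, while the predicates `soldBy` and `receivedRmaFor` and the `related` helper are hypothetical.

```python
# Hypothetical entity documents keyed by ID: one document per business entity.
docs = {
    11: {"type": "customer", "name": "Mike Bowers"},
    1111: {"type": "product", "name": "Oreo Thins Sandwich Cookies"},
    555: {"type": "vendor", "name": "Nabisco"},
}

# Relationships are data: (subject, predicate, object) triples that can be added at run time.
triples = [
    (11, "likesProduct", 1111),
    (1111, "soldBy", 555),
]


def related(subject_id: int, predicate: str) -> list[str]:
    """Names of the entities that subject_id points to through predicate."""
    return [docs[o]["name"] for s, p, o in triples if s == subject_id and p == predicate]


# A new kind of relationship needs no schema change, just one more triple.
triples.append((555, "receivedRmaFor", 1111))

print(related(11, "likesProduct"))     # ['Oreo Thins Sandwich Cookies']
print(related(555, "receivedRmaFor"))  # ['Oreo Thins Sandwich Cookies']
```

In a relational model the same change would mean a new join table; here the meaning lives in the predicate name itself.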
[Figure: a Customer Order graph. JSON documents for Customers, Orders, Products, and Vendors, an XML operation record (Johns Hopkins, operation 13, heart transplant, surgeon Dorothy Oz, with the drugs administered), and related documentation (customer invoices and annual reports; customer order documents and invoices; product manuals, vendor order forms, and invoices; shipping instructions) are linked by relationships such as "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product".]
RDF = Standard Meaningful Graphs
Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies.
TIP: Search for ontologies at Linked Open Vocabularies (LOV).
• Examples: Dublin Core, FOAF, TrackBack, MetaVocab, Basic Geo Vocabulary, BIO, RSS 1.0, VCard RDF, Creative Commons metadata, WOT, SIOC, GoodRelations, DOAP, Programmes Ontology, Music Ontology, OpenGUID, Provenance Vocabulary, Pedagogical diagnosis, DILIGENT Argumentation
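Reusing an ontology means using its published IRIs as predicates instead of inventing names. Here is a small Python sketch with FOAF (`foaf:name`, `foaf:knows`) and Dublin Core terms (`dcterms:creator`). The `example.com` subject IRIs and the `to_ntriples` helper are hypothetical, and the serializer skips the escaping a real N-Triples writer must do.

```python
# Namespace IRIs of two widely used vocabularies.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical subject IRIs under example.com; the predicates are real vocabulary terms.
PERSON_11 = "http://example.com/person/11"
PERSON_22 = "http://example.com/person/22"
DOC_7 = "http://example.com/doc/7"

triples = [
    (PERSON_11, FOAF + "name", "Mike Bowers"),  # literal object
    (PERSON_11, FOAF + "knows", PERSON_22),     # IRI object
    (DOC_7, DCTERMS + "creator", PERSON_11),
]


def to_ntriples(s: str, p: str, o: str) -> str:
    """Serialize one triple as an N-Triples line (sketch: no escaping of special characters)."""
    obj = f"<{o}>" if o.startswith(("http://", "https://")) else f'"{o}"'
    return f"<{s}> <{p}> {obj} ."


for t in triples:
    print(to_ntriples(*t))
```

Any consumer that already understands FOAF or Dublin Core can interpret these relationships without a data dictionary.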
New Data Structures for Variety and Variability
Relational Database: fixed and dense structures (mostly). NoSQL Database: sparse and variable structures (mostly).
Field and property types
– Relational: Each field in a row has a fixed type, with the optional ability to have sparse values (i.e., be nullable).
– NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.
Row and document structure
– Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
– NoSQL: Each document may conform to zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.
Tables and collections
– Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
– NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.
Relationships
– Relational: Each table may have zero to a few fixed relationships to other tables. Constraints are not data.
– NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.
Schema
– Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
– NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.
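Three of the NoSQL properties in this comparison (sparse properties, null as a real value, and one document in several collections) can be shown with plain Python structures. This is a sketch under stated assumptions: the URIs are hypothetical, and collections are written inline only for self-containment, whereas in MarkLogic they are document metadata.

```python
import json

# Hypothetical documents; collections are shown inline only to keep the sketch self-contained.
docs = [
    {"uri": "/op/1.json", "collections": ["operation", "transplants"],
     "drug": {"drugName": "Minocycline", "drugDoseSize": 200, "drugDoseUOM": "mg"}},
    {"uri": "/op/2.json", "collections": ["operation"],
     "drug": {"drugName": "Minomycin", "drugDoseSize": None}},   # explicit null
    {"uri": "/op/3.json", "collections": ["operation"],
     "drug": {"drugName": "Minocycline"}},                       # property absent
]


def in_collection(name: str) -> list[str]:
    """URIs of the documents that belong to the named collection."""
    return [d["uri"] for d in docs if name in d["collections"]]


def dose_state(doc: dict) -> str:
    """Distinguish a missing property from a property whose value is null."""
    drug = doc["drug"]
    if "drugDoseSize" not in drug:
        return "absent"
    return "null" if drug["drugDoseSize"] is None else "present"


print(in_collection("transplants"))   # ['/op/1.json']
print([dose_state(d) for d in docs])  # ['present', 'null', 'absent']
print(json.dumps(docs[1]["drug"]))    # {"drugName": "Minomycin", "drugDoseSize": null}
```

A relational row cannot express "absent" separately from NULL; a document can, because its structure is part of the data.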
Why is relational fixed and dense?
Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.
Surgeons (Surgeon ID, Surgeon Name, Surgeon Specialty)
  1, Dorothy Oz, Cardiothoracic
  2, Martin Fields, Neurosurgery
Operations (Operation ID, Surgeon ID, Hospital ID)
  13, 1, 7
Drug Doses (Operation ID, Drug ID, Drug Dose, Drug Dose UOM); sparse values are stored as NULL
  13, 100, 200, mg
  13, 101, NULL, NULL
  13, 100, 150, mg
Drugs (Drug ID, Drug Name)
  100, Minocycline
  101, Minomycin
Callouts: fixed field types; fixed table structure; fixed table relationships; each row has the same structure; fixed set of tables.
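To make "fixed and dense" concrete, the sketch below loads the four tables into an in-memory SQLite database using Python's standard `sqlite3` module. The snake_case table and column names are hypothetical renderings of the slide's tables. Reassembling one operation takes three joins, the unknown Minomycin dose must be stored as NULLs, and adding one attribute is a schema change that gives every existing row a NULL.

```python
import sqlite3

# In-memory database with hypothetical snake_case versions of the slide's tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE surgeon   (surgeon_id INTEGER PRIMARY KEY, name TEXT NOT NULL, specialty TEXT);
CREATE TABLE operation (operation_id INTEGER PRIMARY KEY,
                        surgeon_id INTEGER REFERENCES surgeon, hospital_id INTEGER);
CREATE TABLE drug      (drug_id INTEGER PRIMARY KEY, drug_name TEXT);
CREATE TABLE drug_dose (operation_id INTEGER REFERENCES operation,
                        drug_id INTEGER REFERENCES drug, dose REAL, dose_uom TEXT);

INSERT INTO surgeon VALUES (1, 'Dorothy Oz', 'Cardiothoracic'), (2, 'Martin Fields', 'Neurosurgery');
INSERT INTO operation VALUES (13, 1, 7);
INSERT INTO drug VALUES (100, 'Minocycline'), (101, 'Minomycin');
-- The Minomycin dose is unknown, so the fixed row structure forces NULLs.
INSERT INTO drug_dose VALUES (13, 100, 200, 'mg'), (13, 101, NULL, NULL), (13, 100, 150, 'mg');
""")

# Reassembling one operation takes three joins.
rows = con.execute("""
SELECT s.name, d.drug_name, dd.dose, dd.dose_uom
FROM operation o
JOIN surgeon   s  ON s.surgeon_id = o.surgeon_id
JOIN drug_dose dd ON dd.operation_id = o.operation_id
JOIN drug      d  ON d.drug_id = dd.drug_id
WHERE o.operation_id = 13
ORDER BY dd.rowid
""").fetchall()
print(rows)

# A new attribute is a schema change: every existing row now carries a NULL for it.
con.execute("ALTER TABLE drug_dose ADD COLUMN route TEXT")
print(con.execute("SELECT route FROM drug_dose").fetchall())
```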
How does NoSQL support variety and variability? NoSQL documents can have any variety of rapidly changing structures because structure is data.
{
  "_id": 1,
  "_type": "operation",
  "collections": ["operation", "transplants"],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}
Callouts: variable data types; variable document types; variable relationships; variable and sparse collections; variable document structures; sparse properties (the Minomycin entry has no dose); sparse denormalized properties (the extra properties carried on each relationship, such as hospitalAddress and drugEfficacy).
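Each entry in the `relations.values` array above is a triple plus optional denormalized properties. The following Python sketch separates the two and answers a small graph question. It uses a trimmed copy of the slide's values (the truncated hospital address is omitted), and `split_relation` and `objects` are hypothetical helpers rather than a MarkLogic API.

```python
# Trimmed copy of the operation document's "relations" array (values from the slide).
operation = {
    "_id": 1,
    "relations": {"values": [
        {"subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187},
        {"subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87},
        {"subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1},
        {"subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3},
        {"subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1},
    ]},
}

CORE = ("subject", "predicate", "object")


def split_relation(rel: dict) -> tuple[tuple, dict]:
    """Separate the triple itself from the denormalized properties carried alongside it."""
    triple = tuple(rel[k] for k in CORE)
    extras = {k: v for k, v in rel.items() if k not in CORE}
    return triple, extras


def objects(doc: dict, predicate: str) -> list:
    """All objects linked from this document through the given predicate."""
    return [r["object"] for r in doc["relations"]["values"] if r["predicate"] == predicate]


# Drugs given in this operation that have been recalled more than once.
recalled = [split_relation(r)[0][2] for r in operation["relations"]["values"]
            if r["predicate"] == "opDrug" and r["drugRecalls"] > 1]
print(objects(operation, "opDrug"))  # [10000, 20000, 30000]
print(recalled)                      # [20000]
```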
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- This is a comment -->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us">Marked up text has the most <i>complex</i> structures</paragraph>
  <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
    freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b></paragraph>
</section>
{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" }
    ] }
  ]
}
JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
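The difference shows up when code needs both the prose and the embedded data. This Python sketch uses only the standard `json` and `xml.etree.ElementTree` modules; the two short snippets are simplified, hypothetical variants of the slide's examples. In XML the text is the document and tags sit on top of it; in JSON every fragment is its own object, and the prose must be reassembled.

```python
import json
import xml.etree.ElementTree as ET

xml_text = ('<paragraph>XML allows data such as this date '
            '<date>2017-04-02Z</date> to be freely mixed into the text.</paragraph>')
json_doc = json.loads('''{"paragraph": [
  {"type": "text", "value": "JSON keeps data such as this date "},
  {"type": "date", "value": "2017-04-02Z"},
  {"type": "text", "value": " in separate objects."}]}''')

# XML: the prose is the document; itertext() walks text and tails in order.
root = ET.fromstring(xml_text)
xml_plain = "".join(root.itertext())
xml_dates = [e.text for e in root.iter("date")]

# JSON: every piece of text and data is an object; the prose must be reassembled.
json_plain = "".join(part["value"] for part in json_doc["paragraph"])
json_dates = [p["value"] for p in json_doc["paragraph"] if p["type"] == "date"]

print(xml_plain)
print(json_plain)
print(xml_dates, json_dates)
```

Both formats can carry the same information; the question is which one keeps the dominant access pattern (reading text versus reading fields) simple.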
Choosing Between XML and JSON
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Customer Model (ERD), tables and columns:
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
• Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
• Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
• Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Websites: Website ID, URL
• Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
• Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
• Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
• Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
• Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
• Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
• Companies: Company ID
• Companies Addresses: Company ID, Address ID
• Companies Websites: Company ID, Website ID
The person ERD has 13 tables because each multivalued property requires a separate table in a relational model.
All of this should be represented as one JSON document because it is one business entity.
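Going from the tables to one document is a grouping step: each child table becomes an embedded array. This is a Python sketch using hypothetical rows from three of the person tables; `person_document` is an illustrative helper, not a MarkLogic feature. Following the JSON example, the table's "Is Primary" flag is folded into the purpose array.

```python
# Hypothetical rows as they might come back from three of the person tables.
persons = [{"person_id": 11, "name": "Mike Bowers", "status": "active"}]
phones = [
    {"person_id": 11, "purpose": "mobile", "number": "+1 (111) 111-1111"},
    {"person_id": 11, "purpose": "work", "number": "+1 (111) 111-1112"},
]
emails = [
    {"person_id": 11, "purpose": "work", "address": "mike@somework.com", "is_primary": True},
    {"person_id": 11, "purpose": "personal", "address": "mike@somewhere.com", "is_primary": False},
]


def person_document(person_id: int) -> dict:
    """Nest the child-table rows for one person into a single business-entity document."""
    p = next(r for r in persons if r["person_id"] == person_id)
    return {
        "personId": person_id,
        "person": {
            "personStatus": p["status"],
            "personName": p["name"],
            # Each child table becomes an embedded array of wrapper objects.
            "personPhones": [
                {"phone": {"phonePurpose": [r["purpose"]], "phoneNumber": r["number"]}}
                for r in phones if r["person_id"] == person_id
            ],
            "personEmails": [
                {"email": {"emailPurpose": [r["purpose"]] + (["primary"] if r["is_primary"] else []),
                           "emailAddress": r["address"]}}
                for r in emails if r["person_id"] == person_id
            ],
        },
    }


doc = person_document(11)
print(doc["person"]["personEmails"][0])
```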
Person ERD
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
                  "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
                  "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                           "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                           "nameOnBankAccount": "Michael Bowers",
                           "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                   "personCompanyRelationship": ["employeeOf", "accountRepFor"],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                   "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
[Figure: the person ERD embedded in a much larger customer order ERD with many additional tables, including Order, Order Items, Fact, and Lookup Lists tables; most other table names in the diagram are placeholders.]
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity.
• Meaningful graph relationships connect business entities.
[Figure: Customer Order relationships connect JSON documents for Customers, Orders, Products, and Vendors, e.g., "Product that was ordered" and "Vendor who sold the product".]
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": …
    },
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States"
    },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", … } }
    ]
  }
}
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": ["cookies", "chocolate cookies", "desserts", "snacks", …],
    "productTags": ["cookie", "chocolate", "snack", "treat"],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…" } }
    ],
    "productMeasures": [
      { "productMeasure": { "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": { "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } }
    ],
    …
  }
}
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } }
    ],
    "companyAddresses": [
      { "address": { "addressPurpose": ["shipping", …], …
          "addressLocality": "East Hanove…", …
          "addressPostalCode": "07936", … } }
    ],
    "companyEmails": [ { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [ { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": 555,
          "paymentMethodVerified": true } }
    ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
{
  "orderLookupId": 111111111,
  "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
  "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"]
}
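A lookup-list document like the one above can back a simple consistency check, the document counterpart of a foreign-key or check constraint. This Python sketch is illustrative: `invalid_statuses` is a hypothetical validator, and where such a check would actually run (application code, a trigger, or a schema) is left open.

```python
# Lookup lists copied from the lookup document above.
lookups = {
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}


def invalid_statuses(order_doc: dict) -> list[tuple[str, str]]:
    """Return (property, value) pairs whose value is not in the matching lookup list."""
    problems = []
    order = order_doc["order"]
    if order["orderStatus"] not in lookups["orderStatus"]:
        problems.append(("orderStatus", order["orderStatus"]))
    for line in order.get("orderedProducts", []):
        status = line["product"].get("productOrderStatus")
        # A missing status is allowed (sparse property); a wrong one is not.
        if status is not None and status not in lookups["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems


good = {"order": {"orderStatus": "Shipped",
                  "orderedProducts": [{"product": {"productOrderStatus": "Shipped"}},
                                      {"product": {}}]}}
bad = {"order": {"orderStatus": "Lost",
                 "orderedProducts": [{"product": {"productOrderStatus": "Backordered"}}]}}
print(invalid_statuses(good))  # []
print(invalid_statuses(bad))   # [('orderStatus', 'Lost'), ('productOrderStatus', 'Backordered')]
```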
Relational Modeling: Normalize
1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
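"Make each attribute single valued" is what turns one email entry with the purposes `["work", "primary"]` into rows of the Person Emails table (Email ID, Person ID, Email Purpose, Email Address, Is Primary). The Python sketch below shows that shredding step. The sequential Email IDs and the `shred_emails` helper are assumptions made for illustration.

```python
from itertools import count
from typing import Iterator, Optional


def shred_emails(doc: dict, ids: Optional[Iterator[int]] = None) -> list[tuple]:
    """Flatten a person document's email array into single-valued Person Emails rows."""
    ids = count(1) if ids is None else ids  # hypothetical surrogate keys
    pid = doc["personId"]
    rows = []
    for entry in doc["person"].get("personEmails", []):
        email = entry["email"]
        purposes = email["emailPurpose"]
        # The multivalued purpose array becomes one purpose column plus an Is Primary flag.
        is_primary = "primary" in purposes
        for purpose in purposes:
            if purpose != "primary":
                rows.append((next(ids), pid, purpose, email["emailAddress"], is_primary))
    return rows


person = {"personId": 11, "person": {"personEmails": [
    {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"}},
    {"email": {"emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com"}},
]}}
for row in shred_emails(person):
    print(row)
```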
[Figure: relationship types: one-to-one, one-to-many, many-to-many, and reference tables.]
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
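With one transaction per document, the order's line items are embedded, while product details stay in the product documents and are referenced by `productIdFk`. This Python sketch works entirely inside one order document. Storing prices as strings and using `Decimal` is an assumption made here to avoid float rounding, not something the slides prescribe; `order_subtotal` and `referenced_products` are hypothetical helpers.

```python
from decimal import Decimal

# Trimmed order document; prices held as strings here so Decimal can parse them exactly.
order = {"orderId": 1, "order": {"orderedProducts": [
    {"product": {"productIdFk": 1111, "productOrderQty": 3, "productUnitPrice": "3.50"}},
    {"product": {"productIdFk": 2222, "productOrderQty": 1, "productUnitPrice": "2.50"}},
]}}


def order_subtotal(doc: dict) -> Decimal:
    """Sum quantity x unit price over the embedded line items; no joins needed."""
    lines = doc["order"]["orderedProducts"]
    return sum((Decimal(l["product"]["productUnitPrice"]) * l["product"]["productOrderQty"]
                for l in lines), Decimal("0"))


def referenced_products(doc: dict) -> list[int]:
    """Foreign keys to the separate product documents."""
    return [l["product"]["productIdFk"] for l in doc["order"]["orderedProducts"]]


print(order_subtotal(order))       # 13.00
print(referenced_products(order))  # [1111, 2222]
```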
DocGraph Modeling: Denormalize
• TDE (Template Driven Extraction) can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing: linked data can be queried as if it had been copied in, without copying it into each document
personId 11
schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]
triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111
triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free
personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]
personEmails [ email emailPurpose [ work primary ] emailAddress mike@somework.com email emailPurpose [ personal ] emailAddress mike@somewhere.com ]
personWebsites [ website websitePurpose [ work ] websiteUrl http://www.mike.com website websitePurpose [ linkedin primary ] websiteUrl https://linked.com/mike ]
personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]
customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111
companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
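The "denormalizing without denormalizing" idea from this slide can be sketched in plain Python. This is a minimal sketch, not MarkLogic's TDE or Optic API: it assumes each document stores its links as `triples: [{triple: {subject, predicate, object}}]` (our reading of the stripped JSON above), and the helpers `extract_triples` and `related_names`, as well as the placeholder names for persons 22, 33, and 44, are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical documents keyed by personId; only the fields the sketch needs.
docs: Dict[str, dict] = {
    "11": {
        "personId": "11",
        "person": {"personName": "Mike Bowers"},
        "triples": [
            {"triple": {"subject": "11", "predicate": "hasSpouse", "object": "22"}},
            {"triple": {"subject": "11", "predicate": "hasChild", "object": "33"}},
            {"triple": {"subject": "11", "predicate": "hasChild", "object": "44"}},
        ],
    },
    "22": {"personId": "22", "person": {"personName": "Person 22"}},
    "33": {"personId": "33", "person": {"personName": "Person 33"}},
    "44": {"personId": "44", "person": {"personName": "Person 44"}},
}


def extract_triples(documents: Dict[str, dict]) -> List[Triple]:
    """Copy every embedded triple into one flat index (roughly the role TDE plays)."""
    index: List[Triple] = []
    for doc in documents.values():
        for entry in doc.get("triples", []):
            t = entry["triple"]
            index.append((t["subject"], t["predicate"], t["object"]))
    return index


def related_names(documents: Dict[str, dict], index: List[Triple],
                  subject: str, predicate: str) -> List[str]:
    """Follow matching triples to the linked documents at query time,
    so linked names are never copied into the subject document."""
    return [
        documents[obj]["person"]["personName"]
        for subj, pred, obj in index
        if subj == subject and pred == predicate and obj in documents
    ]


triple_index = extract_triples(docs)
print(related_names(docs, triple_index, "11", "hasChild"))
```

The person document keeps only IDs in its triples; names are resolved through the index when queried, so an edit to a linked person's name never has to be propagated into copies.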
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
Business Tables, Reference Tables, Transaction Tables
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mike@somework.com ]
personWebsites [
website websitePurpose [ work ] websiteUrl http://www.mike.com ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T09:00:00Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
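Combining lookup lists in one document might look like the sketch below, which validates an order against the `orderStatus` and `productOrderStatus` lists from the order-lookup document on the earlier slide. The function `invalid_statuses` and the status value "Backordered" are hypothetical, and nesting each `orderedProducts` entry under a `product` key is our reading of the stripped JSON.

```python
from typing import List, Tuple

# One lookup document holding several lookup lists (values from the slide).
order_lookups = {
    "orderLookupId": "111111111",
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}


def invalid_statuses(order: dict, lookups: dict) -> List[Tuple[str, str]]:
    """Return (field, value) pairs whose value is not in the matching lookup list."""
    problems: List[Tuple[str, str]] = []
    if order["orderStatus"] not in lookups["orderStatus"]:
        problems.append(("orderStatus", order["orderStatus"]))
    for item in order.get("orderedProducts", []):
        status = item["product"].get("productOrderStatus")
        # A missing key is skipped; "None" in the lookup is a status string, not Python None.
        if status is not None and status not in lookups["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems


order = {
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"product": {"productIdFk": "1111", "productOrderStatus": "Shipped"}},
        {"product": {"productIdFk": "2222", "productOrderStatus": "Backordered"}},
    ],
}
print(invalid_statuses(order, order_lookups))
```

Keeping both lists in one document means a single read fetches every lookup an order screen needs, instead of one reference table per list.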
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace a customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational, because it hides the purpose of the model
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because a JSON object can nest subclass objects (such as customer) alongside the general person object
• The person document shows a person being subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See the customer and companyRelationships objects in the person document for the subclassing
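One way to read the subclasses back out of a generalized person document is to check which subclass objects it carries. In this sketch, `person_roles` is a hypothetical helper, and the document is trimmed to the `customer` and `companyRelationships` data shown on the slide, with the `{company: {...}}` wrapper being our reading of the stripped JSON.

```python
from typing import Set

person_doc = {
    "personId": "11",
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": "555", "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": "666", "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def person_roles(doc: dict) -> Set[str]:
    """Collect the 'subclasses' a person document participates in."""
    roles: Set[str] = set()
    # A subclass can be a sibling object next to the general person object...
    if "customer" in doc:
        roles.add("customer")
    # ...or a role recorded on each company relationship.
    for rel in doc.get("companyRelationships", []):
        roles.update(rel["company"].get("personCompanyRelationship", []))
    return roles


print(sorted(person_roles(person_doc)))
```

Because each subclass lives under a named key, the purpose of every part of the document stays visible, which is the contrast with a generalized relational schema.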
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Move sparse data out of a table into one-to-one related tables
• Denormalize by copying commonly joined data into multiple tables, reducing the need for joins, which slow performance
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create range indexes (for inequality comparisons such as < and >) based on a property name or a hierarchical path
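The unique-name advice can be checked mechanically. This hypothetical linter (not a MarkLogic feature) walks a JSON document and reports any property name used at more than one path, since, per the slide, MarkLogic's automatic equality indexing keys on the name rather than the path. The slash-separated paths are only a readable notation, not MarkLogic path-index syntax.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Optional, Set


def property_paths(node: Any, path: str = "",
                   found: Optional[DefaultDict[str, Set[str]]] = None
                   ) -> DefaultDict[str, Set[str]]:
    """Map each property name to every slash-separated path where it occurs."""
    if found is None:
        found = defaultdict(set)
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            found[key].add(child)
            property_paths(value, child, found)
    elif isinstance(node, list):
        # Array positions are ignored, so every element shares the parent path.
        for item in node:
            property_paths(item, path, found)
    return found


def ambiguous_names(doc: dict) -> Dict[str, List[str]]:
    """Property names that appear at more than one path; a name-based
    index would mix their values together."""
    return {name: sorted(paths)
            for name, paths in property_paths(doc).items() if len(paths) > 1}


generic = {"person": {"name": "Mike Bowers"},
           "companyRelationships": [{"company": {"name": "Nabisco"}}]}
specific = {"person": {"personName": "Mike Bowers"},
            "companyRelationships": [{"company": {"companyName": "Nabisco"}}]}
print(ambiguous_names(generic))
print(ambiguous_names(specific))
```

With a generic `name` key, a value query for "Nabisco" on `name` could also match person documents; prefixed names such as `personName` and `companyName` keep each index unambiguous.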
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents (person, order, product, company, and order lookups) equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
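The slide's estimate reduces to simple arithmetic. This sketch only encodes its numbers (3-to-4 index IOs plus 1-to-2 row IOs per join, 13 joins); it is a back-of-the-envelope model that ignores caching and join strategy, and the function name `relational_io_range` is hypothetical.

```python
from typing import Tuple


def relational_io_range(joins: int,
                        index_ios: Tuple[int, int] = (3, 4),
                        row_ios: Tuple[int, int] = (1, 2)) -> Tuple[int, int]:
    """Low/high IO estimate for a query performing `joins` joins,
    where each join costs index lookups plus row fetches."""
    low = joins * (index_ios[0] + row_ios[0])
    high = joins * (index_ios[1] + row_ios[1])
    return low, high


DOCUMENT_IOS = 1  # one document read, with indexes held in RAM (per the slide)

low, high = relational_io_range(13)
print(f"relational: {low}-{high} IOs, document: {DOCUMENT_IOS} IO")
```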
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
personAddresses [
address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States
]
Relational equivalent (three tables):
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
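To make the comparison concrete, the sketch below shreds the personAddresses array into rows for the three relational tables on this slide. It is a simplified illustration: the `{address: {...}}` wrapper is our reading of the stripped JSON, the `11-1` style address IDs are hypothetical surrogate keys, and treating a `primary` purpose as the Is Primary flag is an assumption borrowed from the phone and email arrays.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]

address_doc = {
    "personId": "11",
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }},
        ],
    },
}


def shred_addresses(doc: dict) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split one person document into Persons, Addresses and Persons Addresses rows."""
    person_id = doc["personId"]
    persons: List[Row] = [{"personId": person_id, "name": doc["person"]["personName"]}]
    addresses: List[Row] = []
    persons_addresses: List[Row] = []
    for n, entry in enumerate(doc["person"].get("personAddresses", []), start=1):
        addr = entry["address"]
        address_id = f"{person_id}-{n}"  # hypothetical surrogate key
        addresses.append({
            "addressId": address_id,
            "street": "\n".join(addr["addressStreet"]),
            "locality": addr["addressLocality"],
            "region": addr["addressRegion"],
            "postalCode": addr["addressPostalCode"],
            "country": addr["addressCountry"],
        })
        purposes = addr["addressPurpose"]
        # The multi-valued purpose array becomes one join-table row per purpose.
        for purpose in purposes:
            if purpose != "primary":
                persons_addresses.append({
                    "personId": person_id, "addressId": address_id,
                    "addressPurpose": purpose, "isPrimary": "primary" in purposes,
                })
    return persons, addresses, persons_addresses


persons, addresses, persons_addresses = shred_addresses(address_doc)
print(len(persons), len(addresses), len(persons_addresses))
```

One nested array in the document turns into rows across three tables plus a join key, which is the overhead the slide is pointing at.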
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT
]
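Code that consumes a multi-typed array usually inspects each record's shape. This hypothetical `describe` function is fed the two payment methods above (with a `{paymentMethod: {...}}` wrapper assumed) and branches on which fields are present.

```python
from typing import List

payment_methods: List[dict] = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS",
                       "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers",
                       "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]


def describe(entry: dict) -> str:
    """Summarize one payment method, whatever shape it has."""
    pm = entry["paymentMethod"]
    kind = pm["paymentMethodType"]
    # Dispatch on which fields are present rather than on a fixed schema.
    if "creditCardNumber" in pm:
        return f"{kind} card ending {pm['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in pm:
        return f"{kind} account ending {pm['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized payment method)"


for entry in payment_methods:
    print(describe(entry))
```

Branching on field presence rather than on `paymentMethodType` lets another card brand work without code changes; a relational design would typically need a table per record type or many nullable columns.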
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales" … } },
      { "phone": { "phonePurpose": "support" … } },
      { "phone": { "phonePurpose": "shipping" … } }
    ],
    "companyAddresses": [
      { "address": { "addressPurpose": [ "shipping" … ],
          … "addressLocality": "East Hanove…", "addressPostalCode": "07936" … } }
    ],
    "companyEmails": [ { "email": { "emailPurpose": "sales" … } } ],
    "companyWebsites": [ { "website": { "websitePurpose": "home" … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": "555…", "paymentMethodVerified": true } }
    ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
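Because the lookup lists are ordinary data, application code can validate documents against them. The Python sketch below is a minimal illustration, not MarkLogic's own mechanism. It assumes the lists are already loaded into memory, abbreviates the order, and gives the second product a hypothetical invalid status ("Backordered") so that a failure is visible.

```python
# Lookup values copied from the lookup-list document.
ORDER_STATUS = ["New", "Invoiced", "Shipped", "Closed"]
PRODUCT_ORDER_STATUS = ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"]

# Abbreviated order; "Backordered" is a hypothetical value that is not in the lookup list.
order = {
    "orderNumber": "111-11-111",
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
        {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},
    ],
}


def invalid_statuses(order: dict) -> list:
    """Describe every status value that is missing from its lookup list."""
    problems = []
    if order.get("orderStatus") not in ORDER_STATUS:
        problems.append(f"orderStatus {order.get('orderStatus')!r}")
    for entry in order.get("orderedProducts", []):
        product = entry["product"]
        status = product.get("productOrderStatus")
        # Documents are sparse: an absent status is skipped, not flagged.
        if status is not None and status not in PRODUCT_ORDER_STATUS:
            problems.append(f"product {product['productIdFk']}: {status!r}")
    return problems


print(invalid_statuses(order))  # ["product 2222: 'Backordered'"]
```

Changing a lookup list then changes validation without touching any code, which is the point of keeping the lists as data.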
Document PROs: Everything is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query.
• Structure can be queried:
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
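Querying structure can be illustrated without a database. This Python sketch walks a parsed JSON document to find every path where a key is present, then filters on a nested value. It is only a stand-in: a database such as MarkLogic answers these questions from its indexes rather than by scanning documents, and the helper name `key_paths` is hypothetical.

```python
from typing import Any, Iterator

person_doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def key_paths(node: Any, key: str, path: tuple = ()) -> Iterator[tuple]:
    """Yield the path to every occurrence of `key`, at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield path + (k,)
            yield from key_paths(v, key, path + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from key_paths(item, key, path + (i,))


# Presence of a key: which phone carries the optional hidePhoneNumber flag?
print(list(key_paths(person_doc, "hidePhoneNumber")))

# Nested relationship: numbers of phones whose purpose list includes "home".
home_numbers = [
    entry["phone"]["phoneNumber"]
    for entry in person_doc["person"]["personPhones"]
    if "home" in entry["phone"]["phonePurpose"]
]
print(home_numbers)
```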
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters.
– Strings have unlimited size and can contain any internationalized UTF-8 characters.
– Numbers contain integers or decimals.
– Booleans are true or false.
– Arrays can have values of any type and can mix types.
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
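Python's standard `json` module maps each of these JSON types onto a native type, which makes the list easy to check. A small sketch with an illustrative document (the attribute names and values are made up for the example):

```python
import json

# Illustrative document: a non-ASCII attribute name, a decimal, and an array mixing types.
raw = '{"personHeightInFeet": 5.75, "prénom": "Mike", "mixed": ["cookie", 3, 2.5, true, null, {"nested": [1]}]}'
doc = json.loads(raw)

# Each array element keeps its own type.
for value in doc["mixed"]:
    print(type(value).__name__, value)

# Sparse objects: an attribute that was never written is simply absent.
print(doc.get("personBirthDate"))
```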
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. Codd
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency. …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1. INTRODUCTION: This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS: … Tables … represent a major advance toward the goal of data independence. …
1.2.1. Ordering Dependence: … Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence: … Can application programs … remain invariant as indices come and go? …
1.2.3. Access Path Dependence: Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.
[Figure: entities in the excerpt tagged as Topic (T), Person (P), Location (L), Publication (P), Reference (R), Organization (O), and Event (E), connected by relationships such as "geo-located in", "printed on", "author of", "published article in journal", "publisher of", and problem, solution, and purpose links between topics.]
[Figure: JSON documents for Customers, Orders, Products, and Vendors and XML Documentation (the hospital operation record) connected by relationships such as "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product", and linked to content such as shipping instructions, customer invoices, annual reports, customer order documents, and product manuals.]
Inside and Out
Hospital Name | Johns Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
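The same operation can be read two ways: as flat rows, where the operation columns repeat for every drug, or as one nested document. A Python sketch of folding the rows into documents, grouped by operation number; the property names follow the JSON operation example elsewhere in this deck, and `rows_to_documents` is a hypothetical helper:

```python
# Rows as a relational join would return them: one row per administered drug,
# with the operation columns repeated on every row.
rows = [
    ("Johns Hopkins", 13, "Heart Transplant", "Dorothy Oz", "Minicillan", "Drugs R Us", 200, "mg"),
    ("Johns Hopkins", 13, "Heart Transplant", "Dorothy Oz", "Maxicillan", "Canada4Less Drugs", 400, "mg"),
    ("Johns Hopkins", 13, "Heart Transplant", "Dorothy Oz", "Minicillan", "Drug USA", 150, "mg"),
]


def rows_to_documents(rows):
    """Group flat rows by operation number into one nested document each."""
    docs = {}
    for hospital, op_no, op_type, surgeon, drug, maker, dose, uom in rows:
        doc = docs.setdefault(op_no, {
            "hospitalName": hospital,
            "operationNumber": op_no,
            "operationTypeName": op_type,
            "surgeonName": surgeon,
            "administeredDrugs": [],
        })
        doc["administeredDrugs"].append({
            "drugName": drug,
            "drugManufacturer": maker,
            "drugDoseSize": dose,
            "drugDoseUOM": uom,
        })
    return list(docs.values())


docs = rows_to_documents(rows)
print(len(docs), len(docs[0]["administeredDrugs"]))  # 1 3
```

The operation's own fields appear once, and the repeating group becomes an array inside the business entity.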
Church of Jesus Christ of Latter-day Saints
• Hundreds of MarkLogic servers running 190+ websites and applications with billions of page views annually
• 15.6 million members (30,016 congregations worldwide)
• Humanitarian assistance in 186 countries
• Thousands of documents in 188 published languages
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Six Data Paradigms
• Dimensional: Kimball data warehousing
• Relational: fixed, dense tables with flexible queries & joins
• Wide Column: fixed, dense tables with fixed queries and no joins
• Key Value: predefined data structures (value: "value 1"; hash: [key 1: value 1, key 2: value 2]; list: value 1, value 2; set: [set value 1, set value 2])
• Document: sparse, variable data structures
• Graph: unlimited relationships
[Figure: the paradigms are arranged from low-latency operational velocity to high-bandwidth analytical volume, with scales for response time (hours, minutes, seconds, milliseconds, microseconds), data volume (GB, TB, PB), and throughput (500 tps, 1,000 tps, 10K tps, 100K tps); and from fixed structure (code has more dependence on DB structures) to flexible structure (code has less dependence on DB structures).]
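The Key Value paradigm's predefined structures resemble the value types offered by stores such as Redis (strings, hashes, lists, sets). A minimal in-memory Python sketch, using plain dicts, lists, and sets as assumed stand-ins rather than any store's actual API:

```python
# Toy key-value store: every value is reached only through its key.
store: dict = {}

store["key"] = "value 1"                                   # plain value
store["hash"] = {"key 1": "value 1", "key 2": "value 2"}  # hash of field/value pairs
store["list"] = ["value 1", "value 2"]                     # ordered, allows duplicates
store["set"] = {"set value 1", "set value 2"}              # unordered, unique members

store["list"].append("value 1")   # the list now holds three items
store["set"].add("set value 1")   # already present, so the set still holds two

# Lookup is by key only; finding which key holds "value 2" would require
# already knowing the key or scanning every entry.
print(store["hash"]["key 2"])
```

This is why the slide calls the structures predefined: the store understands only these shapes, not the meaning of what is inside them.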
Top Ten Databases Overall
[Figure: the six-paradigm chart with an example for each paradigm (a graph linking surgeons, operations, persons, and hospitals; a star schema of drug-dose facts with hospital, surgeon, operation, and drug dimensions; a relational model of surgeons, operation codes, operations, and hospitals; and an XML operation document), annotated with databases numbered as on the slide:]
• Graph (RDF): 6 MarkLogic
• Dimensional, data warehouse: 1 SQL Server, 2 Oracle Exadata, 2 Teradata, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
• Dimensional, live analytics: 2 Oracle Exalytics, 3 SAP HANA
• Relational, newSQL: 2 Oracle x10
• Relational, SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
• Document, doc warehouse (XML): 6 MarkLogic
• Document (JSON): 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
• Key-Value (simple key): 5 AWS DynamoDB, 10 Redis
• Wide-Column (complex key): 9 DataStax Cassandra
Do we combine multiple models?
[Figure: the same six-paradigm chart and examples, with MarkLogic labeled in several paradigm regions and the captions "1 Multi-model NoSQL Database" and "Operational".]
Multi-Model Enterprise NoSQL Databases
Power of Combining XML Document and RDF Graph
• XML document + Data + Narrative + Relationships = Contextual Information
• RDF graph relationships (Semantic & Structural) = Meaningful Knowledge
[Figure: the same excerpt of E. F. Codd's 1970 paper "A Relational Model of Data for Large Shared Data Banks" marked up as XML, with its topics, persons, locations, publications, references, organizations, and events linked in an RDF graph ("geo-located in", "printed on", "related topic", "author of", "published article in journal", "publisher of", and problem, solution, and purpose links).]
WAIT A MINUTE: Aren't JSON, XML, and RDF databases hierarchical and graph databases?
Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?
1. They break programs when the data structure's hierarchy changes.
   – New APIs and indexes flatten out the data structure so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent because it is easy to fail to update redundant data.
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
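Points 2 and 3 can be made concrete with a small Python sketch, assuming documents are held in in-memory dicts keyed by id. `resolve` follows the foreign keys (`productIdFk`, `sellerId`) to the authoritative product and company documents, and `stale_copies` flags denormalized copies that have drifted from their source, the kind of check a trigger or scheduled job might run. Both function names and the rename are hypothetical.

```python
# Hypothetical collections keyed by document id.
products = {1111: {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}}}
companies = {555: {"companyId": 555, "company": {"companyName": "Nabisco"}}}

order = {"order": {"orderNumber": "111-11-111", "orderedProducts": [
    {"product": {"productIdFk": 1111, "productName": "Oreo Thins Sandwich Cookies",
                 "seller": {"sellerId": 555, "sellerName": "Nabisco"}}},
]}}


def resolve(order):
    """Follow each foreign key to the authoritative product and seller documents."""
    for entry in order["order"]["orderedProducts"]:
        line = entry["product"]
        product = products[line["productIdFk"]]["product"]
        seller = companies[line["seller"]["sellerId"]]["company"]
        yield line["productIdFk"], product["productName"], seller["companyName"]


def stale_copies(order):
    """List product ids whose denormalized productName no longer matches the source."""
    stale = []
    for entry in order["order"]["orderedProducts"]:
        line = entry["product"]
        source = products[line["productIdFk"]]["product"]
        if line.get("productName") != source["productName"]:
            stale.append(line["productIdFk"])
    return stale


print(list(resolve(order)))   # [(1111, 'Oreo Thins Sandwich Cookies', 'Nabisco')]
print(stale_copies(order))    # []
products[1111]["product"]["productName"] = "Oreo Thins"   # hypothetical rename at the source
print(stale_copies(order))    # [1111]
```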
RDF Graph and XML enable us to turn content into meaningful knowledge.
What can RDF Graph and JSON do for data?
Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity.
  – A document is like a row in a table, and more.
  – A document contains all the related tables a relational model requires to represent one business entity.
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries.
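A graph relationship is just a (subject, predicate, object) triple, so relationships can be stored and queried as data. A minimal Python sketch using the triples from the person document in this deck; a real system would use an RDF store and SPARQL, and the `match` helper is hypothetical:

```python
# Triples from the example person document: (subject, predicate, object).
triples = [
    (11, "hasSpouse", 22),
    (11, "hasChild", 33),
    (11, "hasChild", 44),
    (11, "likesProduct", 1111),
    (11, "hasEmployer", 666),
]


def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# A new relationship is just new data, added at run time with no schema change.
triples.append((22, "likesProduct", 1111))

children = [o for _, _, o in match(s=11, p="hasChild")]
fans_of_1111 = [s for s, _, _ in match(p="likesProduct", o=1111)]
print(children, fans_of_1111)  # [33, 44] [11, 22]
```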
[Figure: JSON documents for Customers, Orders, Products, and Vendors and XML Documentation (the hospital operation record) connected by relationships such as "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product", and linked to content such as shipping instructions, customer invoices, annual reports, customer order documents, product manuals, and vendor order forms.]
RDF = Standard Meaningful Graphs
Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies.
TIP: Search for ontologies at Linked Open Vocabularies (LOV).
• Dublin Core • FOAF • TrackBack • MetaVocab • Basic Geo Vocabulary • BIO • RSS 1.0 • VCard RDF • Creative Commons metadata • WOT
• SIOC • GoodRelations • DOAP • Programmes Ontology • Music Ontology • OpenGUID • Provenance Vocabulary • Pedagogical diagnosis • DILIGENT Argumentation
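Reusing an ontology means reusing its predicate IRIs. A Python sketch that builds predicates from the published FOAF and Dublin Core Terms namespaces and serializes them as N-Triples; the subject and object IRIs under example.org are hypothetical placeholders:

```python
# Namespace IRIs of two vocabularies from the list above.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Subjects and objects use hypothetical example.org IRIs.
triples = [
    ("http://example.org/person/11", FOAF + "knows", "http://example.org/person/22"),
    ("http://example.org/doc/manual-1111", DCTERMS + "creator", "http://example.org/company/555"),
]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(triples))
```

Any tool that already understands `foaf:knows` or `dcterms:creator` can interpret these relationships without a custom mapping.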
New Data Structures for Variety and Variability
Relational Database: fixed and dense structures (mostly) | NoSQL Database: sparse and variable structures (mostly)
Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable). | Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.
Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure. | Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.
Each table contains a sparse number of rows and requires each row to have the same fixed structure. | Each collection contains a sparse number of documents and has flexible requirements for document types: it may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.
Each table may have zero to a few fixed relationships to other tables. Constraints are not data. | Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.
Each schema must have a fixed number of predefined tables and constraints before data can be loaded. | Each database may contain documents without first defining structures, document types, relationships, collections, etc.
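Because document types and collections are themselves data, a document can declare several of each without any schema being defined first. A minimal Python sketch, with hypothetical URIs, that builds a collection-to-document index from that data (the `_type` and `collections` properties follow the operation example in this deck):

```python
from collections import defaultdict

# Hypothetical documents; each declares its own type(s) and collections.
docs = {
    "/op/1.json": {"_type": "operation", "collections": ["operation", "transplants"]},
    "/op/2.json": {"_type": "operation", "collections": ["operation"]},
    "/person/11.json": {"_type": ["person", "customer"], "collections": ["person"]},
}


def collection_index(docs):
    """Map each collection name to the set of document URIs that belong to it."""
    index = defaultdict(set)
    for uri, doc in docs.items():
        for name in doc.get("collections", []):
            index[name].add(uri)
    return index


index = collection_index(docs)
print(sorted(index["operation"]))  # ['/op/1.json', '/op/2.json']
```

Here "/op/1.json" belongs to two collections at once, something a row cannot do with tables.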
Why is relational fixed and dense?
Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Operation ID | Drug ID | Drug Dose | Drug Dose UOM | Sparse
13 | 100 | 200 | mg | NULL
13 | 101 | NULL | NULL |
13 | 100 | 150 | mg | NULL

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

• Fixed field types
• Fixed table structure
• Fixed table relationships
• Each row has the same structure
• Fixed set of tables
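The NULLs in the dose table are what a fixed, dense row structure costs when the data is sparse. A Python sketch, using the drug data from the operation example, that projects sparse documents onto one fixed column list the way a table would store them (`to_dense_rows` is a hypothetical helper):

```python
# The three administered drugs as sparse documents (the second omits its dose).
docs = [
    {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg"},
    {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
    {"drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg"},
]


def to_dense_rows(docs):
    """Project sparse documents onto one fixed column list, padding gaps with None (SQL NULL)."""
    columns = sorted({key for doc in docs for key in doc})
    rows = [tuple(doc.get(col) for col in columns) for doc in docs]
    return columns, rows


columns, rows = to_dense_rows(docs)
print(columns)  # ['drugDoseSize', 'drugDoseUOM', 'drugManufacturer', 'drugName']
print(rows[1])  # (None, None, 'Canada4Less', 'Minomycin')
```

The document stores only what exists; the row must reserve a cell for every column whether or not it has a value.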
How does NoSQL support variety and variability?
NoSQL documents can have any variety of rapidly changing structures because structure is data.
{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}
Callouts: Variable Data Types; Variable Document Types; Variable Relationships; Variable and Sparse Collections; Variable Document Structures; Sparse Properties (the second drug has no dose); Sparse Denormalized Properties.
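Because the relations are stored as data inside the document, their properties (`drugEfficacy`, `drugRecalls`) can be queried like any other value. A minimal Python sketch over the relations from the operation example; the thresholds and the `risky_drugs` name are hypothetical:

```python
operation = {
    "_id": 1,
    "relations": {"values": [
        {"subject": 1, "predicate": "opHospital", "object": 10},
        {"subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87},
        {"subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1},
        {"subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3},
        {"subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1},
    ]},
}


def risky_drugs(doc, min_efficacy=0.6, max_recalls=1):
    """Return ids of drugs whose relationship properties fall outside the thresholds."""
    return [r["object"] for r in doc["relations"]["values"]
            if r["predicate"] == "opDrug"
            and (r.get("drugEfficacy", 0) < min_efficacy or r.get("drugRecalls", 0) > max_recalls)]


print(risky_drugs(operation))  # [20000]
```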
Choosing Between XML and JSON
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph xml:lang="en-us">Marked up text has the most <i>complex</i> structures.</paragraph>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>.</paragraph></section>
{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object." } ] },
    { "paragraph": [
      { "type": "text", "value": "Text exists only in objects." },
      { "type": "text", "value": "Data exists in separate objects." },
      { "type": "date", "value": "2017-04-02Z" },
      { "type": "break", "value": null },
      { "type": "text", "value": "JSON is designed for objects to…" },
      { "type": "text", "value": "be sprinkled with strings." }
    ] }
  ]
}
JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
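The difference shows up as soon as a program reads the two forms. In this Python sketch (standard library only), ElementTree exposes XML mixed content as text plus child elements with trailing `.tail` text, while the JSON form has to split the same sentence into separate typed objects. The `xsi:type` attribute from the example is omitted so the XML needs no namespace declarations.

```python
import json
import xml.etree.ElementTree as ET

xml_text = """<paragraph>XML allows data such as this date <date>2017-04-02Z</date> to be
freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>.</paragraph>"""

para = ET.fromstring(xml_text)
# Mixed content: the element's text, each child's text, and the text after each child (.tail).
plain_text = "".join(para.itertext())
dates = [d.text for d in para.iter("date")]

json_text = """{"paragraph": [
  {"type": "text", "value": "JSON is designed for objects to be sprinkled with strings."},
  {"type": "date", "value": "2017-04-02Z"}
]}"""
parts = json.loads(json_text)["paragraph"]
json_dates = [p["value"] for p in parts if p["type"] == "date"]

print(dates, json_dates)
print(plain_text)
```

Reading the prose back out of XML is one call; in JSON the application must know the object convention to reassemble it.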
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Customer Model
• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
• Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
• Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Websites: Website ID, URL
• Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
• Companies Websites: Company ID, Website ID
• Companies Addresses: Company ID, Address ID
• Companies: Company ID
• Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
• Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
• Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
• Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
• Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
• Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Person ERD
The person ERD is 13 tables because each multivalued property requires a separate table in a relational model.
All of this should be represented as one JSON document because it is one business entity.
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT", "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers", "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
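Going the other way, shredding the document shows where the extra tables come from: every array becomes its own child table keyed by the person's id. A Python sketch on an abbreviated copy of the person document; `shred` is a hypothetical helper, and it leaves the purpose arrays as lists inside the rows, which a relational design would normalize further (the ERD uses Purpose and Is Primary columns instead).

```python
# Abbreviated copy of the person document.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        ],
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"}},
        ],
    },
}


def shred(doc):
    """Split a person document into one list of rows per table."""
    pid = doc["personId"]
    body = doc["person"]
    tables = {"Persons": [(pid, body["personName"])]}
    for prop, value in body.items():
        if isinstance(value, list):            # multivalued property -> child table
            rows = tables.setdefault(prop, [])
            for item in value:
                (inner,) = item.values()       # unwrap {"phone": {...}}
                rows.append((pid, *inner.values()))
    return tables


tables = shred(person_doc)
for name, rows in tables.items():
    print(name, rows)
```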
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
Figure: Customers, Orders, Products, and Vendors stored as JSON documents, with links labeled "Customer Order", "Product that was ordered", and "Vendor who sold the product"
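A minimal Python sketch of "one document per entity, graph links between them", assuming documents are plain dicts keyed by a hypothetical `"<entity>:<id>"` ID and each relationship is an explicit (subject, predicate, object) triple. The predicate names `placedOrder`, `orderedProduct`, and `soldBy` are illustrative labels for the links in the figure, not names from the original model.

```python
# One document per business entity; the "<entity>:<id>" keys are hypothetical,
# the numeric IDs and names come from the example documents.
documents = {
    "person:11": {"personId": 11, "person": {"personName": "Mike Bowers"}},
    "order:1": {"orderId": 1, "order": {"orderNumber": "111-11-111"}},
    "product:1111": {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    "company:555": {"companyId": 555, "company": {"companyName": "Nabisco"}},
}

# Each relationship is an explicit (subject, predicate, object) triple whose
# predicate states its meaning. Predicate names are hypothetical.
triples = [
    ("person:11", "placedOrder", "order:1"),
    ("order:1", "orderedProduct", "product:1111"),
    ("product:1111", "soldBy", "company:555"),
]


def follow(start_id, *predicates):
    """Walk a chain of predicates from one document and return the documents reached."""
    frontier = [start_id]
    for predicate in predicates:
        frontier = [o for (s, p, o) in triples if s in frontier and p == predicate]
    return [documents[doc_id] for doc_id in frontier]


vendors = follow("person:11", "placedOrder", "orderedProduct", "soldBy")
print([v["company"]["companyName"] for v in vendors])
```

Because every link names its meaning, the same traversal code answers different questions just by changing the predicate path.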
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111orderStatus [NewInvoicedShippedClosed
]productOrderStatus [
None AllocatedInvoiced Shipped On Order No Stock
]
Relational Modeling: Normalize
1. Normalize
• Make each attribute single-valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
Figure: one-to-one, one-to-many, and many-to-many relationships, plus reference tables
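A small SQLite sketch of relational normalization, assuming simplified versions of the Persons and Person Phones tables from the relational diagram (snake_case names are an assumption). The multivalued phone attribute cannot live in one column, so it is flattened into a one-to-many child table, and reading the person back requires a join.

```python
import sqlite3

# In-memory database with simplified Persons / Person Phones tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (
    person_id   INTEGER PRIMARY KEY,
    person_name TEXT NOT NULL
);
CREATE TABLE person_phones (
    phone_id      INTEGER PRIMARY KEY,
    person_id     INTEGER NOT NULL REFERENCES persons(person_id),
    phone_purpose TEXT NOT NULL,
    phone_number  TEXT NOT NULL
);
""")

person = {
    "personId": 11,
    "personName": "Mike Bowers",
    "phones": [("mobile", "+1 (111) 111-1111"), ("work", "+1 (111) 111-1112")],
}

# Single-valued attributes go into the parent row...
conn.execute("INSERT INTO persons VALUES (?, ?)", (person["personId"], person["personName"]))
# ...and each value of the multivalued attribute becomes its own child row.
conn.executemany(
    "INSERT INTO person_phones (person_id, phone_purpose, phone_number) VALUES (?, ?, ?)",
    [(person["personId"], purpose, number) for purpose, number in person["phones"]],
)

# Reassembling the person now requires a join.
rows = conn.execute("""
    SELECT p.person_name, ph.phone_purpose, ph.phone_number
    FROM persons p JOIN person_phones ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
""").fetchall()
print(rows)
```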
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), so structures can be embedded inside a document
• Each embedded structure should itself be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"],
          "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton",
          "addressRegion": "UT",
          "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa",
          "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234",
          "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS",
          "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking",
          "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111",
          "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555,
        "companyName": "Nabisco",
        "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": {
        "companyIdFk": 666,
        "companyName": "Goliath Dairies",
        "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri",
        "performanceNotes": "He is always late" } }
  ]
}
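A minimal sketch of normalizing inside a document, assuming the person document's property names and a `{ "phone": {...} }` wrapper around each array item (the wrapper shape is an assumption about the original JSON). The one-to-many phones live in an embedded array, and "primary" is just another value in `phonePurpose`, so finding the primary phone needs no join. The helper `phones_with_purpose` is hypothetical.

```python
import json

# Trimmed person document: one-to-many data is embedded as an array
# instead of being split into a child table.
person_doc = json.loads("""
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personPhones": [
      {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
      {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
      {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113"}}
    ]
  }
}
""")


def phones_with_purpose(doc, purpose):
    """Return phone numbers whose phonePurpose array contains `purpose`."""
    return [
        entry["phone"]["phoneNumber"]
        for entry in doc["person"]["personPhones"]
        if purpose in entry["phone"]["phonePurpose"]
    ]


# Querying within one document: no join required.
print(phones_with_purpose(person_doc, "primary"))
```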
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing: the stored documents stay normalized, while queries see the data as if it had been copied into each linked document
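To make "denormalizing without denormalizing" concrete, here is a plain-Python sketch. `extract_person_triples` is a hypothetical stand-in for a TDE template and `order_view` for an Optic join; neither is MarkLogic's API. The order document is trimmed so it stores only the customer ID, and the person's name and primary email are joined in at query time from an index built out of the person document.

```python
# Stored documents stay normalized: the order refers to the person by ID only.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"}}
        ],
    },
}
order_doc = {"orderId": 1, "order": {"orderNumber": "111-11-111", "customer": {"customerId": 11}}}


def extract_person_triples(doc):
    """Hypothetical TDE stand-in: project selected facts out of a document into triples."""
    subject = doc["personId"]
    triples = [(subject, "personName", doc["person"]["personName"])]
    for entry in doc["person"].get("personEmails", []):
        email = entry["email"]
        if "primary" in email["emailPurpose"]:
            triples.append((subject, "primaryEmail", email["emailAddress"]))
    return triples


# The "triple index" is populated from documents; nothing is copied into the order.
triple_index = extract_person_triples(person_doc)


def order_view(order):
    """Hypothetical Optic stand-in: join indexed person facts onto an order at query time."""
    customer_id = order["order"]["customer"]["customerId"]
    facts = {p: o for (s, p, o) in triple_index if s == customer_id}
    return {"orderNumber": order["order"]["orderNumber"], **facts}


print(order_view(order_doc))
```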
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
Figure: business tables, transaction tables, and reference tables
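A small SQLite sketch of the three kinds of tables, with simplified, assumed table names. `persons` and `products` are business tables that stand alone, `orders` and `order_items` are transaction tables that join them, and `order_statuses` is a reference table holding the order states from the order lookup list (New, Invoiced, Shipped, Closed). Note that SQLite does not enforce the `REFERENCES` clauses unless foreign keys are switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reference table: standardizes the allowed order states.
CREATE TABLE order_statuses (status TEXT PRIMARY KEY);
INSERT INTO order_statuses VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');

-- Business tables: each stands on its own, independent of any context.
CREATE TABLE persons  (person_id INTEGER PRIMARY KEY, person_name TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);

-- Transaction tables: join business tables together and point at reference data.
CREATE TABLE orders (
    order_id  INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES persons(person_id),
    status    TEXT REFERENCES order_statuses(status)
);
CREATE TABLE order_items (
    order_id   INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity   INTEGER
);

INSERT INTO persons     VALUES (11, 'Mike Bowers');
INSERT INTO products    VALUES (1111, 'Oreo Thins Sandwich Cookies');
INSERT INTO orders      VALUES (1, 11, 'Shipped');
INSERT INTO order_items VALUES (1, 1111, 3);
""")

# Recreating the "customer order" context means combining all of them.
rows = conn.execute("""
SELECT p.person_name, o.status, pr.product_name, i.quantity
FROM orders o
JOIN persons p     ON p.person_id = o.person_id
JOIN order_items i ON i.order_id = o.order_id
JOIN products pr   ON pr.product_id = i.product_id
""").fetchall()
print(rows)
```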
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one document
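A minimal sketch of combining several lookup lists in one document, modeled on the `orderLookupId` document in the customer order model, with its `orderStatus` and `productOrderStatus` lists (the split of "On Order" and "No Stock" into two values is an assumption). The validation helper `invalid_statuses` and the "Backordered" value are hypothetical, used only to show one lookup document serving a whole transaction.

```python
# One lookup document holding several lookup lists.
order_lookups = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}


def invalid_statuses(order_doc, lookups):
    """Return (property, value) pairs in an order that are not in the lookup lists."""
    problems = []
    status = order_doc["order"]["orderStatus"]
    if status not in lookups["orderStatus"]:
        problems.append(("orderStatus", status))
    for entry in order_doc["order"]["orderedProducts"]:
        item_status = entry["product"].get("productOrderStatus")
        if item_status is not None and item_status not in lookups["productOrderStatus"]:
            problems.append(("productOrderStatus", item_status))
    return problems


order_doc = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
            # Hypothetical bad value, to show validation catching it.
            {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},
        ],
    },
}
problems = invalid_statuses(order_doc, order_lookups)
print(problems)
```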
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace a customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational, because it hides the purpose of the model
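A small SQLite sketch of relational generalization, assuming simplified tables: a general `persons` table plus role tables (`customers`, `account_reps`) that reuse the person's ID as their primary key. Person 22 ("Jane Doe") is hypothetical test data. The final query shows the trade-off the slide warns about: to learn what roles a person plays, you must probe every subclass table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- General table: every person, whatever role they play.
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, person_name TEXT NOT NULL);

-- Subclass tables share the person's primary key (one-to-one with persons).
CREATE TABLE customers (
    person_id          INTEGER PRIMARY KEY REFERENCES persons(person_id),
    customer_status    TEXT,
    customer_join_date TEXT
);
CREATE TABLE account_reps (
    person_id  INTEGER PRIMARY KEY REFERENCES persons(person_id),
    company_id INTEGER
);

INSERT INTO persons      VALUES (11, 'Mike Bowers'), (22, 'Jane Doe');
INSERT INTO customers    VALUES (11, 'active', '2001-01-01');
INSERT INTO account_reps VALUES (11, 555);
""")

# Which roles does each person play? Every subclass table has to be checked.
rows = conn.execute("""
SELECT p.person_name,
       c.person_id IS NOT NULL AS is_customer,
       EXISTS (SELECT 1 FROM account_reps a WHERE a.person_id = p.person_id) AS is_account_rep
FROM persons p
LEFT JOIN customers c ON c.person_id = p.person_id
ORDER BY p.person_id
""").fetchall()
print(rows)
```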
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because a JSON object can carry optional, role-specific sub-objects, which works like subclassing
• This example shows a person being subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON and, unlike in relational, it does not hide the purpose of the model
• See the subclassing in the person document: the customer and companyRelationships properties extend the person
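A minimal sketch of reading roles straight off a generalized person document, using the property names from the example (`customer`, `companyRelationships`, `personCompanyRelationship`). The helper `person_roles` is hypothetical; the point is that each role is visible in the document itself rather than hidden in separate subclass tables.

```python
def person_roles(doc):
    """List the roles a person document plays, based on which sub-objects it carries."""
    roles = ["person"] if "person" in doc else []
    if "customer" in doc:
        roles.append("customer")
    # Each company relationship names its roles explicitly.
    for entry in doc.get("companyRelationships", []):
        roles.extend(entry["company"]["personCompanyRelationship"])
    return roles


doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}},
    ],
}
print(person_roles(doc))
```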
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Move sparse data out of a table and into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
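A small SQLite sketch of relational tuning by denormalization, with simplified, assumed tables. `customer_name` is copied from `persons` into `orders`, so listing orders needs no join; the cost is that the copy must be kept in sync. A trigger is one way to do that, chosen here for illustration rather than taken from the original model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, person_name TEXT);
-- Tuned: customer_name is copied from persons so order listings need no join.
CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    person_id     INTEGER REFERENCES persons(person_id),
    customer_name TEXT
);
INSERT INTO persons VALUES (11, 'Mike Bowers');
INSERT INTO orders  VALUES (1, 11, 'Mike Bowers');
""")

# Read path: no join required.
listing = conn.execute("SELECT order_id, customer_name FROM orders").fetchall()
print(listing)

# Write path: the copy must be kept in sync, here with a trigger.
conn.executescript("""
CREATE TRIGGER sync_customer_name AFTER UPDATE OF person_name ON persons
BEGIN
    UPDATE orders SET customer_name = NEW.person_name WHERE person_id = NEW.person_id;
END;
""")
conn.execute("UPDATE persons SET person_name = 'Michael Bowers' WHERE person_id = 11")
print(conn.execute("SELECT customer_name FROM orders WHERE order_id = 1").fetchone())
```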
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it appears in the hierarchy
• You create inequality (range) indexes manually, based on either a property name or a hierarchical path
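A simplified Python model of why unique property names matter when equality lookups key on the property name regardless of hierarchy. `index_by_property_name` is a hypothetical illustration, not MarkLogic's actual index structure. It uses the status properties from the example documents: with a generic `status` name, "find active persons" also matches companies and products; with `personStatus`, the name alone identifies the entity type.

```python
from collections import defaultdict


def index_by_property_name(docs):
    """Map (property name, scalar value) to the set of document IDs containing it.

    Location in the hierarchy is ignored (a simplified model, not MarkLogic's index).
    """
    index = defaultdict(set)

    def walk(doc_id, node):
        if isinstance(node, list):
            for item in node:
                walk(doc_id, item)
        elif isinstance(node, dict):
            for name, value in node.items():
                # Arrays of scalars are indexed under the enclosing property name.
                items = value if isinstance(value, list) else [value]
                for item in items:
                    if isinstance(item, (dict, list)):
                        walk(doc_id, item)
                    else:
                        index[(name, item)].add(doc_id)

    for doc_id, doc in docs.items():
        walk(doc_id, doc)
    return index


# Generic property names: every entity uses "status".
generic_docs = {
    "person/11": {"person": {"status": "active"}},
    "company/555": {"company": {"status": "active"}},
    "product/1111": {"product": {"status": "active"}},
}
# Unique property names, as in the example documents.
specific_docs = {
    "person/11": {"person": {"personStatus": "active"}},
    "company/555": {"company": {"companyStatus": "active"}},
    "product/1111": {"product": {"productStatus": "active"}},
}

generic_hits = index_by_property_name(generic_docs)[("status", "active")]
specific_hits = index_by_property_name(specific_docs)[("personStatus", "active")]
print(sorted(generic_hits), sorted(specific_hits))
```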
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 2.50 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
productWebsites [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model.
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.
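A lookup list can itself be one small JSON document that other documents are checked against. The sketch below assumes the lookup document from the diagram (orderLookupId 111111111 with its orderStatus and productOrderStatus arrays) and a simplified, flattened order; the helper name `is_valid_status` and the rule that a missing product status counts as "None" are hypothetical choices for illustration.

```python
import json

# Lookup-list document modeled on the diagram's orderLookupId 111111111.
ORDER_LOOKUP = json.loads("""
{
  "orderLookupId": 111111111,
  "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
  "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                         "On Order", "No Stock"]
}
""")


def is_valid_status(order, lookup=ORDER_LOOKUP):
    """Check an order's status and each ordered product's status against the lookup lists."""
    if order.get("orderStatus") not in lookup["orderStatus"]:
        return False
    return all(
        # A product with no recorded status is treated as "None" (an assumption).
        item.get("productOrderStatus", "None") in lookup["productOrderStatus"]
        for item in order.get("orderedProducts", [])
    )


# Simplified order: the diagram nests these fields under "order" and "product".
order = {
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"productIdFk": 1111, "productOrderStatus": "Shipped"},
        {"productIdFk": 2222},
    ],
}
print(is_valid_status(order))                    # True
print(is_valid_status({"orderStatus": "Lost"}))  # False
```

Keeping the allowed values in a document rather than in code means the list can change by updating data, which is the same "structure is data" idea the later slides develop.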
Document PROs: Performance
One JSON Document Requires One I/O
• Relational requires dozens of I/Os for the same data.
• For example, the person document is 45K and requires 13 tables to represent it in a relational database.
• You have to join 13 tables to retrieve all of a person's data.
• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row.
• 4-to-6 I/Os x 13 joins = 52-to-78 I/Os.
• MarkLogic indexes are in RAM, so getting a document is 1 I/O.
• MongoDB indexes are B-tree indexes and may require 4 I/Os.
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables.
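The I/O arithmetic on this slide can be checked directly. This is a rough sketch of the slide's own cost model, treating each of the 13 table reads as one join costing 4 to 6 I/Os; it is a simplification, not a measurement of any particular database.

```python
# Figures taken from the slide; the cost model itself is a simplification.
TABLES_PER_PERSON = 13      # tables needed to hold one person document
IOS_PER_JOIN = (4, 6)       # 3-4 index I/Os plus 1-2 row I/Os per join
DOCUMENT_IOS = 1            # one read when the indexes are memory-resident


def relational_io_range(tables, ios_per_join):
    """Return (best, worst) I/O counts when every table costs one join."""
    low, high = ios_per_join
    return tables * low, tables * high


low, high = relational_io_range(TABLES_PER_PERSON, IOS_PER_JOIN)
print(f"relational: {low}-{high} I/Os, document: {DOCUMENT_IOS} I/O")
# relational: 52-78 I/Os, document: 1 I/O
```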
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued.
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.
"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]
The equivalent relational model spreads this across three tables:
Persons: Person ID | Status | Shipping Preference | Name | Online Name | Birth Date | Gender | Ethnicity | Customer Status | Customer Join Date | Customer Reviewer Rank
Persons Addresses: Person ID | Address ID | Address Purpose | Is Primary
Addresses: Address ID | Street | Locality | Region | Postal Code | Country
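To see what the multi-valued array saves, here is a sketch that shreds the person document above into rows for the three relational tables. The surrogate address IDs, the row field names, and the `shred` helper are hypothetical; the point is that the two-valued addressPurpose array turns into two Persons Addresses link rows.

```python
from itertools import count

# Person document reduced to the fields shown on the slide.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }}
        ],
    },
}


def shred(doc):
    """Split one person document into Persons, Persons Addresses, and Addresses rows."""
    address_ids = count(1)  # hypothetical surrogate keys for the Addresses table
    person_id = doc["personId"]
    persons = [{"personId": person_id, "name": doc["person"]["personName"]}]
    person_addresses, addresses = [], []
    for entry in doc["person"].get("personAddresses", []):
        addr = entry["address"]
        address_id = next(address_ids)
        addresses.append({
            "addressId": address_id,
            "street": ", ".join(addr["addressStreet"]),
            "locality": addr["addressLocality"],
            "region": addr["addressRegion"],
            "postalCode": addr["addressPostalCode"],
            "country": addr["addressCountry"],
        })
        # The multi-valued addressPurpose array becomes one link row per value.
        for purpose in addr["addressPurpose"]:
            person_addresses.append(
                {"personId": person_id, "addressId": address_id, "addressPurpose": purpose}
            )
    return persons, person_addresses, addresses


persons, person_addresses, addresses = shred(person_doc)
print(len(persons), len(person_addresses), len(addresses))  # 1 2 1
```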
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed.
• Putting different types of records in the same array is natural in programming but very difficult in relational databases.
"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
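Code that consumes a multi-typed array usually dispatches on what each element contains. This sketch uses the two payment methods from the slide and branches on which properties are present; storing the card and account numbers as strings and the `describe` helper are assumptions for illustration.

```python
# Two payment methods of different shapes in one array, as on the slide.
# Numbers are kept as strings here (an assumption) so leading zeros survive.
payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {
        "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]


def describe(method):
    """Dispatch on which properties are present rather than on a fixed schema."""
    pm = method["paymentMethod"]
    if "creditCardNumber" in pm:
        return f"{pm['paymentMethodType']} card ending {pm['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in pm:
        return f"{pm['paymentMethodType']} account ending {pm['bankAccountNumber'][-4:]}"
    return f"unrecognized payment method: {pm.get('paymentMethodType')}"


for m in payment_methods:
    print(describe(m))
# Visa card ending 1234
# Checking account ending 1111
```

In a relational schema the same array would typically become either one wide table with many nullable columns or a separate table per payment type, plus a join to reassemble them.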
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure • Keys • Values • Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query.
• Structure can be queried:
  – for presence of keys
  – for nested relationships
  – for any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224",
                   "hidePhoneNumber": true } }
    ]
  }
}
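Because keys and nesting are part of the data, a query can ask where a key occurs rather than which column holds it. This sketch walks the person document from the slide and reports the path to every object that contains a given key; `find_paths` is a hypothetical helper, not a MarkLogic API.

```python
def find_paths(node, key, path=()):
    """Yield the path to every object that contains `key`, at any depth."""
    if isinstance(node, dict):
        if key in node:
            yield path
        for k, v in node.items():
            yield from find_paths(v, key, path + (k,))
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from find_paths(v, key, path + (i,))


doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

# Presence of a key: only the second phone carries hidePhoneNumber.
print(list(find_paths(doc, "hidePhoneNumber")))
# [('person', 'personPhones', 1, 'phone')]
```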
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters.
– Strings have unlimited size and can contain any internationalized UTF-8 characters.
– Numbers contain integers or decimals.
– Booleans are true or false.
– Arrays can have values of any type and can mix types.
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
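The handful of JSON types above is small enough to show in one document. This sketch parses a hypothetical sample with Python's standard `json` module, which maps JSON strings, numbers, booleans, null, arrays, and objects to `str`, `int` or `float`, `bool`, `None`, `list`, and `dict`.

```python
import json

# A hypothetical document exercising each JSON data type from the list above.
sample = json.loads("""
{
  "a key with spaces, punctuation & ünïcödé": "any UTF-8 text: 日本語",
  "count": 42,
  "ratio": 0.5,
  "active": true,
  "middleName": null,
  "mixed": [1, "two", false, null, {"nested": []}],
  "sparse": {}
}
""")

for key, value in sample.items():
    # JSON numbers without a fraction or exponent parse as int, others as float.
    print(f"{key}: {type(value).__name__}")
```

Note that JSON itself has a single number type; the int/float split is Python's parsing choice.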
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. Codd
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency. …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …
1.2.1 Ordering Dependence. … Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence. … Can application programs … remain invariant as indices come and go? …
1.2.3 Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.
[Figure: the paper excerpt annotated as a knowledge graph. Legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Relationships include geo-located in, printed on, author of, published article in journal, publisher of, problem, solution, and purpose.]
[Figure: a document graph. JSON Customers, Orders, Products, and Vendors documents and XML Documentation (an operation record: Johns Hopkins, operation 13, heart transplant, surgeon Dorothy Oz, with three drug doses) are linked by relationships such as "Customer order", "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", and "Vendor received RMA on product".]
Inside and Out
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | Johns Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Six Data Paradigms
• Dimensional: Kimball data warehousing
• Wide Column: fixed, dense tables with fixed queries and no joins
• Document: sparse, variable data structures
• Relational: fixed, dense tables with flexible queries & joins
• Graph: unlimited relationships
• Key Value: predefined data structures, for example:
  key → value 1
  hash → [ key 1 → value 1, key 2 → value 2 ]
  list value 1, list value 2
  [ set value 1, set value 2 ]
[Figure: the six paradigms plotted on two axes. One axis runs between Low-Latency Operational Velocity and High-Bandwidth Analytical Volume (latency from hours, minutes, and seconds to milliseconds and microseconds; data volume in GB, TB, and PB; throughput from 500 tps through 1,000 tps and 10K tps to 100K tps). The other runs from fixed structure (code has more dependence on DB structures) to flexible structure (code has less dependence on DB structures). Each paradigm is illustrated with the same surgery example: an RDF graph (Surgeon performed on Operation, works at Hospital, friend of Person, likes, has poor success at, fails at), a star schema of Hospital, Surgeon, Operation, and Drug dimensions around Drug Dose facts, a relational Surgeon / Operation Codes / Operation / Hospital schema, and an XML operation record.]
Top Ten Databases Overall (ranked products, grouped by paradigm)
• Graph (RDF): 6 MarkLogic
• Dimensional, data warehouse: 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
• Dimensional, live analytics: 2 Oracle Exalytics, 3 SAP HANA
• Relational, newSQL: 2 Oracle x10
• Relational, SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
• Document, doc warehouse (XML): 6 MarkLogic
• Document, JSON: 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
• Key Value, simple key: 5 AWS DynamoDB, 10 Redis
• Wide Column, complex key: 9 DataStax Cassandra
Do we combine multiple models?
[Figure: the six-paradigm chart repeated, with MarkLogic placed on the graph (RDF), XML doc warehouse, and JSON document models and labeled "#1 Multi-model NoSQL Database" for operational workloads.]
Multi-Model Enterprise NoSQL Databases
Power of Combining XML Document and RDF Graph
[Figure: the excerpt from E. F. Codd's 1970 paper "A Relational Model of Data for Large Shared Data Banks", marked up as an XML document and linked through an RDF graph of topics, people, locations, publications, references, organizations, and events, with relationships such as geo-located in, printed on, related topic, author of, published article in journal, publisher of, problem, solution, and purpose.]
Data + Narrative + Relationships = Contextual Information
(Semantic & Structural) = Meaningful Knowledge
WAIT A MINUTE: Aren't JSON, XML, and RDF databases hierarchical and graph databases?
Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?
1. They break programs when the data structure's hierarchy changes.
   – New APIs and indexes flatten out the data structure, so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent because it is easy to fail to update redundant data.
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
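Points 2 and 3 depend on foreign keys between documents staying valid. As a minimal application-level sketch (not a MarkLogic or other database feature), the check below finds foreign keys in order documents that point at missing person or product documents. The document shapes follow the diagram, trimmed to the key fields, and `dangling_references` is a hypothetical helper.

```python
def dangling_references(orders, products, persons):
    """Return (orderId, field, value) for every foreign key with no target document."""
    product_ids = {p["productId"] for p in products}
    person_ids = {p["personId"] for p in persons}
    problems = []
    for doc in orders:
        order = doc["order"]
        customer_id = order["customer"]["customerId"]
        if customer_id not in person_ids:
            problems.append((doc["orderId"], "customerId", customer_id))
        for item in order.get("orderedProducts", []):
            fk = item["product"]["productIdFk"]
            if fk not in product_ids:
                problems.append((doc["orderId"], "productIdFk", fk))
    return problems


persons = [{"personId": 11}]
products = [{"productId": 1111}]  # only the Oreo product document is present
orders = [{"orderId": 1, "order": {
    "customer": {"customerId": 11, "customerName": "Mike Bowers"},
    "orderedProducts": [
        {"product": {"productIdFk": 1111}},
        {"product": {"productIdFk": 2222}},  # no product document 2222 in this sample
    ]}}]

print(dangling_references(orders, products, persons))  # [(1, 'productIdFk', 2222)]
```

A check like this could run in a trigger or a batch job; where it runs is a design choice left open here.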
RDF Graph and XML enable us to turn content into meaningful knowledge.
What can RDF Graph and JSON do for data?
Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity.
  – A document is like a row in a table, and more.
  – A document contains all related tables required by a relational model to represent one business entity.
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries.
[Figure: JSON Customers, Orders, Products, and Vendors documents and XML Documentation connected by graph relationships such as "Customer order", "Product that was ordered", "Vendor who sold the product", "Customer liked product", "Vendor received RMA on product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents, invoices, etc.", "Product manual", and "Vendor order forms, invoices, etc."]
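The relationships in this figure can be held as plain data: each link becomes a subject, predicate, object triple that can be added at run time without changing any document. In this sketch the entity identifiers (customer/11, order/1, product/1111, vendor/555) echo the IDs in the earlier diagram, while the predicate names and the `related` helper are hypothetical paraphrases of the figure's labels.

```python
from collections import defaultdict

# Each relationship is a (subject, predicate, object) triple; names are hypothetical.
triples = [
    ("customer/11", "ordered", "order/1"),
    ("order/1", "containsProduct", "product/1111"),
    ("vendor/555", "sold", "product/1111"),
    ("customer/11", "liked", "product/1111"),
    ("vendor/555", "receivedRmaOn", "product/1111"),
    ("product/1111", "documentedBy", "doc/oreo-111-manual.xml"),
]

# Index outgoing edges by subject for quick lookups.
index = defaultdict(list)
for s, p, o in triples:
    index[s].append((p, o))


def related(subject, predicate=None):
    """Objects linked from `subject`, optionally filtered by predicate."""
    return [o for p, o in index.get(subject, []) if predicate is None or p == predicate]


print(related("customer/11"))         # ['order/1', 'product/1111']
print(related("vendor/555", "sold"))  # ['product/1111']
```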
RDF = Standard Meaningful Graphs
Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies.
TIP: Search for ontologies at Linked Open Vocabularies (LOV).
• Dublin Core, FOAF, TrackBack, MetaVocab, Basic Geo Vocabulary, BIO, RSS 1.0, VCard RDF, Creative Commons metadata, WOT
• SIOC, GoodRelations, DOAP, Programmes Ontology, Music Ontology, OpenGUID, Provenance Vocabulary, Pedagogical diagnosis, DILIGENT Argumentation
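As an illustration of reusing existing ontologies, this sketch writes a few N-Triples statements about the Codd paper using predicates from Dublin Core terms (http://purl.org/dc/terms/) and FOAF (http://xmlns.com/foaf/0.1/). The http://example.org/ subject IRIs are hypothetical placeholders, and the literal escaping handles only backslashes and double quotes, not the full N-Triples escape set.

```python
# Namespaces for two widely used vocabularies listed above.
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # hypothetical namespace for our own entities


def iri(value):
    return f"<{value}>"


def literal(value):
    # Escapes only backslashes and double quotes; newlines etc. are not handled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


paper = iri(EX + "paper/codd-1970")
author = iri(EX + "person/codd")
triples = [
    (paper, iri(DCTERMS + "title"),
     literal("A Relational Model of Data for Large Shared Data Banks")),
    (paper, iri(DCTERMS + "creator"), author),
    (author, iri(FOAF + "name"), literal("E. F. Codd")),
]

# N-Triples: one "subject predicate object ." statement per line.
ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
print(ntriples)
```

Because dcterms:creator and foaf:name already have published meanings, any consumer that knows those vocabularies can interpret the data without a custom schema.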
New Data Structures for Variety and Variability
New Data Structures for Variety and Variability
Relational Database | NoSQL Database
fixed and dense structures (mostly) | sparse and variable structures (mostly)
Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable). | Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.
Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure. | Each document may be of zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.
Each table contains a sparse number of rows and requires each row to have the same fixed structure. | Each collection contains a sparse number of documents and has flexible requirements for document types: it may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.
Each table may have zero to a few fixed relationships to other tables. Constraints are not data. | Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.
Each schema must have a fixed number of predefined tables and constraints before data can be loaded. | Each database may contain documents without first defining structures, document types, relationships, collections, etc.
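To make the fixed-versus-variable contrast concrete, here is a minimal Python sketch. A dataclass stands in for a relational row and plain dicts stand in for JSON documents; the second document, the `in_collection` helper, and all Python names are illustrative assumptions, not MarkLogic APIs.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SurgeonRow:
    """A relational row: every instance has exactly these fields."""
    surgeon_id: int
    surgeon_name: str
    surgeon_specialty: Optional[str]  # nullable, but the slot always exists


# Documents carry their own structure: properties may be present or absent,
# and one document may belong to several collections.
documents = [
    {"_id": 1, "collections": ["operation", "transplants"],
     "operation": {"surgeonName": "Dorothy Oz", "operationNumber": 13}},
    {"_id": 2, "collections": ["operation"],
     "operation": {"surgeonName": "Martin Fields"}},  # no operationNumber
]


def in_collection(docs, name):
    """Return the documents tagged with the given collection name."""
    return [d for d in docs if name in d.get("collections", [])]


row_field_names = [f.name for f in fields(SurgeonRow)]
transplants = in_collection(documents, "transplants")
operations = in_collection(documents, "operation")
```

Document 1 appears in both collections, while document 2 simply lacks a property instead of storing a NULL for it.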
Why is relational fixed and dense?
Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.
Operation ID | Drug ID | Drug Dose | Drug Dose UOM
13 | 100 | 200 mg | NULL
13 | 101 | NULL | NULL
13 | 100 | 150 mg | NULL
(Sparse values are stored as NULL.)

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin
(Callouts: fixed field types; fixed table structure; fixed table relationships; each row has the same structure; fixed set of tables.)
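The sketch below, a simplified illustration, stores the same drug facts both ways: as fixed relational tuples with NULL (`None`) placeholders, and as an embedded array where a missing dose is simply omitted. It assumes dose size and unit are separate values, as in the JSON document on the next slide; the variable names are hypothetical.

```python
# Drugs table: drug_id -> drug name
drug_names = {100: "Minocycline", 101: "Minomycin"}

# Relational: every row has the same columns, so an unknown dose is stored
# as None (NULL) in a fixed slot.
operation_drug_rows = [
    # (operation_id, drug_id, drug_dose_size, drug_dose_uom)
    (13, 100, 200, "mg"),
    (13, 101, None, None),
    (13, 100, 150, "mg"),
]

# Document: the same facts as an embedded array; a property without a value
# is left out entirely, so no NULL placeholders are stored.
administered_drugs = []
for _op_id, drug_id, dose, uom in operation_drug_rows:
    drug = {"drugName": drug_names[drug_id]}
    if dose is not None:
        drug["drugDoseSize"] = dose
        drug["drugDoseUOM"] = uom
    administered_drugs.append(drug)

# Count the NULL cells the relational form needs.
null_cells = sum(value is None for row in operation_drug_rows for value in row)
```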
How does NoSQL support variety and variability?
NoSQL documents can have any variety of rapidly changing structures because structure is data.
{
  "_id": 1,
  "_type": "operation",
  "collections": ["operation", "transplants"],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}
(Callouts: variable data types; variable document types; variable relationships; variable and sparse collections; variable document structures; sparse properties (the Minomycin entry has no dose); sparse denormalized properties.)
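Because relationships are data, they can be filtered like any other property. This minimal Python sketch loads an abridged copy of the operation document with the standard `json` module; the nesting of relations under `values` follows the slide layout as reconstructed here and is an assumption, as is the `related` helper.

```python
import json

# Abridged copy of the operation document: only a few relations are kept.
operation_doc = json.loads("""
{
  "_id": 1,
  "operation": {
    "administeredDrugs": [
      {"drugName": "Minocycline", "drugDoseSize": 200, "drugDoseUOM": "mg"},
      {"drugName": "Minomycin"},
      {"drugName": "Minocycline", "drugDoseSize": 150, "drugDoseUOM": "mg"}
    ]
  },
  "relations": {"values": [
    {"subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87},
    {"subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1},
    {"subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3}
  ]}
}
""")


def related(doc, predicate):
    """Return the relation objects in doc whose predicate matches."""
    return [r for r in doc["relations"]["values"] if r["predicate"] == predicate]


drug_edges = related(operation_doc, "opDrug")
# Relationship properties are ordinary data and can be filtered like any field.
recalled_often = [r["object"] for r in drug_edges if r["drugRecalls"] > 1]
# Sparse properties: read with a default instead of expecting a NULL column.
doses = [d.get("drugDoseSize") for d in operation_doc["operation"]["administeredDrugs"]]
```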
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures.</paragraph>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>.</paragraph></section>
{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
      { "type": "text", "value": "Text exists only in objects" },
      { "type": "text", "value": "Data exists in separate objects" },
      { "type": "date", "value": "2017-04-02Z" },
      { "type": "break", "value": null },
      { "type": "text", "value": "JSON is designed for objects to…" },
      { "type": "text", "value": "be sprinkled with strings" }
    ] }
  ]
}
JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, and CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
Choosing Between XML and JSON
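The trade-off can be seen by parsing abridged copies of the two samples with Python's standard library: `xml.etree.ElementTree` keeps text, tags, and CDATA content in document order, while `json` yields objects that must be walked to find the date. This is an illustration only, not a MarkLogic workflow.

```python
import json
import xml.etree.ElementTree as ET

xml_text = """<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text.</paragraph></section>"""

root = ET.fromstring(xml_text)
heading = root.find("heading").text          # CDATA arrives as plain text
para = root.find("paragraph")
plain_text = "".join(para.itertext())        # mixed content, in document order
date_el = para.find("date")
xml_date = date_el.text
xml_date_type = date_el.get("{http://www.w3.org/2001/XMLSchema-instance}type")

json_text = """{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects"},
  {"type": "date", "value": "2017-04-02Z"}]}"""
parts = json.loads(json_text)["paragraph"]
# In JSON the date is its own object, found by walking the array.
json_date = next(p["value"] for p in parts if p["type"] == "date")
```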
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real customer data model look like?
- Addresses: Address ID, Street, Locality, Region, Postal Code, Country
- Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
- Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
- Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
- Websites: Website ID, URL
- Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
- Companies Websites: Company ID, Website ID
- Companies Addresses: Company ID, Address ID
- Companies: Company ID
- Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
- Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
- Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
- Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
- Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
- Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
- Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Customer Model
The person ERD is 13 tables because, in relational, each multivalued property requires a separate table. All of this should be represented as one JSON document because it is one business entity.
Person ERD
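To see what shredding costs, this hypothetical Python sketch reassembles one person document from rows of three of the 13 tables (columns abridged). The row values, the email address, and the `group_by_person` and `person_document` helpers are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical rows from three of the person tables (columns abridged).
persons = [(11, "Mike Bowers", "1981-01-01")]                  # Persons
person_phones = [(1, 11, "mobile", "+1 (111) 111-1111"),       # Person Phones
                 (2, 11, "work", "+1 (111) 111-1112")]
person_emails = [(1, 11, "work", "mike@somework.com", True)]   # Person Emails


def group_by_person(rows, person_idx=1):
    """Index child-table rows by their Person ID foreign key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[person_idx]].append(row)
    return grouped


phones_by_person = group_by_person(person_phones)
emails_by_person = group_by_person(person_emails)


def person_document(person_id, name, birth_date):
    """Reassemble one business entity from its shredded tables."""
    return {
        "personId": person_id,
        "person": {
            "personName": name,
            "personBirthDate": birth_date,
            "personPhones": [{"phone": {"phonePurpose": [purpose], "phoneNumber": number}}
                             for _id, _pid, purpose, number in phones_by_person[person_id]],
            "personEmails": [{"email": {"emailPurpose": [purpose], "emailAddress": address}}
                             for _id, _pid, purpose, address, _primary in emails_by_person[person_id]],
        },
    }


docs = [person_document(*row) for row in persons]
```

Every extra multivalued property adds another table and another grouping step; the document keeps them together from the start.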
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
                  "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
                  "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers", "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
(Figure: the relational ERD expanded for a full customer order model; it repeats the person tables above and adds many more, including Order, Order Items, Fact, and Lookup Lists tables.)
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
(Diagram: a customer order modeled as four kinds of JSON documents: Customers, Orders, Products, and Vendors. Links show the product that was ordered and the vendor who sold the product.)
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T09:00:00Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 5.87 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 3.50 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2.50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
productAvailabilityDate 2001-01-01
productListPrice 4.50
productDiscountPrice 3.50
productStandardCost 2.50
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 2.50 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 10.1
productWidth 4.5 productHeight 1.7 productLength 6.7
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
Relational Modeling: Normalize
1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
(Diagram: one-to-one, one-to-many, and many-to-many relationships, and reference tables)
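As a minimal illustration of the first normalization step, the following Python sketch uses the standard `sqlite3` module to move a multivalued attribute (phones) into a one-to-many child table. The table and column names are assumptions chosen to mirror the Person Phones table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- The multivalued phones attribute becomes a one-to-many child table.
CREATE TABLE person_phones (
    phone_id  INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES persons(person_id),
    purpose   TEXT,
    number    TEXT
);
INSERT INTO persons VALUES (11, 'Mike Bowers');
INSERT INTO person_phones VALUES
    (1, 11, 'mobile', '+1 (111) 111-1111'),
    (2, 11, 'work',   '+1 (111) 111-1112'),
    (3, 11, 'home',   '+1 (111) 111-1113');
""")

# Reading the attribute back requires a join across the two tables.
phone_rows = conn.execute("""
    SELECT p.name, ph.purpose, ph.number
    FROM persons p JOIN person_phones ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
""").fetchall()
```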
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
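MarkLogic TDE templates are configured declaratively, and the following is not one. As a plain-Python stand-in under that caveat, this sketch projects each embedded, normalized phone structure into flat rows, the way a row-producing view over the document might; `phone_rows` is a hypothetical helper.

```python
# Abridged person document with a normalized, embedded phones array.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113"}},
        ],
    },
}


def phone_rows(doc):
    """Yield one (personId, purpose, number) row per phone purpose."""
    for item in doc["person"]["personPhones"]:
        phone = item["phone"]
        for purpose in phone["phonePurpose"]:
            yield (doc["personId"], purpose, phone["phoneNumber"])


rows = list(phone_rows(person_doc))
```

The document stays whole; the flat rows are derived from it rather than stored in separate tables.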
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
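The idea of denormalizing at read time can be emulated in plain Python: a person's `hasEmployer` triple is resolved against separate company documents, producing an enriched view without copying company data into the person document. This is an illustrative sketch, not the Optic API; `company_docs`, `objects`, and `with_employers` are hypothetical.

```python
person_doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666,
                    "tripleStatus": "active"}},
    ],
    "person": {"personName": "Mike Bowers"},
}

# Hypothetical company documents, keyed by company id.
company_docs = {
    555: {"companyId": 555, "company": {"companyName": "Nabisco"}},
    666: {"companyId": 666, "company": {"companyName": "Goliath Dairies"}},
}


def objects(doc, predicate):
    """Return the objects of the document's triples with the given predicate."""
    return [t["triple"]["object"] for t in doc["triples"]
            if t["triple"]["predicate"] == predicate]


def with_employers(doc, companies):
    """Build a read-time view of doc enriched with employer names."""
    names = [companies[o]["company"]["companyName"]
             for o in objects(doc, "hasEmployer") if o in companies]
    return {**doc["person"], "employerNames": names}  # doc is not modified


view = with_employers(person_doc, company_docs)
```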
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
(Diagram: Business Tables, Transaction Tables, Reference Tables)
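The three kinds of tables can be sketched with Python's `sqlite3` module: business tables (`persons`, `products`), transaction tables (`orders`, `order_items`) that join them into a context, and a reference table of order states. The schema is a hypothetical simplification of the customer order model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reference table: standardized order states.
CREATE TABLE order_statuses (status TEXT PRIMARY KEY);
-- Business tables: independent of any context.
CREATE TABLE persons  (person_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
-- Transaction tables: join business tables into an ordering context.
CREATE TABLE orders (
    order_id  INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES persons(person_id),
    status    TEXT REFERENCES order_statuses(status)
);
CREATE TABLE order_items (
    order_id   INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    qty        INTEGER
);
INSERT INTO order_statuses VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');
INSERT INTO persons  VALUES (11, 'Mike Bowers');
INSERT INTO products VALUES (1111, 'Oreo Thins Sandwich Cookies');
INSERT INTO orders   VALUES (1, 11, 'Shipped');
INSERT INTO order_items VALUES (1, 1111, 3);
""")

# Recreating the "order" context takes a join across four tables.
rows = conn.execute("""
    SELECT p.name, pr.name, oi.qty, o.status
    FROM orders o
    JOIN persons p      ON p.person_id   = o.person_id
    JOIN order_items oi ON oi.order_id   = o.order_id
    JOIN products pr    ON pr.product_id = oi.product_id
""").fetchall()
```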
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
DocGraph Modeling Orthogonalize
- Everything about each entity should be in the entity.
- A business entity should exactly match how users think of it.
- A transaction should contain everything about the transaction.
- Multiple lookup tables may be combined in one doc.
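A minimal Python sketch of the last bullet, assuming the order lookup document from the example is held as a plain dictionary: one document carries both the orderStatus and productOrderStatus lists that a relational model would keep in two separate lookup tables. The `is_valid_status` helper is hypothetical.

```python
# One lookup document combining two lookup lists
# (values taken from the example document orderLookupId 111111111).
ORDER_LOOKUP = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": [
        "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock",
    ],
}


def is_valid_status(lookup: dict, list_name: str, value: str) -> bool:
    """Return True if `value` appears in the named lookup list (hypothetical helper)."""
    return value in lookup.get(list_name, [])


if __name__ == "__main__":
    print(is_valid_status(ORDER_LOOKUP, "orderStatus", "Shipped"))        # True
    print(is_valid_status(ORDER_LOOKUP, "productOrderStatus", "Closed"))  # False
```

Adding another lookup list means adding one more key to the same document rather than creating another table.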
Relational Modeling Generalize
3. Generalize
- Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change.
- For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
- Do not over-generalize in relational, because it hides the purpose of the model.
DocGraph Modeling Generalize
- Generalizing is easy in JSON because JSON objects naturally lend themselves to subclassing.
- This example shows a person subclassed as an employee, a customer, and a customer rep.
- Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model.
- See the subclassing below.
personId 11
schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]
triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111
triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free
personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]
personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]
personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]
personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]
customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111
companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
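A minimal Python sketch of generalization in a document: one person document carries the general `person` object plus optional specialized objects (`customer`, `companyRelationships`), so code can discover the roles a person plays from which objects are present. The trimmed document, the `{"company": {...}}` wrapper around each array item, and the `roles_of` helper are assumptions for illustration.

```python
# Trimmed copy of the example person document; the {"company": {...}} wrapper
# around each array item is an assumption about the original layout.
PERSON_DOC = {
    "personId": 11,
    "person": {"personStatus": "active", "personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def roles_of(doc: dict) -> list:
    """Return the general type plus every specialized role found in the document."""
    roles = ["person"] if "person" in doc else []
    if "customer" in doc:  # the customer subclass is a sibling object
        roles.append("customer")
    for item in doc.get("companyRelationships", []):
        # Each company relationship can contribute several roles.
        roles.extend(item["company"].get("personCompanyRelationship", []))
    return roles


if __name__ == "__main__":
    print(roles_of(PERSON_DOC))
    # ['person', 'customer', 'employeeOf', 'accountRepFor', 'independentContractorFor', 'deliveryPersonFor']
```

A person who is only a customer simply omits `companyRelationships`; no empty subclass tables are needed.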
Relational Modeling Tune
4. Tune
- Tune the model to meet the performance requirements of the application.
- Optimize sparse data out of a table by moving it into one-to-one related tables.
- Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance.
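The same denormalization idea carries over to documents: the example order document already stores productName next to productIdFk, so displaying an order needs no join. A minimal Python sketch, assuming product documents are available in a dictionary keyed by ID; the product 2222 document and the `denormalize_product_names` helper are hypothetical.

```python
import copy

# Hypothetical product documents keyed by productId; only productName is used.
PRODUCTS = {
    1111: {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    2222: {"productId": 2222, "product": {"productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz"}},
}

# An order that stores only foreign keys to products (the normalized form).
ORDER = {
    "orderId": 1,
    "order": {"orderedProducts": [
        {"product": {"productIdFk": 1111, "productOrderQty": 3}},
        {"product": {"productIdFk": 2222, "productOrderQty": 1}},
    ]},
}


def denormalize_product_names(order: dict, products: dict) -> dict:
    """Return a copy of the order with each product's name copied next to its foreign key."""
    result = copy.deepcopy(order)  # leave the caller's document untouched
    for line in result["order"]["orderedProducts"]:
        item = line["product"]
        source = products.get(item["productIdFk"])
        if source is not None:
            item["productName"] = source["product"]["productName"]
    return result


if __name__ == "__main__":
    for line in denormalize_product_names(ORDER, PRODUCTS)["order"]["orderedProducts"]:
        print(line["product"]["productIdFk"], line["product"]["productName"])
```

The trade-off is the same as in relational: if a product is renamed, the copied names must be refreshed.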
DocGraph Modeling Tune
- These examples are tuned for MarkLogic.
- In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy.
- You manually create inequality (range) indexes based on a property name or a hierarchical path.
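To see why unique names matter when an index is keyed by property name alone, here is a small Python check (not a MarkLogic API) that lists property names used at more than one path in a document; with a name-keyed index, those occurrences would be indistinguishable. The helper names and the two sample documents are hypothetical. Array items share their parent's path, so repeated entries such as several phones are not flagged.

```python
from collections import defaultdict


def property_paths(value, path=""):
    """Yield (name, path) for every property; array items share their parent's path."""
    if isinstance(value, dict):
        for name, child in value.items():
            child_path = f"{path}/{name}"
            yield name, child_path
            yield from property_paths(child, child_path)
    elif isinstance(value, list):
        for item in value:
            yield from property_paths(item, path)


def ambiguous_names(doc: dict) -> dict:
    """Map each property name that appears at more than one path to those paths."""
    paths = defaultdict(set)
    for name, path in property_paths(doc):
        paths[name].add(path)
    return {name: sorted(p) for name, p in paths.items() if len(p) > 1}


# Generic names collide; entity-prefixed names (as in the examples) do not.
GENERIC = {"person": {"status": "active"}, "customer": {"status": "active"}}
PREFIXED = {"person": {"personStatus": "active"}, "customer": {"customerStatus": "active"}}

if __name__ == "__main__":
    print(ambiguous_names(GENERIC))   # {'status': ['/customer/status', '/person/status']}
    print(ambiguous_names(PREFIXED))  # {}
```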
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
Document PROs Simplicity
Each Business Entity Is a JSON Document
- These five JSON documents are equivalent to more than 30 tables in a relational model.
- JSON modeling is mostly about orthogonalizing data into distinct business entities, transactions, and lookup lists.
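A minimal Python sketch of that orthogonal split, assuming each of the five example documents is identified by its top-level ID property (personId, productId, companyId, orderId, orderLookupId). The `DOC_KINDS` table and `kind_of` helper are illustrations, not part of the model itself.

```python
# Assumed mapping from each example document's ID property to the kind of document.
DOC_KINDS = {
    "personId": "business entity",
    "productId": "business entity",
    "companyId": "business entity",
    "orderId": "transaction",
    "orderLookupId": "lookup list",
}


def kind_of(doc: dict) -> str:
    """Classify a document by the top-level ID property it carries."""
    for id_name, kind in DOC_KINDS.items():
        if id_name in doc:
            return kind
    return "unknown"


if __name__ == "__main__":
    docs = [{"personId": 11}, {"orderId": 1}, {"productId": 1111},
            {"companyId": 555}, {"orderLookupId": 111111111}]
    for d in docs:
        print(next(iter(d)), "->", kind_of(d))
```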
Document PROs Performance
One JSON document requires one IO
- Relational requires dozens of IOs for the same data.
- For example, the person document is 4.5K and requires 13 tables to represent it in a relational database.
- You have to join 13 tables to retrieve all of a person's data.
- Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row.
- 4-to-6 IOs x 13 joins = 52-to-78 IOs.
- MarkLogic indexes are in RAM, so getting a doc is 1 IO.
- MongoDB indexes are B-tree indexes and may require 4 IOs.
- To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables.
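The IO arithmetic above, written out as a small Python calculation using only the slide's estimates (3-to-4 index IOs plus 1-to-2 row IOs per joined table). It is a back-of-the-envelope model that ignores caching, not a benchmark.

```python
# Per-join IO estimates quoted on the slide.
INDEX_IOS = (3, 4)  # IOs to walk the index (min, max)
ROW_IOS = (1, 2)    # IOs to read the row (min, max)
DOCUMENT_IOS = 1    # one document read when indexes are memory-resident (slide's claim)


def relational_io_range(tables: int) -> tuple:
    """Return (min, max) IOs to assemble data spread across `tables` joined tables."""
    low = tables * (INDEX_IOS[0] + ROW_IOS[0])
    high = tables * (INDEX_IOS[1] + ROW_IOS[1])
    return low, high


if __name__ == "__main__":
    low, high = relational_io_range(13)
    print(f"relational: {low}-{high} IOs, document: {DOCUMENT_IOS} IO")
    # relational: 52-78 IOs, document: 1 IO
```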
Document PROs Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties, for example using MarkLogic's new Optic API and Template Driven Extraction (TDE).
Document PROs Multi-Value Properties
Any JSON data can be multi-valued
- This allows many-to-one and many-to-many relationships to be captured in one JSON document.
"personAddresses": [
  { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
                 "addressLocality": "Clinton", "addressRegion": "UT", "addressPostalCode": "84015",
                 "addressCountry": "United States" } }
]
Addresses: Address ID | Street | Locality | Region | Postal Code | Country
Persons Addresses: Person ID | Address ID | Address Purpose | Is Primary
Persons: Person ID | Status | Shipping Preference | Name | Online Name | Birth Date | Gender | Ethnicity | Customer Status | Customer Join Date | Customer Reviewer Rank
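To make the comparison concrete, here is a minimal Python sketch that shreds the one address in the person document into rows for the three relational tables listed above. The column subset, the generated Address IDs, the `{"address": {...}}` wrapper, and treating a "primary" purpose as the Is Primary flag are assumptions for illustration.

```python
# The example person's address, with the {"address": {...}} wrapper assumed.
PERSON_DOC = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {"addressPurpose": ["shipping", "mailing"],
                         "addressStreet": ["99 Smith Dr"],
                         "addressLocality": "Clinton", "addressRegion": "UT",
                         "addressPostalCode": "84015",
                         "addressCountry": "United States"}},
        ],
    },
}


def shred_person(doc: dict):
    """Split one person document into Persons, Persons Addresses, and Addresses rows."""
    person_id = doc["personId"]
    person = doc["person"]
    persons = [(person_id, person["personName"])]  # only two of the Persons columns
    persons_addresses, addresses = [], []
    # Address IDs are hypothetical surrogate keys generated here.
    for address_id, item in enumerate(person.get("personAddresses", []), start=1):
        a = item["address"]
        addresses.append((address_id, ", ".join(a["addressStreet"]), a["addressLocality"],
                          a["addressRegion"], a["addressPostalCode"], a["addressCountry"]))
        is_primary = "primary" in a["addressPurpose"]
        for purpose in a["addressPurpose"]:
            if purpose != "primary":  # "primary" becomes the Is Primary flag instead
                persons_addresses.append((person_id, address_id, purpose, is_primary))
    return persons, persons_addresses, addresses


if __name__ == "__main__":
    for table in shred_person(PERSON_DOC):
        print(table)
```

One multi-valued addressPurpose array in the document becomes two rows in the Persons Addresses table.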
Document PROs Multi-Type Arrays
Any JSON data can be multi-typed
- Putting different types of records in the same array is natural in programming but very difficult in relational databases.
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111", "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
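A minimal Python sketch of consuming such a mixed-type array: code branches on paymentMethodType and checks each record for the fields its type needs. The `REQUIRED_FIELDS` table and the `missing_fields` helper are assumptions based on the two example records; the sample omits the driver's license fields for brevity.

```python
# Required fields per payment method type; an assumption based on the two examples.
REQUIRED_FIELDS = {
    "Visa": {"creditCardNumber", "creditCardExpirationDate", "nameOnCreditCard"},
    "Checking": {"bankRoutingNumber", "bankAccountNumber", "nameOnBankAccount"},
}

PAYMENT_METHODS = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS",
                       "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers"}},
]


def missing_fields(methods: list) -> list:
    """Return (type, sorted missing fields) for each record in a mixed-type array."""
    report = []
    for item in methods:
        method = item["paymentMethod"]
        kind = method.get("paymentMethodType")
        required = REQUIRED_FIELDS.get(kind, set())  # unknown types require nothing here
        report.append((kind, sorted(required - set(method))))
    return report


if __name__ == "__main__":
    print(missing_fields(PAYMENT_METHODS))  # [('Visa', []), ('Checking', [])]
```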
Document PROs Everything is Data
Everything is data in JSON:
- Structure
- Keys
- Values
- Arrays
Because structure is data:
- Structure can be changed by updating data: just write a query.
- Structure can be queried:
  - for the presence of keys
  - for nested relationships
  - for any combination of nesting, keys, and values
{ "personId": 11,
  "person": { "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02", "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ] } }
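A minimal Python sketch of both ideas on a trimmed copy of the document above: `find_paths` queries structure by locating a key at any depth, and `rename_key` changes structure by rewriting data. Both helpers and the new key name isPhoneNumberHidden are hypothetical; a database would do this through its own query and update APIs.

```python
# Trimmed copy of the example document (personName and height omitted).
DOC = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def find_paths(value, key, path=()):
    """Yield the path to every occurrence of `key`, at any nesting depth."""
    if isinstance(value, dict):
        for k, v in value.items():
            if k == key:
                yield path + (k,)
            yield from find_paths(v, key, path + (k,))
    elif isinstance(value, list):
        for i, item in enumerate(value):
            yield from find_paths(item, key, path + (i,))


def rename_key(value, old, new):
    """Return a copy of `value` with every `old` key renamed to `new`."""
    if isinstance(value, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in value.items()}
    if isinstance(value, list):
        return [rename_key(item, old, new) for item in value]
    return value


if __name__ == "__main__":
    print(list(find_paths(DOC, "hidePhoneNumber")))
    # [('person', 'personPhones', 1, 'phone', 'hidePhoneNumber')]
    renamed = rename_key(DOC, "hidePhoneNumber", "isPhoneNumberHidden")
    print(list(find_paths(renamed, "isPhoneNumberHidden")))
```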
Document PROs Simple Easy Data Types
- Attribute names have unlimited length and can contain any characters.
- Strings have unlimited size and can contain any internationalized UTF-8 characters.
- Numbers contain integers or decimals.
- Booleans are true or false.
- Arrays can have values of any type and can mix types.
- Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
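A quick Python illustration of these data types using the standard `json` module. JSON also has null, which the module maps to None. This shows how a generic parser sees the types, not how any particular database stores them; the sample text and the `type_names` helper are hypothetical.

```python
import json

# Attribute names may contain spaces and symbols; arrays may mix types.
TEXT = '{"attribute name with spaces & symbols!": "ok", "mixed": [1, 2.5, "three", true, null, {"four": 4}]}'


def type_names(text: str, key: str) -> list:
    """Parse JSON text and return the Python type name of each item in the array at `key`."""
    return [type(item).__name__ for item in json.loads(text)[key]]


if __name__ == "__main__":
    print(type_names(TEXT, "mixed"))
    # ['int', 'float', 'str', 'bool', 'NoneType', 'dict']
```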
Document PROs Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency. …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence. …
1.2.1 Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence
… Can application programs … remain invariant as indices come and go? …
1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.
[Figure: a knowledge graph linking Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E) through relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of", with topics connected by problem, purpose, and solution links.]
[Figure: a customer order document graph linking Customers, Orders, Products, and Vendors JSON documents, an XML hospital operation record (John Hopkins, operation 13, heart transplant, surgeon Dorothy Oz, drugs Minicillan and Maxicillan), and documentation (shipping instructions, customer invoices, annual reports, customer order documents, product manual) through relationships such as "product that was ordered", "vendor who sold the product", "customer liked product", and "vendor received RMA on product".]
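Relationships like "customer liked product" are what the triples array in the example person document records (for instance subject 11, predicate likesProduct, object 1111). A minimal in-memory Python sketch of querying such triples in either direction; it is not SPARQL or a MarkLogic semantics API, and the `Triple`, `objects`, and `subjects` names are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: int    # e.g. personId 11
    predicate: str  # relationship name, e.g. "likesProduct"
    obj: int        # called "object" in the example document


# The triples array from the example person document.
TRIPLES = [
    Triple(11, "hasSpouse", 22),
    Triple(11, "hasChild", 33),
    Triple(11, "hasChild", 44),
    Triple(11, "likesProduct", 1111),
    Triple(11, "hasEmployer", 666),
]


def objects(triples, subject, predicate):
    """Everything `subject` is related to by `predicate`."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


def subjects(triples, predicate, obj):
    """Everything related to `obj` by `predicate` (the reverse direction)."""
    return [t.subject for t in triples if t.predicate == predicate and t.obj == obj]


if __name__ == "__main__":
    print(objects(TRIPLES, 11, "hasChild"))         # [33, 44]
    print(subjects(TRIPLES, "likesProduct", 1111))  # [11]
```

Because each predicate names its meaning explicitly, adding a new kind of relationship is just another triple, not a new join table.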
Inside and Out
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Six Data Paradigms
Dimensional: Kimball Data Warehousing
Relational: Fixed Dense Tables with Flexible Queries & Joins
Wide Column: Fixed Dense Tables w/ Fixed Queries, No Joins
Key Value: Predefined Data Structures
  key: value 1
  hash: [ key 1: value 1, key 2: value 2 ]
  list: list value 1, list value 2
  set: [ set value 1, set value 2 ]
Document: Sparse, Variable Data Structures
Graph: Unlimited Relationships
[Chart: the paradigms positioned between Low-Latency Operational Velocity and High-Bandwidth Analytical Volume; scales shown: hours, minutes, seconds, milliseconds, microseconds; PB, TB, GB; 500 tps, 1,000 tps, 10K tps, 100K tps]
Fixed structure (code has more dependence on DB structures) → Flexible structure (code has less dependence on DB structures)
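The Key Value paradigm above offers a handful of predefined data structures: a plain value, a hash, a list, and a set. As a hedged illustration only, here is a tiny in-memory Python sketch of those four shapes; the class and method names are hypothetical and are not the API of any particular key-value product.

```python
from collections import defaultdict


class ToyKeyValueStore:
    """Hypothetical in-memory store offering four predefined value shapes."""

    def __init__(self):
        self.values = {}                 # key -> single value
        self.hashes = defaultdict(dict)  # key -> {field: value}
        self.lists = defaultdict(list)   # key -> ordered values (duplicates allowed)
        self.sets = defaultdict(set)     # key -> unordered unique values

    def set_value(self, key, value):
        self.values[key] = value

    def hash_set(self, key, field, value):
        self.hashes[key][field] = value

    def list_push(self, key, value):
        self.lists[key].append(value)

    def set_add(self, key, value):
        self.sets[key].add(value)


store = ToyKeyValueStore()
store.set_value("key", "value 1")
store.hash_set("hash", "key 1", "value 1")
store.hash_set("hash", "key 2", "value 2")
store.list_push("list", "list value 1")
store.list_push("list", "list value 2")
store.set_add("set", "set value 1")
store.set_add("set", "set value 2")
store.set_add("set", "set value 1")  # ignored: sets keep unique members

print(store.values["key"], store.hashes["hash"], store.lists["list"], sorted(store.sets["set"]))
```

In this sketch the caller must choose the right method for each key's shape up front, which is one way to read "predefined" here.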
Top Ten Databases Overall
Graph/RDF: 6 MarkLogic
[Figure: RDF graph linking Surgeon, Operation, Person, and Hospital with edges such as "performed", "on", "works at", "at", "friend of", "likes", "has poor success at", and "fails at"]
Data Warehouse: 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
[Figure: star schema with a Drug Dose Facts table (Hospital Key, Surgeon Key, Operation Key, Drug Key, Drug Dose) linked to Hospital, Surgeon, Operation, and Drug dimension tables, each holding a key and attributes]
Live Analytics: 2 Oracle Exalytics, 3 SAP HANA
[Figure: the same star schema]
newSQL: 2 Oracle x10
[Figure: relational ERD. Surgeon (Surgeon Number, Surgeon First Name, Surgeon Last Name); Operation Codes (Operation Code, Operation Name); Operation (Hospital Number, Operation Number, Surgeon Number, Operation Code); Hospital (Hospital Number, Hospital Name). An operation is performed at a hospital (a hospital schedules operations), performed by a surgeon (a surgeon performs operations), and classified by an operation code (an operation code classifies operations).]
SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
[Figure: the same relational ERD]
Doc Warehouse (XML): 6 MarkLogic
[Figure: XML document for Operation Number 13, a Heart Transplant at Johns Hopkins by surgeon Dorothy Oz, listing drugs: Minicillan (Drugs R Us, 200 mg), Maxicillan (Canada4Less Drugs, 400 mg), Minicillan (Drug USA, 150 mg)]
JSON Document: 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
Key Value (Simple Key): 5 AWS DynamoDB, 10 Redis
Wide Column (Complex Key): 9 DataStax Cassandra
Paradigms: Graph, Dimensional, Relational, Document, Wide Column, Key Value
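The Data Warehouse and Live Analytics boxes above use a dimensional star schema: a Drug Dose Facts table whose rows hold dimension keys plus a Drug Dose measure. A minimal Python sketch of a typical star-schema query, assuming hypothetical surrogate key values and reusing the sample hospital and drug names from this deck (the Surgeon and Operation dimensions are omitted for brevity): resolve each fact's dimension keys, then aggregate the measure.

```python
from collections import defaultdict

# Dimension tables: surrogate key -> attributes (key values are hypothetical).
hospital_dim = {1: {"hospital_name": "Johns Hopkins"}}
drug_dim = {
    100: {"drug_name": "Minicillan", "manufacturer": "Drugs R Us"},
    101: {"drug_name": "Maxicillan", "manufacturer": "Canada4Less Drugs"},
    102: {"drug_name": "Minicillan", "manufacturer": "Drug USA"},
}

# Fact table: one row per administered dose (dose in mg).
drug_dose_facts = [
    {"hospital_key": 1, "surgeon_key": 1, "operation_key": 13, "drug_key": 100, "drug_dose": 200},
    {"hospital_key": 1, "surgeon_key": 1, "operation_key": 13, "drug_key": 101, "drug_dose": 400},
    {"hospital_key": 1, "surgeon_key": 1, "operation_key": 13, "drug_key": 102, "drug_dose": 150},
]


def total_dose_by_hospital_and_drug(facts):
    """Join each fact to two dimensions and sum the dose measure per group."""
    totals = defaultdict(int)
    for fact in facts:
        hospital = hospital_dim[fact["hospital_key"]]["hospital_name"]
        drug = drug_dim[fact["drug_key"]]["drug_name"]
        totals[(hospital, drug)] += fact["drug_dose"]
    return dict(totals)


print(total_dose_by_hospital_and_drug(drug_dose_facts))
```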
Do we combine multiple models?
[Figure: repeats the Six Data Paradigms diagram (Dimensional, Relational, Wide Column, Key Value, Document, Graph) on the same velocity, volume, and fixed-to-flexible structure scales]
Multi-Model Enterprise NoSQL Databases
[Figure: the Top Ten Databases Overall boxes (Graph/RDF, Doc Warehouse (XML), JSON Document, Key Value, Wide Column, newSQL, SQL, Live Analytics, Data Warehouse) redrawn with MarkLogic labels on the Graph/RDF, Doc Warehouse, and JSON Document boxes, captioned "1 Multi-model NoSQL Database" and "MarkLogic Operational"]
Paradigms: Graph, Dimensional, Relational, Document, Wide Column, Key Value
Power of Combining XML Document and RDF Graph
"A Relational Model of Data for Large Shared Data Banks", E. F. Codd, IBM Research Laboratory, San Jose, California. Information Retrieval, Volume 13, Number 6, June 1970.
Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
Narrative + Data + Relationships = Contextual Information
1. Relational Model and Normal Form
1.1. INTRODUCTION: This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS: …Tables… represent a major advance toward the goal of data independence…
1.2.1. Ordering Dependence: …Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence: …Can application programs… remain invariant as indices come and go?…
1.2.3. Access Path Dependence: Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure: marked-up content linked into a knowledge graph. Legend: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels include "geo-located in", "printed on", "related topic", "author of", "published article in journal", "publisher of", "problem", "solution", and "purpose".]
(Semantic & Structural) = Meaningful Knowledge
WAIT A MINUTE! Aren't JSON, XML, and RDF databases hierarchical and graph databases?
Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?
1. They break programs when the data structure's hierarchy changes.
   – New APIs and indexes flatten out the data structure, so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent because it is easy to fail to update redundant data.
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
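Point 1 says new APIs and indexes flatten the data structure so that queries keep working when the hierarchy changes. A minimal Python sketch of that idea, not MarkLogic's implementation: a hypothetical helper finds a property by name at any depth, so the same query matches two versions of an operation document that nest surgeonName differently (the "staff" wrapper is invented for illustration).

```python
from typing import Any, Iterator


def find_property(node: Any, name: str) -> Iterator[Any]:
    """Yield every value stored under `name`, at any depth of a JSON-like tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == name:
                yield value
            yield from find_property(value, name)
    elif isinstance(node, list):
        for item in node:
            yield from find_property(item, name)


# Two versions of an operation document with different hierarchies.
flat_doc = {"operationNumber": 13, "surgeonName": "Dorothy Oz"}
nested_doc = {"operation": {"operationNumber": 13, "staff": {"surgeonName": "Dorothy Oz"}}}

for doc in (flat_doc, nested_doc):
    print(list(find_property(doc, "surgeonName")))
```

A real database would answer this from an index rather than by walking each document, but the query's independence from the hierarchy is the same.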
RDF Graph and XML enable us to turn content into meaningful knowledge.
What can RDF Graph and JSON do for data?
Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity
  – A document is like a row in a table, and more
  – A document contains all related tables required by a relational model to represent one business entity
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries
[Figure: JSON Customers, JSON Orders, JSON Products, and JSON Vendors documents plus XML Documentation (such as the operation 13 record with its drug list), connected by relationships: "Customer Order", "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents, invoices, etc.", "Product manual", "Vendor order forms, invoices, etc.", "Customer liked product", and "Vendor received RMA on product"]
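To make "a document is a business entity, and graph relationships connect entities" concrete, here is a minimal Python sketch. Each entity is one JSON-like document, and relationships are separate (subject, predicate, object) triples that can link any document to any other. The document ids and the predicates placedOrder, orderedProduct, and soldBy are hypothetical; likesProduct echoes the person document later in this deck.

```python
# Business entities: one JSON-like document per entity (ids are hypothetical).
documents = {
    "customer/11": {"personName": "Mike Bowers"},
    "order/1": {"orderNumber": "111-11-111"},
    "product/1111": {"productName": "Oreo Thins Sandwich Cookies"},
    "vendor/555": {"companyName": "Nabisco"},
}

# Relationships are data: (subject, predicate, object) triples.
triples = [
    ("customer/11", "placedOrder", "order/1"),
    ("order/1", "orderedProduct", "product/1111"),
    ("order/1", "soldBy", "vendor/555"),
    ("customer/11", "likesProduct", "product/1111"),
]


def objects(subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def vendors_for_customer(customer_id):
    """Follow customer -> order -> vendor at run time; no fixed join path."""
    vendor_ids = []
    for order_id in objects(customer_id, "placedOrder"):
        vendor_ids.extend(objects(order_id, "soldBy"))
    return [documents[v]["companyName"] for v in vendor_ids]


print(vendors_for_customer("customer/11"))
```

Adding a new kind of relationship in this sketch means appending triples, not altering a table definition.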
RDF = Standard Meaningful Graphs
Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies
TIP: Search for ontologies at Linked Open Vocabularies (LOV)
• Dublin Core • FOAF • TrackBack • MetaVocab • Basic Geo Vocabulary • BIO • RSS 1.0 • VCard RDF • Creative Commons metadata • WOT
• SIOC • GoodRelations • DOAP • Programmes Ontology • Music Ontology • OpenGUID • Provenance Vocabulary • Pedagogical diagnosis • DILIGENT Argumentation
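Reusing an existing ontology means using its published predicate IRIs instead of inventing your own. A minimal Python sketch that serializes a few relationships as N-Triples with the FOAF `name` and `knows` properties and the Dublin Core Terms `creator` property. The http://example.org/ subject IRIs are hypothetical placeholders for your own entities.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # hypothetical namespace for our own entities


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    # Escape backslashes, double quotes, and newlines for an N-Triples string literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """One triple per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = [
    (iri(EX + "person/11"), iri(FOAF + "name"), literal("Mike Bowers")),
    (iri(EX + "person/11"), iri(FOAF + "knows"), iri(EX + "person/22")),
    (iri(EX + "doc/1"), iri(DCTERMS + "creator"), iri(EX + "person/11")),
]
print(to_ntriples(triples))
```

Any RDF-aware tool that understands FOAF or Dublin Core can interpret these predicates without a custom data dictionary, which is the time saving the slide points to.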
New Data Structures for Variety and Variability
New Data Structures for Variety and Variability
Relational Database: fixed and dense structures (mostly) | NoSQL Database: sparse and variable structures (mostly)
Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable). | Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.
Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure. | Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.
Each table contains a sparse number of rows and requires each row to have the same fixed structure. | Each collection contains a sparse number of documents and has flexible requirements for document types: it may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.
Each table may have zero to a few fixed relationships to other tables. Constraints are not data. | Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.
Each schema must have a fixed number of predefined tables and constraints before data can be loaded. | Each database may contain documents without first defining structures, document types, relationships, collections, etc.
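Two rows of this comparison are easy to miss: in a document, a missing property and a property explicitly set to null are different things, and one document can belong to several collections at once. A minimal Python sketch with hypothetical document ids and sample values; the collection names "operation" and "transplants" echo the operation document later in this deck.

```python
# Two operation documents: a property may be absent, null, or present.
docs = {
    "op-13": {"operationNumber": 13, "surgeonName": "Dorothy Oz", "notes": None},
    "op-14": {"operationNumber": 14},  # no surgeonName, no notes
}

# A document may sit in several collections at once (names are illustrative).
collections = {
    "operation": {"op-13", "op-14"},
    "transplants": {"op-13"},
}


def property_state(doc, name):
    """Distinguish a missing property from one explicitly set to null."""
    if name not in doc:
        return "missing"
    return "null" if doc[name] is None else "present"


def collections_of(doc_id):
    """List every collection that contains the document."""
    return sorted(c for c, members in collections.items() if doc_id in members)


print(property_state(docs["op-13"], "notes"))
print(property_state(docs["op-14"], "notes"))
print(collections_of("op-13"))
```

A nullable relational column cannot make this distinction: every row has every column, so "absent" and "NULL" collapse into one state.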
Why is relational fixed and dense?
Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Operation ID | Drug ID | Drug Dose | Drug Dose UOM | Sparse
13 | 100 | 200 | mg | NULL
13 | 101 | NULL | NULL
13 | 100 | 150 | mg | NULL

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.
Callouts: Fixed field types · Fixed table structure · Fixed table relationships · Each row has same structure · Fixed set of tables
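Because each optional or repeating piece lives in its own fixed table, reading one operation back means joining rows across tables. A minimal Python sketch of that join over the sample rows above, with NULL modeled as None; the slide's Sparse column is omitted, and Hospital 7 is left unresolved because no Hospital table is shown.

```python
surgeons = [
    {"surgeon_id": 1, "surgeon_name": "Dorothy Oz", "surgeon_specialty": "Cardiothoracic"},
    {"surgeon_id": 2, "surgeon_name": "Martin Fields", "surgeon_specialty": "Neurosurgery"},
]
operations = [{"operation_id": 13, "surgeon_id": 1, "hospital_id": 7}]
operation_drugs = [
    {"operation_id": 13, "drug_id": 100, "drug_dose": 200, "drug_dose_uom": "mg"},
    {"operation_id": 13, "drug_id": 101, "drug_dose": None, "drug_dose_uom": None},
    {"operation_id": 13, "drug_id": 100, "drug_dose": 150, "drug_dose_uom": "mg"},
]
drugs = {100: "Minocycline", 101: "Minomycin"}


def load_operation(operation_id):
    """Reassemble one operation by joining four fixed, dense tables."""
    op = next(o for o in operations if o["operation_id"] == operation_id)
    surgeon = next(s for s in surgeons if s["surgeon_id"] == op["surgeon_id"])
    doses = [
        (drugs[row["drug_id"]], row["drug_dose"], row["drug_dose_uom"])
        for row in operation_drugs
        if row["operation_id"] == operation_id
    ]
    return {"operation_id": operation_id, "surgeon": surgeon["surgeon_name"], "drugs": doses}


print(load_operation(13))
```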
How does NoSQL support variety and variability?
NoSQL documents have any variety of rapidly changing structures because structure is data.
{
  "_id": 1,
  "_type": "operation",
  "collections": ["operation", "transplants"],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}
Callouts: Variable Data Types · Variable Document Types · Variable Relationships · Variable and Sparse Collections · Variable Document Structures · Sparse Properties (the Minomycin entry has no dose) · Sparse Denormalized Properties
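Since the relationships in this document are ordinary data, code can read them like any other property. A minimal Python sketch over an abridged copy of the operation document above: it lists the drugs whose dose property is absent (a sparse property) and averages drugEfficacy across the opDrug relations. The quoting and nesting were reconstructed from the slide, so treat the exact shape as an assumption.

```python
import json

operation_doc = json.loads("""
{
  "_id": 1,
  "_type": "operation",
  "collections": ["operation", "transplants"],
  "operation": {
    "operationNumber": 13,
    "administeredDrugs": [
      {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg"},
      {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
      {"drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg"}
    ]
  },
  "relations": {
    "values": [
      {"subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87},
      {"subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1},
      {"subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3},
      {"subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1}
    ]
  }
}
""")

# Sparse property: entries without a drugDoseSize simply omit it.
drugs = operation_doc["operation"]["administeredDrugs"]
undosed = [d["drugName"] for d in drugs if "drugDoseSize" not in d]

# Relationships are data: filter them by predicate like any other array.
drug_relations = [r for r in operation_doc["relations"]["values"] if r["predicate"] == "opDrug"]
mean_efficacy = sum(r["drugEfficacy"] for r in drug_relations) / len(drug_relations)

print(undosed, round(mean_efficacy, 2))
```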
Choosing Between XML and JSON
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph></section>

{ "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
      { "type": "text", "value": "Text exists only in objects" },
      { "type": "text", "value": "Data exists in separate objects" },
      { "type": "date", "value": "2017-04-02Z" },
      { "type": "break", "value": null },
      { "type": "text", "value": "JSON is designed for objects to…" },
      { "type": "text", "value": "be sprinkled with strings" } ] }
  ] }

JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.

XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
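The mixed-content point is easy to see in code. A minimal Python sketch using only the standard library (`xml.etree.ElementTree` and `json`), with snippets abridged from the examples above: the XML paragraph interleaves prose and a date tag, so recovering its text means walking element text and tails in order, while the JSON form keeps each text run and the date as separate objects that are simply filtered.

```python
import json
import xml.etree.ElementTree as ET

xml_paragraph = (
    '<paragraph xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    "XML allows data such as this date "
    '<date xsi:type="xsd:date">2017-04-02Z</date> to be freely mixed into the text'
    "</paragraph>"
)
json_paragraph = json.loads("""
{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects"},
  {"type": "date", "value": "2017-04-02Z"}
]}
""")

root = ET.fromstring(xml_paragraph)
# itertext() yields the element's text plus each child's text and tail, in document order.
xml_text = "".join(root.itertext())
xml_date = root.find("date").text

# In JSON the date is its own object; pick it out by its "type" member.
json_dates = [item["value"] for item in json_paragraph["paragraph"] if item["type"] == "date"]

print(xml_text)
print(xml_date, json_dates)
```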
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Customer Model
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Person ERD
[Figure: the Customer Model tables, repeated]
The person ERD is 13 tables because each multivalued property requires a separate table in relational.
All of this should be represented as one JSON document because it is one business entity.
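The caption's point, many tables for one business entity, can be shown by folding child-table rows back into a single nested document. A minimal Python sketch over three of the tables above (Persons, Person Phones, Person Emails); the snake_case column spellings are hypothetical, and the sample values are adapted from the person document in this deck.

```python
persons = [{"person_id": 11, "name": "Mike Bowers", "birth_date": "1981-01-01"}]
person_phones = [
    {"phone_id": 1, "person_id": 11, "phone_purpose": "mobile", "phone_number": "+1 (111) 111-1111"},
    {"phone_id": 2, "person_id": 11, "phone_purpose": "work", "phone_number": "+1 (111) 111-1112"},
]
person_emails = [
    {"email_id": 1, "person_id": 11, "email_purpose": "work",
     "email_address": "mike@somework.com", "is_primary": True},
]


def to_person_document(person_id):
    """Nest every child row that points at person_id under one document."""
    person = next(p for p in persons if p["person_id"] == person_id)
    return {
        "personId": person_id,
        "person": {
            "personName": person["name"],
            "personBirthDate": person["birth_date"],
            "personPhones": [
                {"phonePurpose": [r["phone_purpose"]], "phoneNumber": r["phone_number"]}
                for r in person_phones if r["person_id"] == person_id
            ],
            "personEmails": [
                # The Is Primary flag becomes one more purpose in the array.
                {"emailPurpose": [r["email_purpose"]] + (["primary"] if r["is_primary"] else []),
                 "emailAddress": r["email_address"]}
                for r in person_emails if r["person_id"] == person_id
            ],
        },
    }


doc = to_person_document(11)
print(len(doc["person"]["personPhones"]), doc["person"]["personEmails"][0]["emailPurpose"])
```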
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT", "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers", "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
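With multivalued properties stored as arrays inside one document, a question such as "what is this person's primary phone?" becomes a filter over a single array rather than a join through a Person Phones table. A minimal Python sketch over an abridged copy of the personPhones array above; the helper name is hypothetical.

```python
person = {
    "personPhones": [
        {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
        {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113"}},
    ]
}


def numbers_for_purpose(doc, purpose):
    """Return phone numbers whose phonePurpose array contains `purpose`."""
    return [
        entry["phone"]["phoneNumber"]
        for entry in doc["personPhones"]
        if purpose in entry["phone"]["phonePurpose"]
    ]


print(numbers_for_purpose(person, "primary"))
print(numbers_for_purpose(person, "fax"))
```

Storing purposes as an array also lets one phone be both "mobile" and "primary" without an extra Is Primary column.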
More Realistic Customer Order Model
[Figure: a much larger ERD whose tables carry placeholder names such as "Short", "Really", "Flabbergastic", "For all", "Lookup Lists", "Testing", "Order", "Order Items", "Fact", and "Long table Name", each reusing column lists from the Customer Model tables]
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
[Figure: JSON Customers, JSON Orders, JSON Products, and JSON Vendors documents connected by "Customer Order", "Product that was ordered", and "Vendor who sold the product" relationships]
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mike@somework.com ]
personWebsites [
website websitePurpose [ work ] websiteUrl http://www.mike.com ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T09:00:00Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 5.87 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 3.50 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+01:01
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2.50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
productAvailabilityDate 2001-01-01
productListPrice 4.50
productDiscountPrice 3.50
productStandardCost 2.50
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 2.50 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
productWebsites [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
Same Realistic Customer Order Model
Relational Modeling: Normalize
1. Normalize
• Make each attribute single-valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
(Diagram: One-to-one, One-to-many, Many-to-many, and Reference Tables)
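The relational normalization steps above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original model: the table and column names (person, person_phone) are hypothetical, and each phone gets a single purpose, whereas the example's phonePurpose is itself multivalued (for example [ mobile primary ]) and would need yet another table.

```python
import sqlite3

# Hypothetical table and column names, chosen for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,   -- one primary key per table
        person_name TEXT NOT NULL          -- one single-valued column per attribute
    );
    -- The multivalued "phones" attribute becomes a one-to-many child table.
    CREATE TABLE person_phone (
        person_id     INTEGER NOT NULL REFERENCES person(person_id),
        phone_purpose TEXT    NOT NULL,
        phone_number  TEXT    NOT NULL,
        PRIMARY KEY (person_id, phone_purpose)
    );
""")
conn.execute("INSERT INTO person VALUES (?, ?)", (11, "Mike Bowers"))
conn.executemany(
    "INSERT INTO person_phone VALUES (?, ?, ?)",
    [
        (11, "mobile", "+1 (111) 111-1111"),
        (11, "work", "+1 (111) 111-1112"),
        (11, "home", "+1 (111) 111-1113"),
    ],
)

# Reading the person back with all of its phones now requires a join.
rows = conn.execute(
    """
    SELECT p.person_name, ph.phone_purpose, ph.phone_number
    FROM person AS p
    JOIN person_phone AS ph ON ph.person_id = p.person_id
    WHERE p.person_id = ?
    ORDER BY ph.phone_purpose
    """,
    (11,),
).fetchall()
for row in rows:
    print(row)
```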
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it
personId 11
schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]
triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111
triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free
personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]
personEmails [ email emailPurpose [ work primary ] emailAddress mike@somework.com email emailPurpose [ personal ] emailAddress mike@somewhere.com ]
personWebsites [ website websitePurpose [ work ] websiteUrl http://www.mike.com website websitePurpose [ linkedin primary ] websiteUrl https://linked.com/mike ]
personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]
customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111
companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
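As a rough document-side counterpart, the Python sketch below takes flat, one-value-per-row phone rows and embeds them in one person document, folding the repeated purposes back into a multivalued phonePurpose array. The row layout and the helper build_person_document are hypothetical; the property names follow the person example, and nesting personPhones inside person is an assumption made for this illustration.

```python
import json

# Hypothetical flat rows: one row per (person, phone number, purpose).
person_row = {"personId": 11, "personName": "Mike Bowers", "personGender": "male"}
phone_rows = [
    (11, "+1 (111) 111-1111", "mobile"),
    (11, "+1 (111) 111-1111", "primary"),
    (11, "+1 (111) 111-1112", "work"),
    (11, "+1 (111) 111-1113", "home"),
]


def build_person_document(person, phones):
    """Embed the person's phone rows as an array inside one person document."""
    purposes_by_number = {}  # dicts keep insertion order, so phone order is stable
    for person_id, number, purpose in phones:
        if person_id == person["personId"]:
            purposes_by_number.setdefault(number, []).append(purpose)
    return {
        "personId": person["personId"],
        "person": {
            "personName": person["personName"],
            "personGender": person["personGender"],
            # Each embedded phone keeps one normalized shape; the foreign key is
            # dropped because nesting now expresses the relationship.
            "personPhones": [
                {"phone": {"phonePurpose": purposes, "phoneNumber": number}}
                for number, purposes in purposes_by_number.items()
            ],
        },
    }


doc = build_person_document(person_row, phone_rows)
print(json.dumps(doc, indent=2))
```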
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing: each document stays normalized, yet queries see the linked data as if it had been copied in
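The idea of denormalizing without copying can be illustrated with a plain-Python analogy. This is not MarkLogic's TDE or Optic API: the helpers (extract_name_triples, objects_for, order_view) and the second person are hypothetical. It only shows the shape of the technique: person documents stay normalized, a projection step turns their names into triples, and the order view resolves the customer's name through those triples at read time instead of storing a copy in the order.

```python
# Plain-Python analogy only: this is NOT MarkLogic's TDE or Optic API.
person_docs = [
    {"personId": 11, "person": {"personName": "Mike Bowers"}},
    {"personId": 22, "person": {"personName": "Jane Doe"}},  # hypothetical second person
]
order_doc = {
    "orderId": 1,
    "order": {"orderNumber": "111-11-111", "customer": {"customerId": 11}},  # no copied name
}


def extract_name_triples(docs):
    """Stand-in for an extraction template: one (subject, predicate, object) per document."""
    return {(d["personId"], "personName", d["person"]["personName"]) for d in docs}


def objects_for(triples, subject, predicate):
    """Look up every object stored for a subject/predicate pair."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def order_view(order, triples):
    """Return the order as if the customer's name were part of the order document."""
    customer_id = order["order"]["customer"]["customerId"]
    names = objects_for(triples, customer_id, "personName")
    return {
        "orderNumber": order["order"]["orderNumber"],
        "customerId": customer_id,
        "customerName": names[0] if names else None,
    }


triples = extract_name_triples(person_docs)
print(order_view(order_doc, triples))
```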
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
(Diagram: Business Tables, Transaction Tables, Reference Tables)
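A minimal sqlite3 sketch of the three kinds of relational tables might look like this. All table and column names are hypothetical; customer_order is used because ORDER is a reserved word in SQL. Foreign keys are switched on explicitly because SQLite does not enforce them by default, which lets the reference table reject an unknown status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    -- Business tables: independent of any context.
    CREATE TABLE person  (person_id  INTEGER PRIMARY KEY, person_name  TEXT NOT NULL);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT NOT NULL);

    -- Reference table: standardizes the allowed order states.
    CREATE TABLE order_status (status TEXT PRIMARY KEY);

    -- Transaction table: joins business tables together in one context.
    CREATE TABLE customer_order (
        order_id   INTEGER PRIMARY KEY,
        person_id  INTEGER NOT NULL REFERENCES person(person_id),
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        quantity   INTEGER NOT NULL,
        status     TEXT    NOT NULL REFERENCES order_status(status)
    );
""")
conn.executemany("INSERT INTO order_status VALUES (?)",
                 [("New",), ("Invoiced",), ("Shipped",), ("Closed",)])
conn.execute("INSERT INTO person VALUES (11, 'Mike Bowers')")
conn.execute("INSERT INTO product VALUES (1111, 'Oreo Thins Sandwich Cookies')")
conn.execute("INSERT INTO customer_order VALUES (1, 11, 1111, 3, 'Shipped')")

row = conn.execute("""
    SELECT p.person_name, pr.product_name, o.quantity, o.status
    FROM customer_order AS o
    JOIN person  AS p  ON p.person_id   = o.person_id
    JOIN product AS pr ON pr.product_id = o.product_id
""").fetchone()
print(row)

# The reference table rejects a status that is not on its list.
rejected = False
try:
    conn.execute("INSERT INTO customer_order VALUES (2, 11, 1111, 1, 'Lost')")
except sqlite3.IntegrityError:
    rejected = True
print("unknown status rejected:", rejected)
```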
DocGraph Modeling: Orthogonalize
• Everything about an entity should be in that entity's document
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one document
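Combining lookup lists into one document can be sketched in Python as follows. The lookup values follow the orderLookup example's orderStatus and productOrderStatus lists; the order and the invalid_statuses helper are hypothetical and trimmed to the fields that matter here.

```python
# Several small lookup lists combined in one lookup document.
order_lookup = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

# Hypothetical, trimmed order transaction.
order = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},  # not in the list
        ],
    },
}


def invalid_statuses(order_doc, lookup):
    """Return (property, value) pairs whose value is missing from the lookup list."""
    body = order_doc["order"]
    problems = []
    if body["orderStatus"] not in lookup["orderStatus"]:
        problems.append(("orderStatus", body["orderStatus"]))
    for item in body["orderedProducts"]:
        status = item["product"]["productOrderStatus"]
        if status not in lookup["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems


print(invalid_statuses(order, order_lookup))  # [('productOrderStatus', 'Backordered')]
```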
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational, because it hides the purpose of the model
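Subclassing in relational form can be sketched with sqlite3: a general person table plus subclass tables that share its primary key one-to-one. The table names (customer, account_rep) and their columns are hypothetical. Note that viewing the same person in each role costs one join per subclass table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- General superclass table, reusable in many contexts.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT NOT NULL);

    -- Subclass tables reuse the superclass key (one-to-one with person).
    CREATE TABLE customer (
        person_id          INTEGER PRIMARY KEY REFERENCES person(person_id),
        customer_join_date TEXT NOT NULL
    );
    CREATE TABLE account_rep (
        person_id  INTEGER PRIMARY KEY REFERENCES person(person_id),
        company_id INTEGER NOT NULL
    );
""")
conn.execute("INSERT INTO person VALUES (11, 'Mike Bowers')")
conn.execute("INSERT INTO customer VALUES (11, '2001-01-01')")
conn.execute("INSERT INTO account_rep VALUES (11, 555)")

# Each role view of the same person needs its own join.
as_customer = conn.execute(
    "SELECT p.person_name, c.customer_join_date FROM person AS p JOIN customer AS c USING (person_id)"
).fetchone()
as_account_rep = conn.execute(
    "SELECT p.person_name, a.company_id FROM person AS p JOIN account_rep AS a USING (person_id)"
).fetchone()
print(as_customer, as_account_rep)
```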
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because one document can hold a general object (person) alongside subclass objects (customer, companyRelationships)
• This example shows a person being subclassed as an employee, a customer, and an account rep
• Generalization works very well in JSON, and unlike relational it does not hide the purpose of the model
• See the customer and companyRelationships objects in the person document for the subclassing
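The same generalization can be sketched in JSON, shaped after the person document: the general person object sits beside subclass objects (customer, companyRelationships), and the schemas array appears to declare which types the document carries. The document is trimmed (schemaVersion is left out), and the second person and the helper functions are hypothetical.

```python
person_docs = [
    {
        "personId": 11,
        "schemas": [
            {"schema": {"type": "person"}},
            {"schema": {"type": "customer"}},
            {"schema": {"type": "companyRelationships"}},
        ],
        "person": {"personName": "Mike Bowers"},
        "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
        "companyRelationships": [
            {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                         "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
            {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                         "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}},
        ],
    },
    {   # hypothetical second person who is not a customer
        "personId": 22,
        "schemas": [{"schema": {"type": "person"}}],
        "person": {"personName": "Jane Doe"},
    },
]


def schema_types(doc):
    """Types a document declares in its schemas array."""
    return {entry["schema"]["type"] for entry in doc.get("schemas", [])}


def ids_of_type(docs, type_name):
    """IDs of every document that declares type_name, e.g. all customers among persons."""
    return [d["personId"] for d in docs if type_name in schema_types(d)]


def company_roles(doc):
    """Collect the subclass roles a person plays across all related companies."""
    roles = set()
    for entry in doc.get("companyRelationships", []):
        roles.update(entry["company"]["personCompanyRelationship"])
    return roles


print(ids_of_type(person_docs, "person"))    # [11, 22]
print(ids_of_type(person_docs, "customer"))  # [11]
print(sorted(company_roles(person_docs[0])))
```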
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Move sparse data out of a table and into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
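Denormalizing by copying can be sketched with sqlite3. The schema and the rename_person helper are hypothetical: reading an order's customer name avoids a join, but every change to the name must now update each copy, done here inside a single transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT NOT NULL);
    CREATE TABLE customer_order (
        order_id      INTEGER PRIMARY KEY,
        person_id     INTEGER NOT NULL REFERENCES person(person_id),
        customer_name TEXT    NOT NULL  -- denormalized copy of person.person_name
    );
""")
with conn:  # commit the seed data
    conn.execute("INSERT INTO person VALUES (11, 'Mike Bowers')")
    conn.execute("INSERT INTO customer_order VALUES (1, 11, 'Mike Bowers')")

# Read path: listing orders with the customer's name needs no join.
orders = conn.execute("SELECT order_id, customer_name FROM customer_order").fetchall()


def rename_person(conn, person_id, new_name):
    """Write path: every copy must change together, so update both in one transaction."""
    with conn:
        conn.execute("UPDATE person SET person_name = ? WHERE person_id = ?", (new_name, person_id))
        conn.execute("UPDATE customer_order SET customer_name = ? WHERE person_id = ?", (new_name, person_id))


rename_person(conn, 11, "Michael Bowers")
renamed = conn.execute("SELECT customer_name FROM customer_order WHERE order_id = 1").fetchone()
print(orders, renamed)
```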
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
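Why unique property names help can be shown with a simplified, hypothetical analogy of a name-keyed equality index in Python. This is not how MarkLogic stores its indexes; it only mimics the behavior described above, where values are indexed by property name regardless of depth. With a generic name, one index entry mixes person and company values; with personName and companyName, each entry keeps one meaning.

```python
from collections import defaultdict


def index_by_property_name(doc):
    """Toy equality index (not MarkLogic's real structure): file every scalar
    value under the name of the property that holds it, at any depth."""
    index = defaultdict(set)

    def walk(node, name):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, key)
        elif isinstance(node, list):
            for item in node:
                walk(item, name)  # array items inherit the enclosing property name
        elif name is not None:
            index[name].add(node)

    walk(doc, None)
    return index


generic_doc = {
    "person": {"name": "Mike Bowers"},
    "companyRelationships": [{"company": {"name": "Nabisco"}}],
}
unique_doc = {
    "person": {"personName": "Mike Bowers"},
    "companyRelationships": [{"company": {"companyName": "Nabisco"}}],
}

# A lookup on "name" cannot tell a person's name from a company's name.
print(sorted(index_by_property_name(generic_doc)["name"]))      # ['Mike Bowers', 'Nabisco']
# Unique names keep each index entry to one meaning.
print(sorted(index_by_property_name(unique_doc)["personName"]))  # ['Mike Bowers']
```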
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
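A rough way to see why a few documents replace many tables is to count, for one document, how many tables a straightforward relational mapping would need. The heuristic below is an assumption for illustration (one table for the root entity plus one per distinct array path, because each array becomes a one-to-many or multivalued-attribute table). It ignores reference tables and is not the method behind the table counts quoted in this talk; the trimmed person document is hypothetical.

```python
def array_paths(node, path=""):
    """Yield the path of every array; all items of one array share its path."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from array_paths(value, f"{path}/{key}")
    elif isinstance(node, list):
        yield path
        for item in node:
            yield from array_paths(item, path)


def estimate_tables(doc):
    """One table for the root entity plus one per distinct array path."""
    return 1 + len(set(array_paths(doc)))


person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        ],
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"}},
        ],
    },
}
print(sorted(set(array_paths(person_doc))))
print(estimate_tables(person_doc))  # 5
```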
Document PROs: Performance
One JSON Document Requires One IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a document is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables
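The IO estimate above is simple arithmetic; a tiny Python helper makes the ranges explicit. The default cost ranges are the figures quoted above, and the function name is hypothetical.

```python
def relational_io_range(joins, index_ios=(3, 4), row_ios=(1, 2)):
    """Return (low, high) IOs when every join costs index IOs plus row IOs."""
    low = joins * (index_ios[0] + row_ios[0])
    high = joins * (index_ios[1] + row_ios[1])
    return low, high


DOCUMENT_IOS = 1  # as stated above: one read when the indexes are in RAM

low, high = relational_io_range(13)
print(f"relational: {low}-{high} IOs vs document: {DOCUMENT_IOS} IO")
```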
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T09:00:00Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 5.87 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 3.50 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2.50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
productAvailabilityDate 2001-01-01
productListPrice 4.50
productDiscountPrice 3.50
productStandardCost 2.50
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 2.50 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [
None, Allocated, Invoiced, Shipped, On Order, No Stock
]
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties, for example with MarkLogic's new Optic API and Template Driven Extraction.
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued.
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.
"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States" } }
]
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
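To make the contrast concrete, here is a minimal sketch that stores one person's address in the three relational tables listed above and then joins them back into the nested `personAddresses` array. The row values and the `person_document` helper are illustrative assumptions; only the table, column, and property names come from the slide.

```python
# Rows for the three tables on the slide (values are illustrative).
persons = [{"person_id": 11, "name": "Mike Bowers"}]
addresses = [{"address_id": 1, "street": "99 Smith Dr", "locality": "Clinton",
              "region": "UT", "postal_code": "84015", "country": "United States"}]
persons_addresses = [
    # The many-to-many link table needs one row per (person, address, purpose).
    {"person_id": 11, "address_id": 1, "address_purpose": "shipping", "is_primary": True},
    {"person_id": 11, "address_id": 1, "address_purpose": "mailing", "is_primary": False},
]


def person_document(person_id: int) -> dict:
    """Join the three tables back into one nested document (hypothetical helper)."""
    person = next(p for p in persons if p["person_id"] == person_id)
    purposes_by_address: dict[int, list[str]] = {}
    for link in persons_addresses:
        if link["person_id"] == person_id:
            purposes_by_address.setdefault(link["address_id"], []).append(link["address_purpose"])
    person_addresses = []
    for address_id, purposes in purposes_by_address.items():
        a = next(a for a in addresses if a["address_id"] == address_id)
        person_addresses.append({"address": {
            "addressPurpose": purposes,  # multi-valued property: one array, no extra table
            "addressStreet": [a["street"]],
            "addressLocality": a["locality"],
            "addressRegion": a["region"],
            "addressPostalCode": a["postal_code"],
            "addressCountry": a["country"],
        }})
    return {"personId": person_id,
            "person": {"personName": person["name"], "personAddresses": person_addresses}}


if __name__ == "__main__":
    import json
    print(json.dumps(person_document(11), indent=2))
```

The two link rows collapse into a single `addressPurpose` array, which is the many-to-many relationship captured inside one document.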
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed.
• Putting different types of records in the same array is natural in programming but very difficult in relational databases.
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
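Application code typically handles such a mixed array by dispatching on a discriminator property. The sketch below assumes `paymentMethodType` plays that role, as it does in the example; the `describe` helper and its output wording are hypothetical.

```python
from typing import Any

payment_methods: list[dict[str, Any]] = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111"}},
]


def describe(method: dict[str, Any]) -> str:
    """Summarize one payment method; each record type carries different properties."""
    pm = method["paymentMethod"]
    kind = pm["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending {pm['creditCardNumber'][-4:]}, expires {pm['creditCardExpirationDate']}"
    if kind == "Checking":
        return f"Checking account ending {pm['bankAccountNumber'][-4:]}"
    return f"Unknown payment method type: {kind}"


if __name__ == "__main__":
    for method in payment_methods:
        print(describe(method))
```

In a relational model the two record types would usually live in separate tables, as the Visa Payment Methods and Bank Payment Methods tables of the customer model do.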
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query.
• Structure can be queried:
– for presence of keys
– for nested relationships
– for any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
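Because keys are data, a program can query for their presence and restructure documents by rewriting data. The sketch below assumes plain Python dictionaries stand in for stored documents; `phones_with_key` and `rename_key` are hypothetical helpers, not a MarkLogic API.

```python
from typing import Any

doc: dict[str, Any] = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def phones_with_key(document: dict[str, Any], key: str) -> list[str]:
    """Query structure: return the numbers of phones whose object contains `key`."""
    return [p["phone"]["phoneNumber"]
            for p in document["person"]["personPhones"]
            if key in p["phone"]]


def rename_key(node: Any, old: str, new: str) -> Any:
    """Change structure by updating data: return a copy with `old` renamed at any depth."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


if __name__ == "__main__":
    print(phones_with_key(doc, "hidePhoneNumber"))
    updated = rename_key(doc, "hidePhoneNumber", "isPhoneNumberHidden")
    print(phones_with_key(updated, "isPhoneNumberHidden"))
```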
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters.
– Strings have unlimited size and can contain any internationalized UTF-8 characters.
– Numbers contain integers or decimals.
– Booleans are true or false.
– Arrays can have values of any type and can mix types.
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
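The data types listed above can be exercised with Python's standard `json` module. This is a minimal sketch; the sample keys and values are made up for illustration.

```python
import json

sample = {
    "a key with spaces, punctuation & ünïcödé": "any UTF-8 string: 東京, Zürich, ☕",
    "integer": 42,
    "decimal": 3.5,
    "flag": True,
    "nothing": None,
    "mixed": [1, "two", 3.0, False, None, {"nested": ["array"]}],
    "sparse": {"onlyThisAttribute": "other attributes are simply absent"},
}

# Serialize without escaping non-ASCII characters, then parse the text back.
text = json.dumps(sample, ensure_ascii=False)
restored = json.loads(text)

if __name__ == "__main__":
    # Python's json module maps JSON integers to int and decimals to float.
    for key, value in restored.items():
        print(f"{key!r}: {type(value).__name__}")
```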
Document PROs: Meaningful Data Modeling
[Figure: excerpt from E. F. Codd's 1970 paper "A Relational Model of Data for Large Shared Data Banks", a knowledge graph of typed nodes (Topic, Person, Location, Publication, Reference, Organization, Event), and a customer order document graph linking JSON Customers, Orders, Products, and Vendors with XML Documentation.]
Inside and Out
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | Johns Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
Six Data Paradigms
• Graph: unlimited relationships
• Dimensional: Kimball data warehousing
• Relational: fixed, dense tables with flexible queries & joins
• Document: sparse, variable data structures
• Wide Column: fixed, dense tables with fixed queries, no joins
• Key Value: predefined data structures (key: value 1; hash [ key 1: value 1, key 2: value 2 ]; list: list value 1, list value 2; set [ set value 1, set value 2 ])
[Figure: each paradigm is illustrated with the same hospital example (surgeons, operations, hospitals, and administered drugs), drawn as a graph, a star schema, an ERD, an XML or JSON document, a simple key, and a complex wide-column key. The scales shown are Low Latency Operational Velocity (hours, minutes, seconds, milliseconds, microseconds; 500, 1000, 10K, and 100K tps) and High Bandwidth Analytical Volume (GB, TB, PB). A band along the bottom runs from fixed structure, where code has more dependence on DB structures, to flexible structure, where code has less dependence on DB structures.]

Top Ten Databases Overall
• Data Warehouse: 2 Oracle Exadata, 2 Teradata, 1 SQL Server, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA
• Live Analytics: 2 Oracle Exalytics, 3 SAP HANA
• newSQL: 2 Oracle x10
• SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB
• Graph/RDF: 6 MarkLogic
• Doc Warehouse (XML): 6 MarkLogic
• JSON Document: 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB
• Key Value (simple key): 5 AWS DynamoDB, 10 Redis
• Wide-Column (complex key): 9 DataStax Cassandra
Do we combine multiple models?
[Figure: the six data paradigms diagram repeated (Graph, Dimensional, Relational, Document, Wide Column, Key Value) with the same scales and hospital examples, with MarkLogic marked on several of the paradigms.]

Multi-Model Enterprise NoSQL Databases
[Figure: the same diagram, with MarkLogic labeled "1 Multi-model NoSQL Database".]
Power of Combining XML Document and RDF Graph
A Relational Model of Data for Large Shared Data Banks
E. F. CODD, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION: This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency. …The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS: …Tables…represent a major advance toward the goal of data independence…
1.2.1 Ordering Dependence: …Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence: …Can application programs…remain invariant as indices come and go? …
1.2.3 Access Path Dependence: Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.
Data + Narrative = Contextual Information
+ Relationships (Semantic & Structural) = Meaningful Knowledge
[Figure: a knowledge graph of typed nodes (T Topic, P Person, L Location, P Publication, R Reference, O Organization, E Event) linked by relationships such as "geo-located in", "printed on", "related topic", "author of", "published article in journal", "publisher of", "problem", "solution", and "purpose".]
WAIT A MINUTE
Aren't JSON, XML, and RDF databases hierarchical and graph databases?
Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?
1. They break programs when the data structure's hierarchy changes.
– New APIs and indexes flatten out the data structure, so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
– Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent because it is easy to fail to update redundant data.
– Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
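The first rebuttal, that indexes can flatten a hierarchy so queries no longer depend on where a property sits, can be illustrated with a toy inverted index. This is a hypothetical sketch of the general idea only; it is not how MarkLogic's indexes are actually implemented, and the helper names are made up.

```python
from collections import defaultdict
from typing import Any, Optional


def index_document(doc_id: str, node: Any, index: dict, key: Optional[str] = None) -> None:
    """Record every (key, scalar value) pair found in `node`, at any depth, under doc_id."""
    if isinstance(node, dict):
        for k, v in node.items():
            index_document(doc_id, v, index, k)
    elif isinstance(node, list):
        for item in node:
            index_document(doc_id, item, index, key)  # array items inherit the parent key
    elif key is not None:
        index[(key, node)].add(doc_id)


def query(index: dict, key: str, value: Any) -> set[str]:
    """Find documents containing key=value, regardless of nesting depth."""
    return set(index.get((key, value), set()))


if __name__ == "__main__":
    idx: dict = defaultdict(set)
    # The same property sits at different depths in two versions of the structure.
    index_document("v1", {"addressLocality": "Clinton"}, idx)
    index_document("v2", {"person": {"personAddresses": [{"address": {"addressLocality": "Clinton"}}]}}, idx)
    print(sorted(query(idx, "addressLocality", "Clinton")))
```

Because the query names only the key and value, it keeps working when the hierarchy around `addressLocality` changes.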
RDF Graph and XML enable us to turn content into meaningful knowledge.
What can RDF Graph and JSON do for data?
Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity.
– A document is like a row in a table, and more.
– A document contains all the related tables that a relational model requires to represent one business entity.
• Meaningful graph relationships connect any business entity to any other entity (or any content), in any way and in as many ways as you want, at run time, across all table boundaries.
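A minimal sketch of this idea: each business entity stays one self-contained document, and a separate list of subject-predicate-object triples connects documents. The IDs and the `hasSpouse`, `likesProduct`, and `hasEmployer` predicates follow the person document example in this deck; the "Spouse" and "Employer" names and the `related` helper are placeholders.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: int
    predicate: str
    object: int


# Each business entity is one document, keyed by its ID.
documents = {
    11: {"person": {"personName": "Mike Bowers"}},
    22: {"person": {"personName": "Spouse"}},       # placeholder name
    666: {"company": {"companyName": "Employer"}},  # placeholder name
    1111: {"product": {"productName": "Oreo Thins Sandwich Cookies"}},
}

# Relationships are data: add as many as needed, of any kind, at run time.
triples = [
    Triple(11, "hasSpouse", 22),
    Triple(11, "likesProduct", 1111),
    Triple(11, "hasEmployer", 666),
]


def related(subject: int, predicate: str) -> list[dict]:
    """Follow every edge with the given predicate from `subject` to its documents."""
    return [documents[t.object] for t in triples
            if t.subject == subject and t.predicate == predicate]


if __name__ == "__main__":
    triples.append(Triple(22, "likesProduct", 1111))  # new relationship, no schema change
    print(related(11, "likesProduct"))
    print(related(22, "likesProduct"))
```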
[Figure: a document graph connecting JSON Customers, JSON Orders, JSON Products, and JSON Vendors documents with XML Documentation (illustrated by the hospital operation record and its administered drugs). Relationships shown: Customer Order; Product that was ordered; Vendor who sold the product; Customer liked product; Vendor received RMA on product; shipping instructions, etc.; customer invoices, annual reports, etc.; customer order documents, invoices, etc.; product manual, vendor order forms, invoices, etc.]
RDF = Standard Meaningful Graphs
Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies.
TIP: Search for ontologies at Linked Open Vocabularies (LOV).
• Dublin Core, FOAF, TrackBack, MetaVocab, Basic Geo Vocabulary, BIO, RSS 1.0, VCard RDF, Creative Commons metadata, WOT
• SIOC, GoodRelations, DOAP, Programmes Ontology, Music Ontology, OpenGUID, Provenance Vocabulary, Pedagogical diagnosis, DILIGENT Argumentation
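For example, instead of inventing a private "knows" or "author" predicate, a person-to-person link can reuse FOAF's `knows` property and authorship can reuse Dublin Core's `creator` term. The sketch below writes such triples in the N-Triples line format using only the standard library; the `example.org` IRIs and the document and company IDs are placeholders.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # placeholder namespace for our own entities


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    # Escape backslashes, quotes, and line breaks so the literal stays on one N-Triples line.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


triples = [
    (iri(EX + "person/11"), iri(FOAF + "name"), literal("Mike Bowers")),
    (iri(EX + "person/11"), iri(FOAF + "knows"), iri(EX + "person/22")),
    (iri(EX + "doc/product-manual"), iri(DCTERMS + "creator"), iri(EX + "company/555")),
]


def to_ntriples(rows) -> str:
    """One triple per line: subject, predicate, object, terminated by ' .'"""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


if __name__ == "__main__":
    print(to_ntriples(triples))
```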
New Data Structures for Variety and Variability
Relational Database (fixed and dense structures, mostly) vs. NoSQL Database (sparse and variable structures, mostly)

Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.

Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.
Why is relational fixed and dense?
Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.
Operation ID Drug ID Drug Dose Drug Dose UOM Sparse13 100 200 mg NULL13 101 NULL NULL13 100 150 mg NULL
Drug ID | Drug Name
100 | Minocycline
101 | Minomycin
Fixed field types
Fixed table structure
Fixed table relationships
Each row has same structure
Fixed set of tables
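The contrast can be shown in a few lines: a relational row must carry every column, so a missing dose is stored as NULL, while a document simply omits the property. Drug IDs, names, and doses follow the operation 13 tables above; representing rows as tuples and documents as dicts is an illustrative assumption.

```python
from typing import Optional

# Relational: every row has the same fixed, dense set of fields.
# Columns: (operation_id, drug_id, drug_dose, drug_dose_uom)
DrugRow = tuple[int, int, Optional[int], Optional[str]]
operation_drugs: list[DrugRow] = [
    (13, 100, 200, "mg"),
    (13, 101, None, None),  # no dose recorded: the columns still exist, filled with NULL
    (13, 100, 150, "mg"),
]

# Document: each array item carries only the properties it actually has.
administered_drugs = [
    {"drugName": "Minocycline", "drugDoseSize": 200, "drugDoseUOM": "mg"},
    {"drugName": "Minomycin"},  # sparse: dose properties are simply absent
    {"drugName": "Minocycline", "drugDoseSize": 150, "drugDoseUOM": "mg"},
]


def count_nulls(rows: list[DrugRow]) -> int:
    """Count NULL cells that a dense table must store."""
    return sum(value is None for row in rows for value in row)


def count_missing(items: list[dict], keys: tuple[str, ...]) -> int:
    """Count properties that sparse documents simply leave out."""
    return sum(key not in item for item in items for key in keys)


if __name__ == "__main__":
    print("NULL cells stored:", count_nulls(operation_drugs))
    print("absent properties:", count_missing(administered_drugs, ("drugDoseSize", "drugDoseUOM")))
```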
How does NoSQL support variety and variability?
NoSQL documents have any variety of rapidly changing structures because structure is data.
{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}
[Figure callouts: Variable Data Types; Variable Document Types; Variable Relationships; Variable and Sparse Collections; Variable Document Structures; Sparse Properties (marked on the second drug entry); Sparse Denormalized Properties.]
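Because the relationships and their properties are stored as data inside the document, a program can filter on them like any other property. The sketch below assumes the operation document is loaded as a Python dictionary with `relations` as a top-level property; only a subset of the example's properties is included, and the helper names are hypothetical.

```python
operation_doc = {
    "_id": 1,
    "_type": "operation",
    "collections": ["operation", "transplants"],
    "operation": {
        "operationNumber": 13,
        "administeredDrugs": [
            {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg"},
            {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},
            {"drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg"},
        ],
    },
    "relations": {
        "values": [
            {"subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87},
            {"subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1},
            {"subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3},
            {"subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1},
        ]
    },
}


def drugs_with_dose(doc: dict) -> list[str]:
    """Sparse properties: keep only drug entries that actually record a dose."""
    return [d["drugName"] for d in doc["operation"]["administeredDrugs"] if "drugDoseSize" in d]


def recalled_drugs(doc: dict, min_recalls: int) -> list[int]:
    """Relationships are data: filter opDrug edges by a property stored on the edge."""
    return [r["object"] for r in doc["relations"]["values"]
            if r["predicate"] == "opDrug" and r.get("drugRecalls", 0) >= min_recalls]


if __name__ == "__main__":
    print(drugs_with_dose(operation_doc))
    print(recalled_drugs(operation_doc, min_recalls=2))
```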
Choosing Between XML and JSON
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph></section>
{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" }
    ] }
  ]
}
JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
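The practical difference shows up in code. This sketch uses only Python's standard `xml.etree.ElementTree` and `json` modules to pull the date out of shortened versions of the two examples above (the shortening is my own): in XML the date is embedded in mixed text and typed by an attribute, while in JSON it is a separate object that code finds by its predictable `type` property.

```python
import json
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

xml_text = """<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b></paragraph>
</section>"""

json_text = """{"heading": "JSON is best for nested object DATA",
 "paragraphs": [{"paragraph": [
   {"type": "text", "value": "Data exists in separate objects"},
   {"type": "date", "value": "2017-04-02Z"},
   {"type": "break", "value": null},
   {"type": "text", "value": "be sprinkled with strings"}]}]}"""

# XML: the date sits inside mixed text; its type travels as an xsi:type attribute.
root = ET.fromstring(xml_text)  # comments are dropped; CDATA becomes plain text
date_el = root.find(".//date")
xml_date = date_el.text
xml_date_type = date_el.get(f"{{{XSI}}}type")
xml_paragraph_text = "".join(root.find("paragraph").itertext())

# JSON: text and data live in separate objects, selected by their "type" property.
doc = json.loads(json_text)
items = doc["paragraphs"][0]["paragraph"]
json_date = next(i["value"] for i in items if i["type"] == "date")
json_paragraph_text = " ".join(i["value"] for i in items if i["type"] == "text")

if __name__ == "__main__":
    print(xml_date, xml_date_type, json_date)
    print(xml_paragraph_text)
    print(json_paragraph_text)
```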
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Customer Model
• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
• Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
• Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Websites: Website ID, URL
• Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
• Companies Websites: Company ID, Website ID
• Companies Addresses: Company ID, Address ID
• Companies: Company ID
• Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
• Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
• Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
• Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
• Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
• Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
The person ERD has 13 tables because, in relational modeling, each multivalued property requires a separate table.
All of this should be represented as one JSON document because it is one business entity.
Person ERD
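To make the cost concrete, here is a small sketch using Python's built-in `sqlite3`. It includes only three of the thirteen tables, with trimmed columns, and all table and column names are hypothetical. It shows what it takes to put one person back together: each multivalued property lives in its own table and needs its own query or join.

```python
import sqlite3

# Three of the thirteen person tables from the ERD (columns trimmed; data from the slides).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE persons       (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person_phones (phone_id INTEGER PRIMARY KEY, person_id INTEGER,
                            purpose TEXT, number TEXT);
CREATE TABLE person_emails (email_id INTEGER PRIMARY KEY, person_id INTEGER,
                            purpose TEXT, address TEXT);
INSERT INTO persons       VALUES (11, 'Mike Bowers');
INSERT INTO person_phones VALUES (1, 11, 'mobile', '+1 (111) 111-1111'),
                                 (2, 11, 'work',   '+1 (111) 111-1112');
INSERT INTO person_emails VALUES (1, 11, 'work', 'mike@somework.com');
""")

def load_person(person_id):
    """Reassemble one business entity: one query per table it was shredded into."""
    (name,) = db.execute(
        "SELECT name FROM persons WHERE person_id = ?", (person_id,)).fetchone()
    phones = db.execute(
        "SELECT purpose, number FROM person_phones WHERE person_id = ? ORDER BY phone_id",
        (person_id,)).fetchall()
    emails = db.execute(
        "SELECT purpose, address FROM person_emails WHERE person_id = ? ORDER BY email_id",
        (person_id,)).fetchall()
    return {
        "personId": person_id,
        "person": {
            "personName": name,
            "personPhones": [{"phone": {"phonePurpose": [p], "phoneNumber": n}}
                             for p, n in phones],
            "personEmails": [{"email": {"emailPurpose": [p], "emailAddress": a}}
                             for p, a in emails],
        },
    }

doc = load_person(11)
print(doc)
```

With all thirteen tables, the same function would need a query or join per table. Storing the entity as one JSON document removes that reassembly step.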
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                           "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                           "nameOnBankAccount": "Michael Bowers",
                           "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                   "personCompanyRelationship": ["employeeOf", "accountRepFor"],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                   "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
[Diagram: an expanded ERD that repeats the table structures above, adding tables labeled Order, Order Items, Fact and Lookup Lists]
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
[Diagram: Customers, Orders, Products and Vendors as collections of JSON documents, connected by the relationships "Customer Order", "Product that was ordered" and "Vendor who sold the product"]
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mike@somework.com ]
personWebsites [
website websitePurpose [ work ] websiteUrl http://www.mike.com ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T09:00:00Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 5.87 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 3.50 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2.50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
productAvailabilityDate 2001-01-01
productListPrice 4.50
productDiscountPrice 3.50
productStandardCost 2.50
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 2.50 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
Relational Modeling Normalize
1. Normalize
• Make each attribute single-valued
  - Create one column per attribute
  - Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Diagram: One-to-one, One-to-many, Many-to-many, Reference Tables]
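The first normalization step can be sketched in a few lines of plain Python. The record, function and table names here are hypothetical. A multivalued attribute (a list of phones) is split into single-valued rows for two tables related one-to-many, and each table gets its own primary key.

```python
# Hypothetical un-normalized record: "phones" is a multivalued attribute.
record = {
    "person_id": 11,
    "name": "Mike Bowers",
    "phones": [("mobile", "+1 (111) 111-1111"),
               ("work", "+1 (111) 111-1112"),
               ("home", "+1 (111) 111-1113")],
}

def normalize(rec):
    """Split a record into single-valued rows for two tables related one-to-many."""
    persons = [(rec["person_id"], rec["name"])]           # primary key: person_id
    person_phones = [
        (phone_id, rec["person_id"], purpose, number)      # primary key: phone_id
        for phone_id, (purpose, number) in enumerate(rec["phones"], start=1)
    ]                                                      # person_id is the foreign key
    return {"persons": persons, "person_phones": person_phones}

tables = normalize(record)
for table, rows in tables.items():
    print(table, rows)
```

Every value in every row is now a scalar, so each column holds exactly one value. The price is a join whenever the phones are needed alongside the person.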
DocGraph Modeling Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
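The document-side counterpart can be sketched the same way. The rows, function names and the uniformity check below are hypothetical. One-to-many rows are embedded as an array inside a single person document. The check encodes one possible reading of "each embedded structure should be normalized": every item in an embedded array has the same set of properties.

```python
# Hypothetical rows as they might come out of a person_phones table.
phone_rows = [
    (11, "mobile", "+1 (111) 111-1111"),
    (11, "work", "+1 (111) 111-1112"),
    (11, "home", "+1 (111) 111-1113"),
]

def embed_phones(person_id, rows):
    """Build one person document with the one-to-many rows embedded as an array."""
    return {
        "personId": person_id,
        "person": {
            "personPhones": [
                {"phone": {"phonePurpose": [purpose], "phoneNumber": number}}
                for pid, purpose, number in rows
                if pid == person_id
            ]
        },
    }

def is_uniform(items, wrapper):
    """One reading of 'normalized': every embedded item has the same set of properties."""
    return len({frozenset(item[wrapper]) for item in items}) <= 1

doc = embed_phones(11, phone_rows)
print(is_uniform(doc["person"]["personPhones"], "phone"))  # True
```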
DocGraph Modeling Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
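The idea can be illustrated without MarkLogic. The sketch below is plain Python and does not use TDE or the Optic API. Its two generator functions are hypothetical stand-ins for extraction templates: they project facts out of person and order documents into one list of triples. A query then joins through those triples, so an order reads as if it contained the customer's name and email, even though neither document was copied into the other.

```python
# Hypothetical, trimmed versions of the person and order documents in this section.
persons = [{"personId": 11, "person": {
    "personName": "Mike Bowers",
    "personEmails": [{"email": {"emailAddress": "mike@somework.com"}}]}}]
orders = [{"orderId": 1, "order": {
    "orderNumber": "111-11-111", "customer": {"customerId": 11}}}]

def person_triples(doc):
    """Stand-in for a TDE template: project facts out of a person document."""
    pid = doc["personId"]
    yield (pid, "personName", doc["person"]["personName"])
    for e in doc["person"].get("personEmails", []):
        yield (pid, "emailAddress", e["email"]["emailAddress"])

def order_triples(doc):
    """Stand-in for a TDE template: link each order to its customer."""
    oid = doc["orderId"]
    yield (oid, "orderNumber", doc["order"]["orderNumber"])
    yield (oid, "hasCustomer", doc["order"]["customer"]["customerId"])

# Subjects are bare ids for brevity; real triples would use IRIs so ids cannot collide.
triples = ([t for d in persons for t in person_triples(d)] +
           [t for d in orders for t in order_triples(d)])

def objects(subject, predicate):
    return [o for s, p, o in triples if s == subject and p == predicate]

def order_view(order_id):
    """Join through the triples so the order reads as if it held the customer's contact info."""
    (customer_id,) = objects(order_id, "hasCustomer")
    return {
        "orderNumber": objects(order_id, "orderNumber")[0],
        "customerName": objects(customer_id, "personName")[0],
        "customerEmails": objects(customer_id, "emailAddress"),
    }

print(order_view(1))
```

If the person's email changes, only the person document changes. The next extraction updates the triples, and every order view picks up the new value.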
Relational Modeling Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Diagram: Business Tables, Transaction Tables, Reference Tables]
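The three kinds of tables can be sketched with Python's built-in `sqlite3`. The schema is a hypothetical, heavily trimmed version of the order model. `persons` and `products` are business tables. `orders` is a transaction table that joins them in one context. `order_statuses` is a reference table that standardizes the order state through a foreign key.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
PRAGMA foreign_keys = ON;
-- Business tables: meaningful on their own, independent of any context.
CREATE TABLE persons  (person_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
-- Reference table: standardizes an entity state.
CREATE TABLE order_statuses (status TEXT PRIMARY KEY);
-- Transaction table: joins business tables together in one context.
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    person_id  INTEGER REFERENCES persons(person_id),
    product_id INTEGER REFERENCES products(product_id),
    status     TEXT    REFERENCES order_statuses(status));
INSERT INTO persons  VALUES (11, 'Mike Bowers');
INSERT INTO products VALUES (1111, 'Oreo Thins Sandwich Cookies');
INSERT INTO order_statuses VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');
INSERT INTO orders VALUES (1, 11, 1111, 'Shipped');
""")

rows = db.execute("""
    SELECT p.name, pr.name, o.status
    FROM orders o
    JOIN persons  p  ON p.person_id   = o.person_id
    JOIN products pr ON pr.product_id = o.product_id
""").fetchall()
print(rows)  # [('Mike Bowers', 'Oreo Thins Sandwich Cookies', 'Shipped')]

# The reference table rejects a status that is not on the standard list.
try:
    db.execute("INSERT INTO orders VALUES (2, 11, 1111, 'Lost')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```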
DocGraph Modeling Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
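As an example of the last bullet, the lookup lists from the order model (`orderStatus` and `productOrderStatus`) can live in one document and still be used to validate orders. The sketch below is plain Python. The function name and the sample order are hypothetical; the lookup values come from the order model.

```python
# One lookup document holding several lookup lists, modeled on the orderLookupId
# document in the order model (the function and sample order are hypothetical).
ORDER_LOOKUPS = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}

def invalid_statuses(order_doc, lookups=ORDER_LOOKUPS):
    """Return every status in an order document that is missing from the lookup lists."""
    order = order_doc["order"]
    bad = []
    if order["orderStatus"] not in lookups["orderStatus"]:
        bad.append(order["orderStatus"])
    for item in order.get("orderedProducts", []):
        status = item["product"].get("productOrderStatus")
        if status is not None and status not in lookups["productOrderStatus"]:
            bad.append(status)
    return bad

order = {"order": {"orderStatus": "Shipped", "orderedProducts": [
    {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
    {"product": {"productIdFk": 2222, "productOrderStatus": "Lost"}}]}}
print(invalid_statuses(order))  # ['Lost']
```

In relational form, these two lists would typically be two reference tables. Here, one read fetches both.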
Relational Modeling Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational, because it hides the purpose of the model
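The bullet's example can be sketched relationally with Python's built-in `sqlite3`, assuming a common supertype/subtype layout; the slide does not prescribe this exact schema. A general `persons` table is the supertype, and one-to-one subtype tables hold the customer, account rep and delivery agent roles. All table and column names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- General-purpose supertype: every role a person can play starts here.
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT);
-- Subtype tables, one-to-one with persons: a row exists only if the person plays that role.
CREATE TABLE customers       (person_id INTEGER PRIMARY KEY REFERENCES persons(person_id),
                              join_date TEXT, reviewer_rank INTEGER);
CREATE TABLE account_reps    (person_id INTEGER PRIMARY KEY REFERENCES persons(person_id),
                              company_id INTEGER);
CREATE TABLE delivery_agents (person_id INTEGER PRIMARY KEY REFERENCES persons(person_id),
                              delivery_schedule TEXT);
INSERT INTO persons      VALUES (11, 'Mike Bowers');
INSERT INTO customers    VALUES (11, '2001-01-01', 11111);
INSERT INTO account_reps VALUES (11, 555);
""")

# Table names are fixed constants, so formatting them into SQL below is safe.
SUBTYPES = {"customer": "customers", "accountRep": "account_reps",
            "deliveryAgent": "delivery_agents"}

def roles(person_id):
    """Find a person's subtypes: one lookup per subtype table."""
    return [role for role, table in SUBTYPES.items()
            if db.execute(f"SELECT 1 FROM {table} WHERE person_id = ?",
                          (person_id,)).fetchone()]

print(roles(11))  # ['customer', 'accountRep']
```

Each new role adds a table and another lookup. A row in `persons` on its own no longer says what the person is for, which is the "hides the purpose" risk in the last bullet.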
DocGraph Modeling Generalize
• Generalizing is easy in JSON because JSON objects are designed for subclassing
• This example shows a person being subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See the subclassing in the person document (the customer and companyRelationships properties)
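For comparison with the relational version, here is a minimal Python sketch over a trimmed copy of the person document. The helper functions are hypothetical. Each subclass is just another top-level property, declared in the `schemas` array. A person's company roles sit together in `companyRelationships` instead of in separate subtype tables.

```python
# Trimmed copy of the person document from this section.
doc = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person", "schemaVersion": "1.0"}},
        {"schema": {"type": "customer", "schemaVersion": "1.0"}},
        {"schema": {"type": "companyRelationships", "schemaVersion": "1.0"}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}

def declared_types(d):
    """Types listed in the schemas array that also have a matching top-level property."""
    return [s["schema"]["type"] for s in d["schemas"] if s["schema"]["type"] in d]

def company_roles(d):
    """Every role the person plays for any company, without a table per role."""
    return sorted({role
                   for rel in d.get("companyRelationships", [])
                   for role in rel["company"]["personCompanyRelationship"]})

print(declared_types(doc))  # ['person', 'customer', 'companyRelationships']
print(company_roles(doc))
```

Adding a new subclass means adding a property and a `schemas` entry. The property names (`customer`, `accountRepFor`, `deliveryPersonFor`) keep the purpose of each part visible.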
Relational Modeling Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
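The denormalization bullet can be sketched with Python's built-in `sqlite3`, using a hypothetical two-table schema. The customer's name is copied into `orders`, much like `customerName` is copied into the order document. Order listings then need no join, but every rename must update both copies, so the update runs in one transaction.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT);
-- Tuned: customer_name is copied from persons so order listings need no join.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_number TEXT,
                     person_id INTEGER REFERENCES persons(person_id),
                     customer_name TEXT);
INSERT INTO persons VALUES (11, 'Mike Bowers');
INSERT INTO orders  VALUES (1, '111-11-111', 11, 'Mike Bowers');
""")

def order_listing():
    """Read path: served from one table, no join."""
    return db.execute("SELECT order_number, customer_name FROM orders").fetchall()

def rename_person(person_id, new_name):
    """Write path: the cost of the copy is that every update must touch both tables."""
    with db:  # one transaction, so the two copies cannot drift apart
        db.execute("UPDATE persons SET name = ? WHERE person_id = ?", (new_name, person_id))
        db.execute("UPDATE orders SET customer_name = ? WHERE person_id = ?",
                   (new_name, person_id))

print(order_listing())  # [('111-11-111', 'Mike Bowers')]
rename_person(11, "Michael Bowers")
print(order_listing())  # [('111-11-111', 'Michael Bowers')]
```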
DocGraph Modeling Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } } ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": {
    "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": {
        "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ]
}
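To see why the bullets on this slide favor unique property names, here is a simplified Python model of a name-based equality index. It is a sketch under stated assumptions, not MarkLogic's actual index structure: it maps each (property name, value) pair to the set of document URIs containing it and ignores where the property sits in the hierarchy. The URIs `/person/11.json` and `/company/555.json` and the generic `status` key are hypothetical.

```python
from collections import defaultdict


def index_by_property_name(docs):
    """Simplified name-based equality index (an assumption, not MarkLogic's
    internals): (property name, scalar value) -> set of document URIs,
    regardless of where the property sits in the hierarchy."""
    index = defaultdict(set)

    def walk(uri, key, node):
        if isinstance(node, dict):
            for child_key, child in node.items():
                walk(uri, child_key, child)
        elif isinstance(node, list):
            for item in node:
                walk(uri, key, item)  # array items keep the enclosing property name
        elif key is not None:
            index[(key, node)].add(uri)

    for uri, doc in docs.items():
        walk(uri, None, doc)
    return index


# Unique, entity-prefixed names (the style used in the JSON above).
unique_names = {
    "/person/11.json": {"person": {
        "personStatus": "active",
        "personPhones": [{"phone": {"phonePurpose": ["mobile", "primary"]}}]}},
    "/company/555.json": {"company": {"companyStatus": "active"}},
}
# Generic names: the same key means different things in different entities.
generic_names = {
    "/person/11.json": {"person": {"status": "active"}},
    "/company/555.json": {"company": {"status": "active"}},
}

unique_index = index_by_property_name(unique_names)
generic_index = index_by_property_name(generic_names)
print(sorted(unique_index[("personStatus", "active")]))  # ['/person/11.json']
print(sorted(generic_index[("status", "active")]))  # ['/company/555.json', '/person/11.json']
```

With generic names, a lookup on `status = active` cannot tell persons from companies without an extra path constraint; entity-prefixed names keep a name-only lookup precise.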
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
{ "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ] },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 } }
{ "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": 11,
      "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": … },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", …
{ "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    …
{ "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", …
      { "phone": { "phonePurpose": "support", …
      { "phone": { "phonePurpose": "shipping", …
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", …
          "addressLocality": "East Hanove…
          "addressPostalCode": "07936", …
    "companyEmails": [
      { "email": { "emailPurpose": "sales", …
    "companyWebsites": [
      { "website": { "websitePurpose": "home", …
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…
          "paymentMethodAccountNumber": 555,
          "paymentMethodVerified": true … },
  "supplier": { "supplierJoinDate": "2005-05…
  "seller": { "sellerJoinDate": "2005-05…
{ "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [
    "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"
  ] }
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents hold the same data as more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
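A minimal Python sketch of the orthogonalized model, assuming in-memory dictionaries keyed by ID stand in for the database: the order (a transaction) points at the person, product, and company (business entities) by foreign key, and its status is checked against the lookup list. The dictionaries are trimmed copies of the sample documents, and the `resolve_order` helper is hypothetical.

```python
# Trimmed copies of the entity, transaction, and lookup documents.
persons = {11: {"personId": 11, "person": {"personName": "Mike Bowers"}}}
products = {1111: {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}}}
companies = {555: {"companyId": 555, "company": {"companyName": "Nabisco"}}}
order_lookup = {"orderLookupId": 111111111,
                "orderStatus": ["New", "Invoiced", "Shipped", "Closed"]}

order = {"orderId": 1, "order": {
    "orderStatus": "Shipped",
    "customer": {"customerId": 11},
    "orderedProducts": [
        {"product": {"productIdFk": 1111, "seller": {"sellerId": 555}}},
    ],
}}


def resolve_order(order_doc):
    """Follow the order's foreign keys to the entity documents it references."""
    body = order_doc["order"]
    # The lookup document acts as the list of allowed values.
    if body["orderStatus"] not in order_lookup["orderStatus"]:
        raise ValueError(f"unknown order status: {body['orderStatus']}")
    customer = persons[body["customer"]["customerId"]]["person"]["personName"]
    lines = []
    for item in body["orderedProducts"]:
        product = item["product"]
        name = products[product["productIdFk"]]["product"]["productName"]
        seller = companies[product["seller"]["sellerId"]]["company"]["companyName"]
        lines.append((name, seller))
    return customer, lines


print(resolve_order(order))  # ('Mike Bowers', [('Oreo Thins Sandwich Cookies', 'Nabisco')])
```

Each document is read whole; the only "joins" are the few key lookups between business entities.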
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
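The IO arithmetic on this slide can be written out as a small Python calculation. It only restates the slide's rule-of-thumb numbers; the three-level B-tree used to reach 4 IOs is an assumption for illustration, and real IO counts depend on caching, index depth, and storage.

```python
def relational_ios(tables, ios_per_join=(4, 6)):
    """Low/high IO estimate for reassembling one entity shredded across
    `tables` tables, one indexed lookup per table (rule of thumb:
    3-to-4 index IOs plus 1-to-2 row IOs = 4-to-6 IOs per join)."""
    low, high = ios_per_join
    return tables * low, tables * high


def document_ios(index_ios=0):
    """IOs to fetch one whole document: index IOs (0 when the index is
    RAM-resident) plus a single IO for the document itself."""
    return index_ios + 1


print(relational_ios(13))         # (52, 78)
print(document_ios())             # 1  (index in RAM)
print(document_ios(index_ios=3))  # 4  (on-disk B-tree, 3 levels assumed)
```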
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction.
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]
The same data in a relational model takes three tables:
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
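A short Python sketch of what the relational side has to do with the same data: shred one person document into rows for the three tables listed above. It is illustrative only; the generated address IDs, the `"; "` street separator, and the choice of one link row per purpose are assumptions, and the Is Primary column is left out.

```python
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {"addressPurpose": ["shipping", "mailing"],
                         "addressStreet": ["99 Smith Dr"],
                         "addressLocality": "Clinton", "addressRegion": "UT",
                         "addressPostalCode": "84015", "addressCountry": "United States"}},
        ],
    },
}


def shred_addresses(doc):
    """Split one person document into rows for three relational tables:
    Persons, Addresses, and the Persons Addresses link table."""
    person_id = doc["personId"]
    persons = [(person_id, doc["person"]["personName"])]
    addresses, links = [], []
    # Address IDs are generated here; the document itself needs none.
    for address_id, item in enumerate(doc["person"]["personAddresses"], start=1):
        a = item["address"]
        street = "; ".join(a["addressStreet"])  # multi-line street flattened to one column
        addresses.append((address_id, street, a["addressLocality"], a["addressRegion"],
                          a["addressPostalCode"], a["addressCountry"]))
        # A multi-valued purpose becomes one link row per value.
        for purpose in a["addressPurpose"]:
            links.append((person_id, address_id, purpose))
    return persons, addresses, links


persons, addresses, links = shred_addresses(person_doc)
print(links)  # [(11, 1, 'shipping'), (11, 1, 'mailing')]
```

Reading the person back means joining all three tables again, whereas the document keeps the arrays in place.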
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases
"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
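Application code handles such a mixed array naturally. The Python sketch below, assuming the two records above have been loaded as dictionaries (trimmed here), dispatches on which properties each record carries; the `describe` helper and its output wording are hypothetical.

```python
payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111"}},
]


def describe(item):
    """Dispatch on the shape of each record, since the array mixes record types."""
    pm = item["paymentMethod"]
    kind = pm["paymentMethodType"]
    if "creditCardNumber" in pm:
        return f"{kind} card ending in {pm['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in pm:
        return f"{kind} account ending in {pm['bankAccountNumber'][-4:]}"
    return f"{kind} (unknown payment shape)"


for item in payment_methods:
    print(describe(item))
# Visa card ending in 1234
# Checking account ending in 1111
```

In a relational schema the same array typically becomes either one wide table with many nullable columns or one table per payment type.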
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query
• Structure can be queried
– For presence of keys
– For nested relationships
– For any combination of nesting, keys, and values
{ "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ] } }
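Because keys and nesting are data, a query can ask about structure directly. The Python sketch below, using a trimmed copy of the fragment above (the long name is omitted), finds every object that contains a given key and renames a key throughout a document. The function names and the new key `phoneNumberHidden` are hypothetical; in MarkLogic these operations would go through its own query and update APIs instead.

```python
doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def paths_with_key(node, key, path=()):
    """Yield the path of every object that contains `key`, at any depth."""
    if isinstance(node, dict):
        if key in node:
            yield path
        for k, v in node.items():
            yield from paths_with_key(v, key, path + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from paths_with_key(item, key, path + (i,))


def rename_key(node, old, new):
    """Change structure by rewriting data: rename `old` to `new` everywhere."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


print(list(paths_with_key(doc, "hidePhoneNumber")))
# [('person', 'personPhones', 1, 'phone')]
renamed = rename_key(doc, "hidePhoneNumber", "phoneNumberHidden")
print(list(paths_with_key(renamed, "phoneNumberHidden")))
# [('person', 'personPhones', 1, 'phone')]
```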
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length (the JSON format itself sets no limit) and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
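A quick way to see these types is to parse a document with Python's standard `json` module, which maps JSON strings, numbers, booleans, null, arrays, and objects to `str`, `int`/`float`, `bool`, `None`, `list`, and `dict`. The document below is hypothetical. JSON itself has a single number type; the split into `int` and `float` is Python's.

```python
import json

# Hypothetical document exercising each JSON type listed above.
text = """{
  "personName": "Zoë Ångström",
  "personHeightInFeet": 5.75,
  "customerReviewerRank": 11111,
  "hidePhoneNumber": true,
  "personMaidenName": null,
  "mixedValues": ["text", 42, 1.5, false, {"nested": true}],
  "a key with spaces & symbols: ✓": "allowed"
}"""

doc = json.loads(text)
type_names = {key: type(value).__name__ for key, value in doc.items()}
for key, name in type_names.items():
    print(f"{key!r}: {name}")
```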
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. Codd, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
"Programs should remain unaffected when the internal representation of data is changed … Tree-structured … inadequacies … are discussed … Relations … are discussed and applied to the problems of redundancy and consistency …"
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION: This paper is concerned with the application of elementary relation theory … to … formatted data … The problems … are those of data independence … and … data inconsistency … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS: … Tables … represent a major advance toward the goal of data independence …
1.2.1 Ordering Dependence: … Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence: … Can application programs … remain invariant as indices come and go? …
1.2.3 Access Path Dependence: Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data … Programs become dependent on the continued existence of the … paths.
[Figure: the paper annotated as a graph of Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E), connected by relationships such as "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "purpose", and "solution"]
[Figure: Meaningful data modeling, inside and out. JSON documents for Customers, Orders, Products, and Vendors, an XML document, and Documentation, linked by relationships labeled "Customer Order", "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", and "Vendor received RMA on product"]
Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
Top Ten Databases Overall
[Figure: the top ten databases placed by data paradigm (Graph, Dimensional, Relational, Document, Wide Column, Key Value) on a chart running from operational (low latency, velocity) to analytical (high bandwidth, volume), with latency from hours to microseconds, data volume from GB to PB, throughput from 500 to 100K tps, and structure from fixed (code has more dependence on DB structures) to flexible (code has less dependence on DB structures). Panels and databases: Graph/RDF: 6 MarkLogic; Data Warehouse: 1 SQL Server, 2 Oracle Exadata, 2 Teradata, 4 IBM Netezza, 5 AWS Redshift, 6 SAP HANA; Live Analytics: 2 Oracle Exalytics, 3 SAP HANA; newSQL: 2 Oracle x10; SQL: 1 SQL Server, 2 Oracle DB, 2 Oracle MySQL, 3 SAP AS, 3 SAP HANA, 4 IBM DB2, 4 IBM Informix, 7 EnterpriseDB; Doc Warehouse: 6 MarkLogic; JSON Document: 5 AWS DynamoDB, 6 MarkLogic, 8 MongoDB; Key Value (simple key): 5 AWS DynamoDB, 10 Redis; Wide Column (complex key): 9 DataStax Cassandra]
Do we combine multiple models?
Dimensional: Kimball Data Warehousing
Relational: Fixed Dense Tables with Flexible Queries & Joins
Wide Column: Fixed Dense Tables with Fixed Queries, No Joins
Key Value: Predefined Data Structures
– key: value 1
– hash: [ key 1: value 1, key 2: value 2 ]
– list: value 1, value 2
– [ set value 1, set value 2 ]
Document: Sparse Variable Data Structures
Graph: Unlimited Relationships
Multi-Model Enterprise NoSQL Databases
[Figure: the same paradigm chart (Graph, Dimensional, Relational, Document, Wide Column, Key Value; operational low latency and velocity vs. analytical high bandwidth and volume; fixed vs. flexible structure), with MarkLogic shown as the "1 Multi-model NoSQL Database" spanning the Graph/RDF, Doc Warehouse, JSON Document, Key Value, Wide Column, SQL, newSQL, Live Analytics, Data Warehouse, and operational panels]
Power of Combining XML Document and RDF Graph
13
A Relational Model of Data for Large Shared Data BanksE F CODD IBM Research Laboratory San Jose California Information Retrieval Volume 13 Number 6 June 1970Programs should remain unaffected when the internal representation of data is changed hellipTree-structuredhellipinadequacieshellipare discussed hellipRelationshellipare discussed and applied to the problems of redundancy and consistencyhellip KEY WORDS AND PHRASES data base data structure data organization hierarchies of data networks of data relationsCR CATEGORIES 370 373 375 420 422
+ DataNarrative + Relationships= Contextual Information
1 Relational Model and Normal Form 11 INTRODUCTION This paper is concerned with the application of elementary relation theoryhelliptohellipformatted data hellipThe problemshellipare those of data independencehellipandhellipdata inconsistencyhellipThe relational viewhellipappears to be superior in several respects to the graph or network modelhellip hellipRelational viewhellipforms a sound basis for treating derivability redundancy and consistencyhellip [and] a clearer evaluationhellipof
12 DATA DEPENDENCIES IN PRESENT SYSTEMS hellipTableshelliprepresent a major advance toward the goal of data independencehellip
121 Ordering Dependence hellipPrograms which take advantage of the stored ordering of a file are likely to failhellipifhellipit becomes necessary to replace that ordering by a different one
122 Indexing Dependence hellipCan application programshellipremain invariant as indices come and go hellip
123 Access Path Dependence Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data hellipThese programs fail when a change in structure becomes necessary Thehellipprogramhellipis required to exploithellippaths to the data hellipPrograms become dependent on the continued existence of thehellippaths
[Figure: a knowledge graph of Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E), connected by relationships such as "geo-located in", "printed on", "related topic", "author of", "published article in journal", "publisher of", "purpose", "problem", and "solution". Caption: (Semantic & Structural) = Meaningful Knowledge]
WAIT A MINUTE
Aren't JSON, XML, and RDF databases hierarchical and graph?
Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?
1. They break programs when the data structure's hierarchy changes.
   – New APIs and indexes flatten out the data structure so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent because it is easy to fail to update redundant data.
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
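The first answer can be illustrated with a minimal Python sketch of a hierarchy-independent property index, assuming documents are plain JSON values (dicts, lists, scalars). It is an analogy for the idea, not MarkLogic's indexing implementation, and `index_properties` is a hypothetical helper: a lookup on `drugName` finds the value whether it sits at the top level or several levels deep.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List


def index_properties(node: Any, index: DefaultDict[str, List[Any]]) -> None:
    """Record every scalar value under its property name, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            # Treat a scalar and an array of scalars the same way.
            items = value if isinstance(value, list) else [value]
            for item in items:
                if not isinstance(item, (dict, list)):
                    index[key].append(item)
            index_properties(value, index)  # descend into nested objects and arrays
    elif isinstance(node, list):
        for item in node:
            index_properties(item, index)


# Hypothetical documents: the same property lives at different depths.
flat_doc = {"drugName": "Minocycline"}
nested_doc = {"operation": {"administeredDrugs": [{"drugName": "Minomycin"}]}}

index: DefaultDict[str, List[Any]] = defaultdict(list)
for doc in (flat_doc, nested_doc):
    index_properties(doc, index)

print(index["drugName"])  # ['Minocycline', 'Minomycin']
```

Moving `drugName` to a different level of the hierarchy does not change the lookup, which is the property the first answer relies on.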
RDF Graph and XML enable us to turn content into meaningful knowledge.
What can RDF Graph and JSON do for data?
Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity
  – A document is like a row in a table, and more
  – A document contains all related tables required by a relational model to represent one business entity
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries
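A minimal Python sketch of this idea, assuming documents are stored by ID and relationships are kept as (subject, predicate, object) triples. The IDs echo the example documents in this deck; `likesProduct` appears in the person document, while `placedOrder`, `orderedProduct`, and `soldBy` are hypothetical predicate names.

```python
# One document per business entity, keyed by ID.
documents = {
    11: {"person": {"personName": "Mike Bowers"}},
    1: {"order": {"orderNumber": "111-11-111"}},
    1111: {"product": {"productName": "Oreo Thins Sandwich Cookies"}},
    555: {"company": {"companyName": "Nabisco"}},
}

# Relationships live outside any table structure, as triples.
triples = [
    (11, "placedOrder", 1),
    (1, "orderedProduct", 1111),
    (1111, "soldBy", 555),
    (11, "likesProduct", 1111),
]


def objects(subject, predicate):
    """Return the IDs linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Walk customer -> order -> product -> vendor across document boundaries.
vendor_ids = [
    vendor
    for order in objects(11, "placedOrder")
    for product in objects(order, "orderedProduct")
    for vendor in objects(product, "soldBy")
]
print([documents[v]["company"]["companyName"] for v in vendor_ids])  # ['Nabisco']
```

Adding a new kind of relationship here means appending triples; neither the documents nor any table definition has to change.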
[Figure: Customer Order. JSON Customers, Orders, Products, and Vendors documents plus XML Documentation (for example, an operation record listing hospital, operation number and type, surgeon, and administered drugs with manufacturer and dose), linked by relationships such as "product that was ordered", "vendor who sold the product", "customer liked product", "vendor received RMA on product", "shipping instructions", "customer invoices, annual reports", "customer order documents, invoices", and "product manual, vendor order forms, invoices". Caption: RDF = Standard Meaningful Graphs]
Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies
TIP: Search for ontologies at Linked Open Vocabularies (LOV)
• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation
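As a minimal Python sketch, local predicate names can be mapped to terms from existing vocabularies before triples are stored. The FOAF (`http://xmlns.com/foaf/0.1/`) and Dublin Core terms (`http://purl.org/dc/terms/`) namespaces are real; the local names on the left and the example subject and object IRIs are hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical local predicate names mapped to published vocabulary terms.
STANDARD_PREDICATES = {
    "hasName": FOAF + "name",
    "knows": FOAF + "knows",
    "createdBy": DCTERMS + "creator",      # resource -> the agent who made it
    "publishedBy": DCTERMS + "publisher",  # resource -> the agent who published it
}


def to_standard(triple):
    """Swap a local predicate for its standard IRI; leave unknown predicates unchanged."""
    subject, predicate, obj = triple
    return subject, STANDARD_PREDICATES.get(predicate, predicate), obj


article = "http://example.org/article/1"
author = "http://example.org/person/11"
print(to_standard((article, "createdBy", author)))
print(to_standard((article, "relatedTopic", "http://example.org/topic/7")))
```

Unmapped predicates pass through unchanged, so a vocabulary can be adopted one relationship at a time.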
New Data Structures for Variety and Variability
Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, documents of any type, or any combination of the above. The same document may exist in multiple collections.

Relational: Each table may have zero to a few fixed relationships to other tables. Constraints are not data.
NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.
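To make the NoSQL column concrete, here is a minimal Python sketch with hypothetical documents loosely based on the operation example: each document lists the collections it belongs to, may carry more than one type, and simply omits properties it does not have instead of storing NULL.

```python
docs = [
    {"_id": 1, "_type": ["operation"], "collections": ["operation", "transplants"],
     "drugName": "Minocycline", "drugDoseSize": 200, "drugDoseUOM": "mg"},
    {"_id": 2, "_type": ["operation"], "collections": ["operation"],
     "drugName": "Minomycin"},  # sparse: no dose properties at all
    {"_id": 3, "_type": ["person", "customer"], "collections": ["persons"],
     "personName": "Mike Bowers"},  # one document, two document types
]


def in_collection(name):
    """Documents that belong to a collection; a document may belong to several."""
    return [d for d in docs if name in d.get("collections", [])]


def dose(doc):
    """Format the dose if present; a missing property just means 'no value'."""
    if "drugDoseSize" not in doc:
        return None
    return f"{doc['drugDoseSize']} {doc.get('drugDoseUOM', '')}".strip()


print([dose(d) for d in in_collection("operation")])  # ['200 mg', None]
```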
Why is relational fixed and dense?
Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.
[Figure: the operation example as relational tables: a surgeon table (Surgeon ID, Surgeon Name, Surgeon Specialty: 1, Dorothy Oz, Cardiothoracic; 2, Martin Fields, Neurosurgery), an operation table (Operation ID, Surgeon ID, Hospital ID: 13, 1, 7), a table of drug doses per operation (Operation ID, Drug ID, Drug Dose, Drug Dose UOM, with sparse NULL values), and a drug table (Drug ID, Drug Name: 100, Minocycline; 101, Minomycin). Callouts: Fixed field types; Fixed table structure; Fixed table relationships; Each row has same structure; Fixed set of tables]
How does NoSQL support variety and variability?
NoSQL documents have any variety of rapidly changing structures because structure is data.
{
  "_id": 1,
  "_type": "operation",
  "collections": ["operation", "transplants"],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}
Callouts: Variable Data Types; Variable Document Types; Variable Relationships; Variable and Sparse Collections; Variable Document Structures; Sparse Properties; Sparse Denormalized Properties
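The step from the relational tables to this document can be sketched in Python. This is a minimal sketch, assuming the drug-dose rows hold (operation ID, drug ID, dose, unit) with the second dose missing, as in the JSON document; the table variables and `to_document` are hypothetical. NULL columns become omitted properties rather than stored nulls.

```python
import json

# Rows from the relational example (layout of the drug-dose rows is assumed).
surgeons = {1: {"name": "Dorothy Oz", "specialty": "Cardiothoracic"}}
operations = [{"operation_id": 13, "surgeon_id": 1, "hospital_id": 7}]
drug_names = {100: "Minocycline", 101: "Minomycin"}
operation_drugs = [
    (13, 100, 200, "mg"),
    (13, 101, None, None),
    (13, 100, 150, "mg"),
]


def to_document(op):
    """Join the child rows of one operation into a single nested document."""
    administered = []
    for op_id, drug_id, dose, uom in operation_drugs:
        if op_id != op["operation_id"]:
            continue
        drug = {"drugName": drug_names[drug_id]}
        if dose is not None:  # sparse: omit the property instead of storing NULL
            drug["drugDoseSize"] = dose
        if uom is not None:
            drug["drugDoseUOM"] = uom
        administered.append(drug)
    return {
        "operation": {
            "operationNumber": op["operation_id"],
            "surgeonName": surgeons[op["surgeon_id"]]["name"],
            "administeredDrugs": administered,
        }
    }


doc = to_document(operations[0])
print(json.dumps(doc, indent=2))
```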
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph></section>
{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
      { "type": "text", "value": "Text exists only in objects" },
      { "type": "text", "value": "Data exists in separate objects" },
      { "type": "date", "value": "2017-04-02Z" },
      { "type": "break", "value": null },
      { "type": "text", "value": "JSON is designed for objects to…" },
      { "type": "text", "value": "be sprinkled with strings" }
    ] }
  ]
}
JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
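To show what moving marked-up text into JSON involves, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It converts the mixed content of one paragraph from the XML sample into the list-of-typed-objects style of the JSON sample. The mapping rules (element name becomes `type`, `<br/>` becomes a `break` object) are assumptions, and only one level of inline elements is handled.

```python
import json
import xml.etree.ElementTree as ET

XML = (
    '<paragraph xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    'XML allows data such as this date '
    '<date xsi:type="xsd:date">2017-04-02Z</date> to be freely mixed into the text'
    '<br/>XML is designed for text to be sprinkled with <b>tags</b></paragraph>'
)


def to_objects(element):
    """Turn an element's mixed content into a list of typed JSON-style objects."""
    parts = []
    if element.text:  # text before the first child element
        parts.append({"type": "text", "value": element.text})
    for child in element:
        if child.tag == "br":
            parts.append({"type": "break", "value": None})
        else:
            # e.g. <date> -> {"type": "date"}, <b> -> {"type": "b"}
            parts.append({"type": child.tag, "value": child.text})
        if child.tail:  # text that follows the child element
            parts.append({"type": "text", "value": child.tail})
    return parts


objects = to_objects(ET.fromstring(XML))
print(json.dumps(objects, indent=2))
```

Every run of text becomes its own object, which is why the XML form is the more natural fit for content.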
Choosing Between XML and JSON
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
• Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
• Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Websites: Website ID, URL
• Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
• Companies Websites: Company ID, Website ID
• Companies Addresses: Company ID, Address ID
• Companies: Company ID
• Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
• Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
• Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
• Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
• Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
• Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Customer Model
The person ERD is 13 tables because each multivalued property requires a separate table in relational. All of this should be represented as one JSON document because it is one business entity.
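A rough way to see why the table count grows: in a relational shred, every array of objects in a document needs its own child table. The minimal Python sketch below lists those arrays for an excerpt of the person document. `multivalued_paths` is a hypothetical helper, and the rule is a simplification (for example, it ignores arrays of scalars such as `phonePurpose`), so it does not reproduce the 13-table ERD exactly.

```python
def multivalued_paths(node, path):
    """Yield the path of every array of objects, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if isinstance(value, list) and any(isinstance(v, dict) for v in value):
                yield child
            yield from multivalued_paths(value, child)
    elif isinstance(node, list):
        for item in node:
            yield from multivalued_paths(item, path)


# Excerpt of the person document.
person = {
    "personName": "Mike Bowers",
    "personPhones": [
        {"phone": {"phoneNumber": "+1 (111) 111-1111"}},
        {"phone": {"phoneNumber": "+1 (111) 111-1112"}},
    ],
    "personEmails": [{"email": {"emailAddress": "mike@somework.com"}}],
    "companyRelationships": [
        {"company": {
            "companyName": "Nabisco",
            "accountRepSupportedProducts": [{"product": {"productName": "Oreos"}}],
        }},
    ],
}

# One parent table plus one child table per distinct multivalued path.
tables = ["person"] + list(dict.fromkeys(multivalued_paths(person, "person")))
for table in tables:
    print(table)
```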
Person ERD
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
                  "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
                  "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                           "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                           "nameOnBankAccount": "Michael Bowers",
                           "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                   "personCompanyRelationship": ["employeeOf", "accountRepFor"],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                   "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
[Figure: a much larger relational ERD with many more tables, including Order, Order Items, Fact, and Lookup Lists tables, each with columns like those of the customer model]
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
[Figure: Customer Order. JSON Customers, Orders, Products, and Vendors documents linked by relationships such as "product that was ordered" and "vendor who sold the product"]
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": …
    },
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States"
    },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", …
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": ["cookies", "chocolate cookies", "desserts", "snacks", …
    "productTags": ["cookie", "chocolate", "snack", "treat"],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
                             "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
                             "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu… } }
    ],
    "productMeasures": [
      { "productMeasure": { "productMeasurePurpose": "productPackage", "productWeight": 101,
                            "productWidth": 45, "productHeight": 17, "productLength": 67 } },
      { "productMeasure": { "productMeasurePurpose": "shippingPackage", "productWeight": 1,
                            "productWidth": 5, "productHeight": 2, "productLength": 8 } }
    ],
    …
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } }
    ],
    "companyAddresses": [
      { "address": { "addressPurpose": ["shipping", …],
                     "addressLocality": "East Hanove…",
                     "addressPostalCode": "07936", … } }
    ],
    "companyEmails": [ { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [ { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": "555",
          "paymentMethodVerified": true } }
    ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
{
  "orderLookupId": 111111111,
  "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
  "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"]
}
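A lookup-list document can play the role that a check constraint plays in a relational database, enforced here by application code. This is a minimal Python sketch, assuming the lookup lists above; the order document and its invalid status `"Lost"` are hypothetical, and the check is not a built-in MarkLogic feature.

```python
lookups = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}


def invalid_statuses(order):
    """Return (property, value) pairs whose value is not in its lookup list."""
    problems = []
    if order["orderStatus"] not in lookups["orderStatus"]:
        problems.append(("orderStatus", order["orderStatus"]))
    for item in order.get("orderedProducts", []):
        status = item["product"].get("productOrderStatus")
        # A missing status is allowed (sparse); a present one must be in the list.
        if status is not None and status not in lookups["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems


order = {
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
        {"product": {"productIdFk": 2222, "productOrderStatus": "Lost"}},
    ],
}
print(invalid_statuses(order))  # [('productOrderStatus', 'Lost')]
```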
Relational Modeling: Normalize
1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Figure: one-to-one, one-to-many, many-to-many, and reference tables]
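The first bullet can be sketched in Python: a multivalued attribute is split out into a child table whose rows carry the parent's key. This is a minimal sketch, assuming one purpose per phone (as in the Person Phones table) and hypothetical surrogate `phoneId` values.

```python
person = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": "mobile", "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": "work", "phoneNumber": "+1 (111) 111-1112"},
    ],
}


def normalize(record):
    """Split one record into a parent row and one-to-many child rows."""
    # Parent row: only the single-valued attributes.
    parent_row = {k: v for k, v in record.items() if not isinstance(v, list)}
    # Child rows: one per phone, each carrying the parent's key as a foreign key.
    phone_rows = [
        {"phoneId": i, "personId": record["personId"], **phone}
        for i, phone in enumerate(record["personPhones"], start=1)
    ]
    return parent_row, phone_rows


persons_row, person_phones_rows = normalize(person)
print(persons_row)
for row in person_phones_rows:
    print(row)
```

Reading the person back requires joining the rows again on `personId`, which is the cost that a single JSON document per business entity avoids.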
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it
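The effect of extracting embedded triples into an index can be sketched in plain Python, with a set standing in for the triple index. This is an analogy under that assumption, not TDE template syntax or the Optic API; `extract_triples` is hypothetical, and the triples are taken from the person document.

```python
person_doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
    ],
}


def extract_triples(doc):
    """Collect (subject, predicate, object) tuples from a document's embedded triples."""
    return {
        (t["triple"]["subject"], t["triple"]["predicate"], t["triple"]["object"])
        for t in doc.get("triples", [])
    }


# The "index" can be queried without reopening the document.
triple_index = extract_triples(person_doc)
children = sorted(o for s, p, o in triple_index if s == 11 and p == "hasChild")
print(children)  # [33, 44]
```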
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
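"Denormalizing without denormalizing" can be sketched in Python: person values are projected into a lookup view once, and orders are combined with that view at query time, while the stored order documents keep only the customer ID. This is a minimal sketch with a dict standing in for the indexed view and a list comprehension standing in for the join; it is not the Optic API, and the variable names are hypothetical.

```python
persons = [
    {"personId": 11, "person": {
        "personName": "Mike Bowers",
        "personEmails": [{"email": {"emailAddress": "mike@somework.com"}}],
    }},
]
orders = [
    {"orderId": 1, "order": {"orderNumber": "111-11-111", "customer": {"customerId": 11}}},
]

# Projection step (stands in for an extraction template): one entry per person.
person_view = {
    p["personId"]: {
        "name": p["person"]["personName"],
        "email": p["person"]["personEmails"][0]["email"]["emailAddress"],
    }
    for p in persons
}

# Query step (stands in for a join at query time): each result gains the
# person's contact info, but nothing is copied into the stored order documents.
report = [
    {"orderNumber": o["order"]["orderNumber"],
     **person_view[o["order"]["customer"]["customerId"]]}
    for o in orders
]
print(report)
```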
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
                  "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
                  "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                           "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                           "nameOnBankAccount": "Michael Bowers",
                           "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                   "personCompanyRelationship": ["employeeOf", "accountRepFor"],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                   "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
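"Denormalizing without denormalizing" can be sketched in plain Python. This is not MarkLogic's TDE or Optic API; it is a minimal in-memory stand-in, assuming documents are dicts shaped like the person document above and that each triple's object is the ID of another document. The hypothetical `extract_triples` plays the role of a TDE template, and the hypothetical `describe_links` plays the role of an Optic query that reads a linked document's data through the triples instead of copying it.

```python
from typing import Any

# Hypothetical in-memory "database": document id -> document (abridged shapes).
DOCS: dict[int, dict[str, Any]] = {
    11: {
        "personId": 11,
        "triples": [
            {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}},
            {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
        ],
        "person": {"personName": "Mike Bowers"},
    },
    666: {"companyId": 666, "company": {"companyName": "Goliath Dairies"}},
    1111: {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}},
}


def extract_triples(doc: dict[str, Any]) -> list[tuple[int, str, int]]:
    """Stand-in for a TDE template: pull (subject, predicate, object) rows out of a doc."""
    return [
        (t["triple"]["subject"], t["triple"]["predicate"], t["triple"]["object"])
        for t in doc.get("triples", [])
    ]


# Stand-in for the triple index, populated from every document.
TRIPLE_INDEX = [row for doc in DOCS.values() for row in extract_triples(doc)]


def display_name(doc: dict[str, Any]) -> str:
    """Return the <entity>Name property of whichever entity the doc holds."""
    for entity in ("person", "company", "product"):
        if entity in doc:
            return doc[entity][f"{entity}Name"]
    return "?"


def describe_links(subject_id: int) -> dict[str, str]:
    """Stand-in for an Optic join: read linked docs' names through the triples,
    without copying those names into the subject document."""
    return {
        predicate: display_name(DOCS[obj])
        for subj, predicate, obj in TRIPLE_INDEX
        if subj == subject_id and obj in DOCS
    }


print(describe_links(11))
```

If Goliath Dairies is renamed, only its own document changes; the person document never held a copy of the name. (This toy keeps one value per predicate, so repeated predicates such as `hasChild` would overwrite each other.)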
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts.
• Create transaction tables to join together business tables.
• Create reference tables to standardize entity states and attribute characteristics.
• This maximizes data reuse by allowing tables to be combined with other tables to create any context.
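A minimal relational sketch of the three kinds of tables, using Python's built-in `sqlite3`. The table and column names are hypothetical: `person` and `product` are business tables, `order_status` is a reference table, and `order_item` is a transaction table that joins them into one context.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    PRAGMA foreign_keys = ON;  -- SQLite enforces REFERENCES only when this is on

    -- Business tables: independent of any context
    CREATE TABLE person  (person_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Reference table: standardizes the allowed order states
    CREATE TABLE order_status (status TEXT PRIMARY KEY);

    -- Transaction table: joins business tables into one context (an order line)
    CREATE TABLE order_item (
        order_id   INTEGER NOT NULL,
        person_id  INTEGER NOT NULL REFERENCES person(person_id),
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        qty        INTEGER NOT NULL,
        status     TEXT    NOT NULL REFERENCES order_status(status)
    );

    INSERT INTO person  VALUES (11, 'Mike Bowers');
    INSERT INTO product VALUES (1111, 'Oreo Thins Sandwich Cookies');
    INSERT INTO order_status VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');
    INSERT INTO order_item VALUES (1, 11, 1111, 3, 'Shipped');
    """
)

# Reassembling the "order" context requires joining the business tables back in.
rows = conn.execute(
    """
    SELECT p.name, pr.name, oi.qty, oi.status
    FROM order_item oi
    JOIN person  p  ON p.person_id   = oi.person_id
    JOIN product pr ON pr.product_id = oi.product_id
    """
).fetchall()
print(rows)
```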
Business Tables / Reference Tables / Transaction Tables
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mike@somework.com ]
personWebsites [
website websitePurpose [ work ] websiteUrl http://www.mike.com ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T09:00:00Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 5.87 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 3.50 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2.50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
productAvailabilityDate 2001-01-01
productListPrice 4.50
productDiscountPrice 3.50
productStandardCost 2.50
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 2.50 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 10.1
productWidth 4.5 productHeight 1.7 productLength 6.7
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity.
• A business entity should exactly match how users think of it.
• A transaction should contain everything about the transaction.
• Multiple lookup tables may be combined in one doc.
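Combining several lookup tables in one document can be sketched as follows. The `orderStatus` and `productOrderStatus` lists come from the lookup document above; the `invalid_statuses` helper and the sample order (including its bad "Lost" status) are hypothetical. The point is that one read of one document serves every lookup the order needs.

```python
from typing import Any

# One lookup document holding two lookup lists (values from the example above).
ORDER_LOOKUPS: dict[str, Any] = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}


def invalid_statuses(order: dict[str, Any], lookups: dict[str, Any]) -> list[str]:
    """Return every status in the order that is missing from its lookup list."""
    problems = []
    if order["orderStatus"] not in lookups["orderStatus"]:
        problems.append(order["orderStatus"])
    for item in order["orderedProducts"]:
        status = item["product"]["productOrderStatus"]
        if status not in lookups["productOrderStatus"]:
            problems.append(status)
    return problems


# Hypothetical order: everything about the transaction lives in this one document.
order = {
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
        {"product": {"productIdFk": 2222, "productOrderStatus": "Lost"}},  # not a valid status
    ],
}
print(invalid_statuses(order, ORDER_LOOKUPS))
```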
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change.
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational models, because it hides the purpose of the model.
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because JSON objects lend themselves naturally to subclassing.
• This example shows a person being subclassed as an employee, a customer, and a customer rep.
• Generalization works very well in JSON and, unlike in relational models, it does not hide the purpose of the model.
• See subclassing below.
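One way to read "subclassing" in JSON is that each subclass is an optional sub-object on the base person document. The sketch below assumes that convention: the `customer` and `companyRelationships` keys come from the person document above, while the `roles` helper and the second person are hypothetical. Each person's subclasses are derived from which sub-objects are present, so the purpose of the model stays visible in the data.

```python
from typing import Any


def roles(doc: dict[str, Any]) -> set[str]:
    """Derive a person's subclasses from the optional sub-objects it carries."""
    found = set()
    if "customer" in doc:
        found.add("customer")
    for rel in doc.get("companyRelationships", []):
        # personCompanyRelationship is itself multi-valued, e.g. ["employeeOf", "accountRepFor"]
        found.update(rel["company"]["personCompanyRelationship"])
    return found


mike = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}},
    ],
}
plain = {"personId": 22, "person": {"personName": "Someone Else"}}  # hypothetical: no subclasses

print(sorted(roles(mike)))
print(roles(plain))
```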
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application.
• Optimize sparse data out of a table and put it into one-to-one related tables.
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance.
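Moving sparse data into a one-to-one table can be sketched with Python's built-in `sqlite3`. The table names and the second person ("Jane Bowers", maiden name "Smith") are hypothetical; the sparse column mirrors `personMaidenName`, which is null for most people. Both tables share the same primary key, and a `LEFT JOIN` restores the column when a query needs it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dense columns that every person has stay in the main table.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- A sparse column most people lack moves to a one-to-one table
    -- (same primary key), so the main table carries no mostly-NULL column.
    CREATE TABLE person_maiden_name (
        person_id   INTEGER PRIMARY KEY REFERENCES person(person_id),
        maiden_name TEXT NOT NULL
    );

    INSERT INTO person VALUES (11, 'Mike Bowers'), (22, 'Jane Bowers');
    INSERT INTO person_maiden_name VALUES (22, 'Smith');
    """
)

# LEFT JOIN puts the sparse column back; people without a row get NULL (None).
rows = conn.execute(
    """
    SELECT p.person_id, p.name, m.maiden_name
    FROM person p LEFT JOIN person_maiden_name m ON m.person_id = p.person_id
    ORDER BY p.person_id
    """
).fetchall()
print(rows)
```

The trade-off is the one the slide warns about: every query that needs the sparse column pays for one more join.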
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic.
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy.
• You manually create inequality (range) indexes based on property name or hierarchical path.
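Why unique property names matter can be shown with a toy index keyed only by property name. This is an assumption-laden model, not MarkLogic's actual index structures: it simply ignores where a property sits in the hierarchy, as the bullet describes. A generic `name` property used for both the person and the employer collides, while prefixed names such as `personName` and `companyName` stay distinct.

```python
from collections import defaultdict
from typing import Any, Optional


def name_index(node: Any, index: Optional[dict] = None, key: Optional[str] = None) -> dict:
    """Toy equality index: property name -> set of scalar values, ignoring the path."""
    if index is None:
        index = defaultdict(set)
    if isinstance(node, dict):
        for k, v in node.items():
            name_index(v, index, k)
    elif isinstance(node, list):
        for item in node:
            name_index(item, index, key)  # array items are indexed under the array's name
    elif key is not None:
        index[key].add(node)
    return index


generic = {"person": {"name": "Mike Bowers"}, "employer": {"name": "Goliath Dairies"}}
unique = {"person": {"personName": "Mike Bowers"}, "employer": {"companyName": "Goliath Dairies"}}

# With a generic key, an equality lookup on "name" cannot tell a person from a company.
print(sorted(name_index(generic)["name"]))
print(sorted(name_index(unique)["personName"]))
```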
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model.
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data.
• For example, the person document is 45K and requires 13 tables to represent it in a relational database.
• You have to join 13 tables to retrieve all of a person's data.
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row.
• 4-to-6 IOs × 13 joins = 52-to-78 IOs.
• MarkLogic indexes are in RAM, so getting a doc is 1 IO.
• MongoDB indexes are B-tree indexes and may require 4 IOs.
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables.
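The slide's IO arithmetic as a back-of-the-envelope calculation. It uses only the slide's own per-join estimates (3-to-4 index IOs plus 1-to-2 row IOs); the function name and its defaults are illustrative, and the numbers are rough estimates rather than measurements.

```python
def relational_io_range(joins: int,
                        index_ios: tuple[int, int] = (3, 4),
                        row_ios: tuple[int, int] = (1, 2)) -> tuple[int, int]:
    """Estimated (min, max) IOs to reassemble an entity shredded across `joins` tables."""
    per_join_min = index_ios[0] + row_ios[0]  # 3 + 1 = 4 IOs per join
    per_join_max = index_ios[1] + row_ios[1]  # 4 + 2 = 6 IOs per join
    return joins * per_join_min, joins * per_join_max


print(relational_io_range(13))  # (52, 78), versus 1 IO for the single person document
```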
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.
"personAddresses": [
  { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                 "addressLocality": "Clinton", "addressRegion": "UT",
                 "addressPostalCode": "84015", "addressCountry": "United States" } }
]
Relational equivalent (three tables):
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
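To see what the relational version costs, the sketch below shreds the one `personAddresses` array into rows for an Addresses table and a Persons Addresses link table. The row layouts are hypothetical simplifications of the tables listed above (no Is Primary column), address IDs are assigned locally by position, and emitting one link row per purpose is just one possible design.

```python
from typing import Any

person_doc = {
    "personAddresses": [
        {"address": {"addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States"}}
    ]
}


def shred_addresses(person_id: int, doc: dict[str, Any]) -> tuple[list[tuple], list[tuple]]:
    """Split one document's address array into Addresses rows and link-table rows."""
    addresses: list[tuple] = []
    links: list[tuple] = []
    for address_id, entry in enumerate(doc["personAddresses"], start=1):
        a = entry["address"]
        addresses.append((address_id, ", ".join(a["addressStreet"]), a["addressLocality"],
                          a["addressRegion"], a["addressPostalCode"], a["addressCountry"]))
        # The multi-valued addressPurpose becomes one link row per purpose.
        for purpose in a["addressPurpose"]:
            links.append((person_id, address_id, purpose))
    return addresses, links


addresses, links = shred_addresses(11, person_doc)
print(addresses)
print(links)
```

One nested array in the document becomes two extra tables plus a join key on the relational side.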
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Storing different types of records in the same array is natural in programming but very difficult in relational databases.
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers",
                       "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
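Application code can consume a multi-typed array by dispatching on which properties each record carries, with no fixed table schema. The sketch below is a minimal illustration using abridged records from the payment-method array above; the `describe` helper and its output wording are hypothetical.

```python
from typing import Any

payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "creditCardNumber": "1234123412341234",
                       "nameOnCreditCard": "MIKE BOWERS"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "bankRoutingNumber": "11111111",
                       "bankAccountNumber": "11111111", "nameOnBankAccount": "Michael Bowers"}},
]


def describe(method: dict[str, Any]) -> str:
    """Dispatch on the properties a record actually has, not on a fixed column list."""
    m = method["paymentMethod"]
    if "creditCardNumber" in m:
        return f'{m["paymentMethodType"]} card ending {m["creditCardNumber"][-4:]}'
    if "bankAccountNumber" in m:
        return f'{m["paymentMethodType"]} account ending {m["bankAccountNumber"][-4:]}'
    return "unknown payment method"


print([describe(pm) for pm in payment_methods])
```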
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query.
• Structure can be queried:
  • for presence of keys
  • for nested relationships
  • for any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
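Querying structure itself can be sketched in plain Python over the document above. This is a hypothetical in-memory illustration, not a MarkLogic query API: `walk` yields every property with its path, `has_key` asks whether a key is present anywhere, and `hidden_phones` asks a nested-relationship question (phones whose own `hidePhoneNumber` flag is true).

```python
from typing import Any, Iterator

doc = {
    "personId": 11,
    "person": {
        "personName": "Some very long name with many UTF-8 international characters…",
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def walk(node: Any, path: tuple = ()) -> Iterator[tuple[tuple, Any]]:
    """Yield (path, value) for every property; array items keep the array's path."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield path + (key,), value
            yield from walk(value, path + (key,))
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, path)


def has_key(doc: Any, key: str) -> bool:
    """Structure query: is the key present anywhere in the document?"""
    return any(p[-1] == key for p, _ in walk(doc))


def hidden_phones(doc: Any) -> list[str]:
    """Nested-relationship query: phone numbers whose own hidePhoneNumber is true."""
    return [
        v["phoneNumber"]
        for p, v in walk(doc)
        if p[-1] == "phone" and isinstance(v, dict) and v.get("hidePhoneNumber") is True
    ]


print(has_key(doc, "hidePhoneNumber"), hidden_phones(doc))
```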
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters.
– Strings have unlimited size and can contain any internationalized UTF-8 characters.
– Numbers contain integers or decimals.
– Booleans are true or false.
– Arrays can have values of any type and can mix types.
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
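A quick way to see these types is to parse a small JSON document with Python's standard `json` module. The document text is made up for illustration. JSON itself has a single number type; Python's parser happens to map it to `int` or `float` depending on whether a decimal point is present.

```python
import json

# A mixed-type array, an empty (sparse) nested object, and a key with spaces and non-ASCII.
text = '{"mixed": [1, 2.5, "three", true, null, {"nested": {}}], "any key: spaces & ünïcödé": "ok"}'
doc = json.loads(text)

print([type(v).__name__ for v in doc["mixed"]])
print(doc["any key: spaces & ünïcödé"])
```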
Document PROs Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency. …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION. This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS. … Tables … represent a major advance toward the goal of data independence. …
1.2.1 Ordering Dependence. … Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence. … Can application programs … remain invariant as indices come and go? …
1.2.3 Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.
[Figure: a knowledge graph linking Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E) through relationships such as "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "solution", and "purpose".]
[Figure: a Customer Order at the center, connecting JSON Customers, Orders, Products, and Vendors documents and XML Documentation (for example, the John Hopkins operation record with its drug table). The relationships shown include "product that was ordered", "vendor who sold the product", "customer liked product", and "vendor received RMA on product". Documentation links include shipping instructions, customer invoices and annual reports, customer order documents, and product manuals.]
Inside and Out
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- Why is relational fixed and dense
- How does NoSQL support variety and variability
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling Normalize
- DocGraph Modeling Normalize
- DocGraph Modeling Denormalize
- Relational Modeling Orthogonalize
- Relational Modeling Generalize
- DocGraph Modeling Generalize
- Relational Modeling Tune
- DocGraph Modeling Tune
- Document PROs Simple Easy Data Types
Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
Do we combine multiple models?
• Dimensional: Kimball Data Warehousing
• Relational: Fixed Dense Tables with Flexible Queries & Joins
• Wide Column: Fixed Dense Tables with Fixed Queries, No Joins
• Key Value: Predefined Data Structures
  – key: "value 1"
  – hash: { "key 1": "value 1", "key 2": "value 2" }
  – list: [ "list value 1", "list value 2" ]
  – set: [ "set value 1", "set value 2" ]
• Document: Sparse Variable Data Structures
• Graph: Unlimited Relationships
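The following is a toy sketch of the key-value paradigm's predefined structures: a plain value, a hash, a list, and a set, each stored under one key. The method names echo Redis commands (SET, GET, HSET, RPUSH, SADD) only for familiarity. This is a hypothetical in-memory class, not a client for any real store.

```python
class KeyValueStore:
    """Toy in-memory key-value store whose values are one of a few predefined
    structures: a string, a hash (dict), a list, or a set."""

    def __init__(self) -> None:
        self._data = {}  # key -> str | dict | list | set

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str):
        return self._data.get(key)  # None when the key is absent

    def hset(self, key: str, field: str, value: str) -> None:
        self._data.setdefault(key, {})[field] = value

    def rpush(self, key: str, value: str) -> None:
        self._data.setdefault(key, []).append(value)

    def sadd(self, key: str, value: str) -> None:
        self._data.setdefault(key, set()).add(value)


kv = KeyValueStore()
kv.set("key", "value 1")
kv.hset("hash", "key 1", "value 1")
kv.hset("hash", "key 2", "value 2")
kv.rpush("list", "list value 1")
kv.rpush("list", "list value 2")
kv.sadd("set", "set value 1")
kv.sadd("set", "set value 1")  # a set ignores the duplicate
print(kv.get("hash"), kv.get("list"), kv.get("set"))
```

Every access goes through the key. Nothing inside a value can be queried or joined. That constraint is what makes the structures "predefined", and it contrasts with the sparse, variable, queryable structures of the document paradigm.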
[Chart axes: Low Latency Operational Velocity (hours, minutes, seconds, milliseconds, microseconds; 500 tps, 1000 tps, 10K tps, 100K tps) and High Bandwidth Analytical Volume (GB, TB, PB). A structure axis runs from fixed structure (code has more dependence on DB structures) to flexible structure (code has less dependence on DB structures).]
[Figure: the same surgery data modeled in six paradigms:
• Graph/RDF: Surgeon, Person, Hospital, and Operation linked by "performed on", "works at", "friend of", "likes", "has poor success at", and "fails at"
• XML document warehouse
• JSON document
• Key-value with a simple key
• Wide column with a complex key
• Relational SQL and NewSQL tables: Surgeon, Operation Codes, Operation, and Hospital
• Dimensional live-analytics and data-warehouse star schemas: Hospital, Surgeon, Operation, and Drug dimensions around Drug Dose Facts
MarkLogic appears as one operational multi-model NoSQL database covering several of these paradigms.]
Multi-Model Enterprise NoSQL Databases
Power of Combining XML Document and RDF Graph
[Figure: an excerpt of E. F. Codd's 1970 paper stored as an XML document and linked by an RDF graph to Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E). The relationships include "author of", "published article in journal", "publisher of", "geo-located in", "printed on", "related topic", "problem", "solution", and "purpose". The slide combines narrative, data, and relationships (semantic and structural) to produce contextual information and meaningful knowledge.]
WAIT A MINUTE: Aren't JSON, XML, and RDF databases hierarchical and graph databases?
Didn't E. F. Codd prove that hierarchical and graph databases are inferior to relational ones?
1. They break programs when the data structure's hierarchy changes.
   – New APIs and indexes flatten out the data structure, so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They easily become inconsistent, because it is easy to fail to update redundant data.
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
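Point 1 claims that indexes can flatten the hierarchy so queries survive restructuring. The following is a toy Python analogue of that idea, not MarkLogic's actual universal index. It records every (key, value) pair in each document regardless of where the pair is nested, so one lookup finds both the original document and a restructured version. The URIs and the restructured shape are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, Optional, Set, Tuple


def leaf_values(node: Any, key: Optional[str] = None) -> Iterator[Tuple[str, Any]]:
    """Yield (key, value) for every scalar in a nested document, ignoring its path."""
    if isinstance(node, dict):
        for k, v in node.items():
            yield from leaf_values(v, k)
    elif isinstance(node, list):
        for item in node:
            yield from leaf_values(item, key)  # array items inherit the array's key
    elif key is not None:
        yield key, node


def build_index(docs: Dict[str, Any]) -> Dict[Tuple[str, Any], Set[str]]:
    """Map each (key, value) pair to the URIs of the documents that contain it."""
    index: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)
    for uri, doc in docs.items():
        for pair in leaf_values(doc):
            index[pair].add(uri)
    return index


docs = {
    # Original shape: phones nested under person/personPhones/phone.
    "person-v1.json": {"person": {"personPhones": [
        {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (111) 111-1111"}}]}},
    # Restructured later: same data, different hierarchy.
    "person-v2.json": {"contact": {"phones": [
        {"phoneNumber": "+1 (111) 111-1111", "phonePurpose": ["mobile"]}]}},
}

index = build_index(docs)
print(sorted(index[("phoneNumber", "+1 (111) 111-1111")]))  # ['person-v1.json', 'person-v2.json']
print(sorted(index[("phonePurpose", "mobile")]))            # ['person-v1.json', 'person-v2.json']
```

One simplification to be aware of: Python treats `True` and `1` as the same dictionary key, so a production index would also record each value's type.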
RDF Graph and XML enable us to turn content into meaningful knowledge.
What can RDF Graph and JSON do for data?
Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity.
  – A document is like a row in a table, and more.
  – A document contains all the related tables that a relational model requires to represent one business entity.
• Meaningful graph relationships connect any business entity to any other entity (or any content), in any way and in as many ways as you want, at run time, across all table boundaries.
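The following is a minimal Python sketch of "documents + graph". Each business entity is one document, and relationships are stored separately as (subject, predicate, object) triples between document URIs, so a query can join across entities at run time. The URIs and the predicates `soldProduct` and `receivedRmaOn` are hypothetical. `likesProduct` is borrowed from the person document later in this deck. This is plain Python, not a triple-store API.

```python
# One document per business entity (URIs and field subsets are illustrative).
docs = {
    "/customers/11.json": {"personId": 11, "person": {"personName": "Mike Bowers"}},
    "/products/1111.json": {"productId": 1111,
                            "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    "/products/2222.json": {"productId": 2222,
                            "product": {"productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz"}},
    "/vendors/555.json": {"companyId": 555, "company": {"companyName": "Nabisco"}},
}

# Relationships are data: (subject, predicate, object) triples between documents.
triples = [
    ("/customers/11.json", "likesProduct", "/products/1111.json"),
    ("/vendors/555.json", "soldProduct", "/products/1111.json"),
    ("/vendors/555.json", "receivedRmaOn", "/products/2222.json"),
]


def objects(subject, predicate):
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    return [s for s, p, o in triples if p == predicate and o == obj]


# Join across entities: which vendors sold a product this customer likes?
liked = objects("/customers/11.json", "likesProduct")
vendor_names = sorted({docs[v]["company"]["companyName"]
                       for product in liked for v in subjects("soldProduct", product)})
print(vendor_names)  # ['Nabisco']
```

Adding a new kind of relationship means appending triples. No table or foreign-key column has to change.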
[Figure: a Customer Order connecting JSON Customers, Orders, Products, and Vendors documents and XML Documentation (the John Hopkins operation record with its drug table). The graph relationships include "product that was ordered", "vendor who sold the product", "customer liked product", and "vendor received RMA on product". Documentation links include shipping instructions; customer invoices and annual reports; customer order documents and invoices; and product manuals, vendor order forms, and invoices.]
RDF = Standard Meaningful Graphs
Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies.
TIP: Search for ontologies at Linked Open Vocabularies (LOV).
• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation
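The following is a small sketch of what reusing an existing ontology looks like in practice. The predicates are the standard FOAF (`foaf:name`, `foaf:knows`) and Dublin Core terms (`dcterms:title`, `dcterms:creator`) IRIs. The `http://example.org/` resources are hypothetical. The function serializes the triples as N-Triples in plain Python, without an RDF library.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # hypothetical namespace for this sketch's own resources

triples = [
    (EX + "person/11", FOAF + "name", "Mike Bowers"),
    (EX + "person/11", FOAF + "knows", EX + "person/22"),
    (EX + "doc/codd-1970", DCTERMS + "title",
     "A Relational Model of Data for Large Shared Data Banks"),
    (EX + "doc/codd-1970", DCTERMS + "creator", EX + "person/codd"),
]


def term(value: str) -> str:
    """Render an IRI as <...> and anything else as a quoted, escaped literal.
    (Treating every http(s) string as an IRI is a simplification for this sketch.)"""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(rows) -> str:
    return "\n".join(f"<{s}> <{p}> {term(o)} ." for s, p, o in rows)


print(to_ntriples(triples))
```

Because the predicate IRIs are shared vocabulary, any consumer that knows FOAF or Dublin Core can interpret these relationships without a data dictionary.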
New Data Structures for Variety and Variability
Relational Database: fixed and dense structures (mostly) | NoSQL Database: sparse and variable structures (mostly)
Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable). | Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.
Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure. | Each document may be of zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.
Each table contains a sparse number of rows and requires each row to have the same fixed structure. | Each collection contains a sparse number of documents and has flexible requirements for document types: it may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of these. The same document may exist in multiple collections.
Each table may have zero to a few fixed relationships to other tables. Constraints are not data. | Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.
Each schema must have a fixed number of predefined tables and constraints before data can be loaded. | Each database may contain documents without first defining structures, document types, relationships, collections, etc.
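The following is a toy sketch of the right-hand column of the table above. Nothing is declared before loading. Documents in one collection can have different shapes or different types. One document can belong to several collections. The class, URIs, the "reviewed" collection, and the second operation are hypothetical. The "operation" and "transplants" collection names come from the operation document later in this deck.

```python
from collections import defaultdict


class DocumentDatabase:
    """Toy document database: no schema is declared before loading, and a
    collection is just a label that any document may carry, possibly several."""

    def __init__(self):
        self.docs = {}                       # uri -> document
        self.collections = defaultdict(set)  # collection name -> set of uris

    def insert(self, uri, doc, collections=()):
        self.docs[uri] = doc
        for name in collections:
            self.collections[name].add(uri)

    def collection(self, name):
        return [self.docs[uri] for uri in sorted(self.collections.get(name, set()))]


db = DocumentDatabase()
db.insert("/operations/1.json",
          {"_type": "operation", "operation": {"operationNumber": 13, "administeredDrugs": [
              {"drugName": "Minocycline", "drugDoseSize": 200, "drugDoseUOM": "mg"}]}},
          collections=["operation", "transplants", "reviewed"])
# Same type, sparser shape: this operation has no administeredDrugs at all.
db.insert("/operations/2.json", {"_type": "operation", "operation": {"operationNumber": 14}},
          collections=["operation"])
# A different document type in a shared collection, with no schema change.
db.insert("/people/11.json", {"_type": "person", "person": {"personName": "Mike Bowers"}},
          collections=["person", "reviewed"])

print(len(db.collection("operation")), len(db.collection("transplants")))  # 2 1
print(sorted(d["_type"] for d in db.collection("reviewed")))  # ['operation', 'person']
```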
Why is relational fixed and dense?

Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.
Operation ID Drug ID Drug Dose Drug Dose UOM Sparse13 100 200 mg NULL13 101 NULL NULL13 100 150 mg NULL
Drug ID | Drug Name
100 | Minocycline
101 | Minomycin
Fixed field types
Fixed table structure
Fixed table relationships
Each row has same structure
Fixed set of tables
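The following is a minimal sketch of the same surgery data in a relational store, using Python's built-in `sqlite3`. The snake_case table and column names are assumptions adapted from the slide's tables. The dose rows follow the slide's example: one Minomycin administration has no recorded dose. The multivalued, partly sparse drug data needs its own table with nullable columns, and reading it back requires joins.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE surgeon   (surgeon_id INTEGER PRIMARY KEY, surgeon_name TEXT NOT NULL,
                        surgeon_specialty TEXT);
CREATE TABLE drug      (drug_id INTEGER PRIMARY KEY, drug_name TEXT NOT NULL);
CREATE TABLE operation (operation_id INTEGER PRIMARY KEY,
                        surgeon_id INTEGER NOT NULL REFERENCES surgeon(surgeon_id),
                        hospital_id INTEGER NOT NULL);
-- Multivalued, partly sparse data gets its own table; missing values are NULL.
CREATE TABLE operation_drug (operation_id INTEGER NOT NULL REFERENCES operation(operation_id),
                             drug_id INTEGER NOT NULL REFERENCES drug(drug_id),
                             drug_dose REAL, drug_dose_uom TEXT);
INSERT INTO surgeon VALUES (1, 'Dorothy Oz', 'Cardiothoracic'), (2, 'Martin Fields', 'Neurosurgery');
INSERT INTO drug VALUES (100, 'Minocycline'), (101, 'Minomycin');
INSERT INTO operation VALUES (13, 1, 7);
INSERT INTO operation_drug VALUES (13, 100, 200, 'mg'), (13, 101, NULL, NULL), (13, 100, 150, 'mg');
""")

# Reassembling one operation takes a three-way join.
rows = con.execute("""
SELECT s.surgeon_name, d.drug_name, od.drug_dose, od.drug_dose_uom
FROM operation o
JOIN surgeon s         ON s.surgeon_id = o.surgeon_id
JOIN operation_drug od ON od.operation_id = o.operation_id
JOIN drug d            ON d.drug_id = od.drug_id
WHERE o.operation_id = 13
ORDER BY od.rowid
""").fetchall()

for row in rows:
    print(row)
```

Adding a new optional drug attribute here means an `ALTER TABLE` or yet another table. This is the fixed, dense structure the slide describes.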
How does NoSQL support variety and variability?
NoSQL documents can have any variety of rapidly changing structures, because structure is data.
{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },   // sparse property
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}
• Variable Data Types
• Variable Document Types
• Variable Relationships
• Variable and Sparse Collections
• Variable Document Structures
• Sparse Properties
• Sparse Denormalized Properties
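The following is a short Python sketch of working with the operation document above. It shows two points. A sparse property is simply absent, so the code reads it with `.get()` and gets `None`. Relationships stored as data in `relations.values` can be filtered like any other values. The sketch assumes the `relations.values` nesting reconstructed above. The helper names are hypothetical, and only a subset of the document is repeated.

```python
operation = {
    "_id": 1,
    "operation": {
        "operationNumber": 13,
        "administeredDrugs": [
            {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
             "drugDoseSize": 200, "drugDoseUOM": "mg"},
            {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},  # sparse: no dose
            {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
             "drugDoseSize": 150, "drugDoseUOM": "mg"},
        ],
    },
    "relations": {"values": [
        {"subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87},
        {"subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1},
        {"subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3},
        {"subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1},
    ]},
}


def doses(doc):
    """Sparse properties are just absent; .get() returns None for them."""
    return [(d["drugName"], d.get("drugDoseSize"), d.get("drugDoseUOM"))
            for d in doc["operation"]["administeredDrugs"]]


def relations(doc, predicate):
    """Relationships are data: filter the embedded relation objects by predicate."""
    return [r for r in doc["relations"]["values"] if r["predicate"] == predicate]


most_recalled = max(relations(operation, "opDrug"), key=lambda r: r["drugRecalls"])
print(doses(operation))
print(most_recalled["object"], most_recalled["drugRecalls"])  # 20000 3
```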
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph></section>
{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" }
    ] }
  ]
}
JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, and CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
Choosing Between XML and JSON
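The following is a small Python sketch of the contrast drawn on this slide, using the standard `xml.etree.ElementTree` and `json` modules on trimmed copies of the two examples. In XML, the date is a tag embedded in running text, and `itertext()` recovers the prose around it. CDATA content arrives as ordinary text. In JSON, the date is its own object, and the text is split into separate objects around it.

```python
import json
import xml.etree.ElementTree as ET

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

xml_text = """<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph>
</section>"""

json_text = """{"paragraphs": [{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects"},
  {"type": "date", "value": "2017-04-02Z"},
  {"type": "break", "value": null}]}]}"""

section = ET.fromstring(xml_text)
heading = section.find("heading").text                   # CDATA is merged into the text
prose = "".join(section.find("paragraph").itertext())    # text flows around the tags
xml_dates = [(d.text, d.get(XSI + "type")) for d in section.iter("date")]

doc = json.loads(json_text)
json_dates = [item["value"] for p in doc["paragraphs"] for item in p["paragraph"]
              if item["type"] == "date"]

print(heading)
print(xml_dates)   # [('2017-04-02Z', 'xsd:date')]
print(json_dates)  # ['2017-04-02Z']
```

The XML version keeps the date inside the sentence. The JSON version gives a program predictable objects to loop over. This mirrors the slide's "content" versus "data structures" guidance.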
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Customer Model
The person ERD has 13 tables because each multivalued property requires a separate table in the relational model.
All of this should be represented as one JSON document, because it is one business entity.
Person ERD
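The following is a minimal Python sketch of why multivalued properties multiply tables. It "shreds" one person document into rows for three of the ERD's tables (Persons, Person Phones, Person Emails). Every array becomes a child table keyed by Person ID. The surrogate-key scheme, the comma-joined purpose column, and the `example.com` address are illustrative assumptions. The sketch is not a description of any real ETL tool.

```python
from itertools import count

person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        ],
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@example.com"}},
        ],
    },
}


def shred(doc):
    """Split one person document into rows for three of the ERD's tables.
    Each multivalued property becomes rows in its own child table keyed by Person ID."""
    person_id = doc["personId"]
    person = doc["person"]
    phone_ids, email_ids = count(1), count(1)  # hypothetical surrogate keys
    tables = {
        "Persons": [{"Person ID": person_id, "Name": person["personName"]}],
        "Person Phones": [],
        "Person Emails": [],
    }
    for item in person.get("personPhones", []):
        phone = item["phone"]
        tables["Person Phones"].append({
            "Phone ID": next(phone_ids), "Person ID": person_id,
            # A column holds one value, so this sketch joins the purpose array into a string.
            "Phone Purpose": ",".join(phone["phonePurpose"]),
            "Phone Number": phone["phoneNumber"],
        })
    for item in person.get("personEmails", []):
        email = item["email"]
        purposes = email["emailPurpose"]
        tables["Person Emails"].append({
            "Email ID": next(email_ids), "Person ID": person_id,
            "Email Purpose": ",".join(p for p in purposes if p != "primary"),
            "Email Address": email["emailAddress"],
            "Is Primary": "primary" in purposes,  # the ERD's Is Primary flag
        })
    return tables


tables = shred(person_doc)
for name, rows in tables.items():
    print(name, len(rows))
```

Extending the sketch to addresses, websites, payment methods, and company relationships adds one or two tables each, which is how a single business entity grows to 13 tables.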
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT", "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers", "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
[Figure: the person and company tables above, repeated several times under placeholder table names alongside tables such as "Lookup Lists", "Order", "Order Items", and "Fact", to suggest how large a realistic customer order model becomes.]
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity.
• Meaningful graph relationships connect business entities.
[Figure: a Customer Order linking JSON Customers, Orders, Products, and Vendors documents through graph relationships such as "product that was ordered" and "vendor who sold the product".]
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T09:00:00Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 5.87 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 3.50 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+01:01
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2.50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
productAvailabilityDate 2001-01-01
productListPrice 4.50
productDiscountPrice 3.50
productStandardCost 2.50
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 2.50 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 10.1
productWidth 4.5 productHeight 1.7 productLength 6.7
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
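The slide's two ideas (each business entity is one document, and named relationships connect entities) can be sketched in plain Python. This is a minimal in-memory illustration under stated assumptions, not MarkLogic code: the URIs such as "order/1" and the predicates placedOrder, orderedProduct, and soldByVendor are hypothetical labels loosely based on the figure captions.

```python
# Minimal sketch: one document per business entity, plus named relationships
# (triples) between entities. URIs and predicate names are hypothetical.

documents = {
    "customer/11": {"personId": 11, "person": {"personName": "Mike Bowers"}},
    "order/1": {"orderId": 1, "order": {"orderNumber": "111-11-111"}},
    "product/1111": {"productId": 1111,
                     "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    "company/555": {"companyId": 555, "company": {"companyName": "Nabisco"}},
}

# (subject, predicate, object) triples linking the documents above.
triples = [
    ("customer/11", "placedOrder", "order/1"),
    ("order/1", "orderedProduct", "product/1111"),
    ("product/1111", "soldByVendor", "company/555"),
]


def follow(subject: str, predicate: str) -> list[str]:
    """Return every object reachable from subject through predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def vendors_for_order(order_uri: str) -> list[str]:
    """Walk order -> product -> vendor and return the vendor names."""
    names = []
    for product_uri in follow(order_uri, "orderedProduct"):
        for vendor_uri in follow(product_uri, "soldByVendor"):
            names.append(documents[vendor_uri]["company"]["companyName"])
    return names


print(vendors_for_order("order/1"))  # ['Nabisco']
```

Each entity stays whole in its own document; only the relationships live outside it, so adding a new kind of relationship means adding triples rather than altering any document's shape.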
Relational Modeling, Step 1: Normalize
• Make each attribute single-valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Figure: one-to-one, one-to-many, and many-to-many relationships, and reference tables]
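To make the normalization steps concrete, here is a sketch, assuming SQLite from Python's standard library, of shredding the multivalued personPhones property into tables. The table and column names are hypothetical. Note that phonePurpose is itself multivalued, so making every attribute single-valued forces a third table, and reading the data back requires a join.

```python
import sqlite3

# Hypothetical fragment modeled on the person document in this deck.
person = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
    ],
}


def normalize(conn: sqlite3.Connection, doc: dict) -> None:
    """Shred one person document into three single-valued tables."""
    conn.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT);
        CREATE TABLE phone (
            phone_id     INTEGER PRIMARY KEY,
            person_id    INTEGER REFERENCES person(person_id),
            phone_number TEXT);
        CREATE TABLE phone_purpose (
            phone_id INTEGER REFERENCES phone(phone_id),
            purpose  TEXT);
    """)
    conn.execute("INSERT INTO person VALUES (?, ?)",
                 (doc["personId"], doc["personName"]))
    for ph in doc["personPhones"]:
        cur = conn.execute(
            "INSERT INTO phone (person_id, phone_number) VALUES (?, ?)",
            (doc["personId"], ph["phoneNumber"]))
        # One row per purpose: the multivalued attribute becomes a child table.
        for purpose in ph["phonePurpose"]:
            conn.execute("INSERT INTO phone_purpose VALUES (?, ?)",
                         (cur.lastrowid, purpose))


def phones_with_purpose(conn: sqlite3.Connection, person_id: int, purpose: str) -> list[str]:
    """Reassembling the data now needs a join across the shredded tables."""
    rows = conn.execute("""
        SELECT ph.phone_number
        FROM phone ph JOIN phone_purpose pp ON pp.phone_id = ph.phone_id
        WHERE ph.person_id = ? AND pp.purpose = ?
        ORDER BY ph.phone_id""", (person_id, purpose)).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
normalize(conn, person)
print(phones_with_purpose(conn, 11, "primary"))  # ['+1 (111) 111-1111']
```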
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it
personId 11
schemas [
schema type person schemaVersion 1.0
schema type customer schemaVersion 1.0
schema type companyRelationships schemaVersion 1.0 ]
triples [
triple subject 11 predicate hasSpouse object 22
triple subject 11 predicate hasChild object 33
triple subject 11 predicate hasChild object 44
triple subject 11 predicate likesProduct object 1111
triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ]
person
personStatus active
personName Mike Bowers
personOnlineName MTB1
personNameVariations
personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null
personLegalName Michael Thomas Bowers personSortedName Bowers, Mike personNickname Mikey
personInformalLetterName Mike personFormalLetterName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111
phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112
phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
personAddresses [
address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mike@somework.com
email emailPurpose [ personal ] emailAddress mike@somewhere.com ]
personWebsites [
website websitePurpose [ work ] websiteUrl http://www.mike.com
website websitePurpose [ linkedin primary ] websiteUrl https://linked.com/mike ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
companyRelationships [
company companyIdFk 555 companyName Nabisco
personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective
personCompanyStartDate 2001-01-01 personCompanyEndDate null
accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
company companyIdFk 666 companyName Goliath Dairies
personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective
personCompanyStartDate 2001-01-01 personCompanyEndDate null
deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
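A plain-Python sketch of the same idea: the person is one document, each embedded array holds items of a single shape, and the embedded triples can be read out as (subject, predicate, object) tuples. extract_triples and is_normalized are hypothetical helpers for illustration only, not MarkLogic functions; the {"triple": {...}} wrapper follows the listing above, and is_normalized is just a rough proxy for "normalized" (every item has the same keys).

```python
# One person document; each embedded array holds items of a single shape.
person_doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}},
    ],
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113"}},
        ],
    },
}


def extract_triples(doc: dict) -> list[tuple]:
    """Read the embedded {"triple": {...}} objects as (s, p, o) tuples."""
    return [(t["triple"]["subject"], t["triple"]["predicate"], t["triple"]["object"])
            for t in doc.get("triples", [])]


def is_normalized(items: list, wrapper: str) -> bool:
    """Rough check: every item is {wrapper: {...}} and all share the same keys."""
    if not items or not all(isinstance(i, dict) and wrapper in i for i in items):
        return False
    shapes = {tuple(sorted(i[wrapper])) for i in items}
    return len(shapes) == 1


children = [o for s, p, o in extract_triples(person_doc) if p == "hasChild"]
print(children)  # [33, 44]
print(is_normalized(person_doc["person"]["personPhones"], "phone"))  # True
```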
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
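The "denormalizing without denormalizing" idea can be sketched in plain Python, assuming a projection step that stands in for a TDE template and a join that stands in for an Optic query. Neither function is MarkLogic's API. The point it shows: the order document never stores a copy of the customer's name, yet a query can return the name alongside the order.

```python
# Hypothetical stand-ins for the person and order documents.
person_docs = [{"personId": 11, "person": {"personName": "Mike Bowers"}}]
order_docs = [{"orderId": 1,
               "order": {"orderNumber": "111-11-111",
                         "customer": {"customerId": 11}}}]


def project(persons: list[dict]) -> list[tuple]:
    """Projection step (a stand-in for a TDE template): emit index rows."""
    return [(d["personId"], "personName", d["person"]["personName"])
            for d in persons]


def orders_with_customer_names(orders: list[dict], index: list[tuple]) -> list[tuple]:
    """Join each order to the projected name instead of a copied one."""
    names = {s: o for s, p, o in index if p == "personName"}
    return [(d["order"]["orderNumber"], names.get(d["order"]["customer"]["customerId"]))
            for d in orders]


index = project(person_docs)
print(orders_with_customer_names(order_docs, index))  # [('111-11-111', 'Mike Bowers')]

# A name change touches only the person document; re-projecting refreshes the join.
person_docs[0]["person"]["personName"] = "Michael Bowers"
index = project(person_docs)
print(orders_with_customer_names(order_docs, index))  # [('111-11-111', 'Michael Bowers')]
```

Unlike copying the name into every order, the only thing to refresh after an update is the projected index, which is what makes the approach attractive.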
Relational Modeling, Step 2: Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Figure: business tables, transaction tables, and reference tables]
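The three kinds of tables can be sketched with Python's built-in SQLite module. The schema below is a hypothetical minimal example using values from the order data in this deck: person and product are business tables, order_item is a transaction table joining them, and order_status is a reference table of allowed states. SQLite does not enforce the REFERENCES constraints unless PRAGMA foreign_keys is turned on, so here they only document intent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Business tables: stand independent of any context
    CREATE TABLE person  (person_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    -- Reference table: standardizes the allowed states
    CREATE TABLE order_status (code TEXT PRIMARY KEY);
    -- Transaction table: joins business tables together
    CREATE TABLE order_item (
        person_id  INTEGER REFERENCES person(person_id),
        product_id INTEGER REFERENCES product(product_id),
        qty        INTEGER,
        status     TEXT REFERENCES order_status(code));
    INSERT INTO person VALUES (11, 'Mike Bowers');
    INSERT INTO product VALUES (1111, 'Oreo Thins Sandwich Cookies'),
                               (2222, 'Rice Dream Rice Drink - Vanilla 64 Fl Oz');
    INSERT INTO order_status VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');
    INSERT INTO order_item VALUES (11, 1111, 3, 'Shipped'), (11, 2222, 1, 'Shipped');
""")

# Recombining the orthogonal tables into one context takes two joins.
rows = conn.execute("""
    SELECT pe.name, pr.name, oi.qty
    FROM order_item oi
    JOIN person  pe ON pe.person_id  = oi.person_id
    JOIN product pr ON pr.product_id = oi.product_id
    WHERE oi.status = 'Shipped'
    ORDER BY pr.product_id""").fetchall()
print(rows)
```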
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
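Here is a sketch of the last point, assuming plain Python dicts in place of stored documents. A single lookup document, modeled on the orderLookupId document in this deck, holds several lookup lists, and a hypothetical invalid_statuses helper checks an order against it. The "Backordered" value is made up purely to show a rejected status.

```python
# One lookup document combining several lookup lists.
lookup_doc = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}

# A transaction document that contains everything about the order.
order_doc = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},
        ],
    },
}


def invalid_statuses(order: dict, lookups: dict) -> list[str]:
    """Return a message for each status not found in the lookup lists."""
    problems = []
    status = order["order"]["orderStatus"]
    if status not in lookups["orderStatus"]:
        problems.append(f"order {order['orderId']}: {status}")
    for item in order["order"]["orderedProducts"]:
        p = item["product"]
        if p["productOrderStatus"] not in lookups["productOrderStatus"]:
            problems.append(f"product {p['productIdFk']}: {p['productOrderStatus']}")
    return problems


print(invalid_statuses(order_doc, lookup_doc))  # ['product 2222: Backordered']
```

One read of the lookup document supplies every list needed to validate the order, rather than one reference table per list.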
Relational Modeling, Step 3: Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational models, because it hides the purpose of the model
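The person-and-subclass pattern can be sketched with Python's built-in SQLite module. This is a hypothetical minimal schema: each subclass table shares the person's primary key (a one-to-one relationship), and a person's roles are found by probing each subclass table. Person 22 is a made-up row with no subclasses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- General-purpose supertype table
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    -- Subclass tables share the person's primary key (one-to-one)
    CREATE TABLE customer (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        join_date TEXT, status TEXT);
    CREATE TABLE account_rep (
        person_id  INTEGER PRIMARY KEY REFERENCES person(person_id),
        company_id INTEGER);
    INSERT INTO person VALUES (11, 'Mike Bowers'), (22, 'Person 22');
    INSERT INTO customer VALUES (11, '2001-01-01', 'active');
    INSERT INTO account_rep VALUES (11, 555);
""")

SUBCLASS_TABLES = ("customer", "account_rep")


def roles(conn: sqlite3.Connection, person_id: int) -> list[str]:
    """Return the subclass tables that hold a row for this person."""
    found = []
    for table in SUBCLASS_TABLES:  # fixed names, so the f-string is safe here
        if conn.execute(f"SELECT 1 FROM {table} WHERE person_id = ?",
                        (person_id,)).fetchone():
            found.append(table)
    return found


print(roles(conn, 11))  # ['customer', 'account_rep']
print(roles(conn, 22))  # []
```

This shows the drawback the slide warns about: nothing in the person table itself reveals which roles exist, so the purpose of the model is spread across tables.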
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because JSON objects naturally support subclassing
• This example shows a person subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON, and unlike in relational models, it does not hide the purpose of the model
• See subclassing below
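A sketch of how the subclasses stay visible in one document, assuming plain Python dicts: the schemas array declares which types the document carries, and each declared type has its own top-level section. declared_types, missing_sections, and company_roles are hypothetical helpers for illustration; schemaVersion is shown as "1.0" on the assumption that the listing's "10" lost its decimal point in extraction.

```python
person_doc = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person", "schemaVersion": "1.0"}},
        {"schema": {"type": "customer", "schemaVersion": "1.0"}},
        {"schema": {"type": "companyRelationships", "schemaVersion": "1.0"}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def declared_types(doc: dict) -> list[str]:
    """List the types named in the document's schemas array."""
    return [s["schema"]["type"] for s in doc.get("schemas", [])]


def missing_sections(doc: dict) -> list[str]:
    """Declared subclass types whose top-level section is absent."""
    return [t for t in declared_types(doc) if t not in doc]


def company_roles(doc: dict) -> list[str]:
    """All roles this person plays for any company, sorted."""
    return sorted({r for c in doc.get("companyRelationships", [])
                   for r in c["company"]["personCompanyRelationship"]})


print(declared_types(person_doc))  # ['person', 'customer', 'companyRelationships']
print(missing_sections(person_doc))  # []
print(company_roles(person_doc))
```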
Relational Modeling, Step 4: Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
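A sketch of the sparse-data tuning step, assuming Python's built-in SQLite module and a hypothetical schema: driver's license columns that only a few people have are moved out of the person table into a one-to-one table, and a LEFT JOIN restores them when needed. Persons 22 and 33 are made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    -- Sparse attributes moved to a one-to-one table keyed by person_id
    CREATE TABLE person_drivers_license (
        person_id      INTEGER PRIMARY KEY REFERENCES person(person_id),
        license_number TEXT,
        license_state  TEXT);
    INSERT INTO person VALUES (11, 'Mike Bowers'), (22, 'Person 22'), (33, 'Person 33');
    INSERT INTO person_drivers_license VALUES (11, '11111111', 'UT');
""")

# The LEFT JOIN returns NULL (None) for people without the sparse data.
rows = conn.execute("""
    SELECT p.person_id, d.license_state
    FROM person p
    LEFT JOIN person_drivers_license d ON d.person_id = p.person_id
    ORDER BY p.person_id""").fetchall()
print(rows)  # [(11, 'UT'), (22, None), (33, None)]
```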
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
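Why unique names matter can be shown with a simplified model of an index keyed only by property name and value, ignoring where the property sits in the hierarchy. This is a hypothetical sketch of the behavior the slide describes, not MarkLogic's actual index structure. With a generic name like status, a query cannot tell a person's status from a customer's status; with personStatus and customerStatus, it can.

```python
from collections import defaultdict


def build_name_index(docs: dict) -> dict:
    """Map (property name, scalar value) -> set of doc ids, ignoring the path."""
    index = defaultdict(set)

    def walk(node, doc_id, key=None):
        if isinstance(node, dict):
            for k, v in node.items():
                walk(v, doc_id, k)
        elif isinstance(node, list):
            for item in node:  # array items are indexed under the array's name
                walk(item, doc_id, key)
        elif key is not None:
            index[(key, node)].add(doc_id)

    for doc_id, doc in docs.items():
        walk(doc, doc_id)
    return index


generic_docs = {
    "person/11": {"person": {"status": "active"}, "customer": {"status": "inactive"}},
    "person/22": {"person": {"status": "inactive"}, "customer": {"status": "active"}},
}
unique_docs = {
    "person/11": {"person": {"personStatus": "active"},
                  "customer": {"customerStatus": "inactive"}},
    "person/22": {"person": {"personStatus": "inactive"},
                  "customer": {"customerStatus": "active"}},
}

# "Active customers": the generic name matches both documents, the unique one only 22.
print(sorted(build_name_index(generic_docs)[("status", "active")]))
print(sorted(build_name_index(unique_docs)[("customerStatus", "active")]))
```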
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
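One rough way to see why a few documents expand into many tables: in a normalized relational model, every array (multivalued property) needs at least one child table of its own. The hypothetical array_paths helper below lists the distinct array paths in a document fragment as a crude lower-bound proxy; it is an illustration under that assumption, not the method behind the slide's count.

```python
def array_paths(node, path: str = "") -> set[str]:
    """Collect slash-separated paths of every array in a JSON-like value."""
    found: set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            found |= array_paths(value, f"{path}/{key}")
    elif isinstance(node, list):
        found.add(path)
        for item in node:  # nested arrays inside items share the same path
            found |= array_paths(item, path)
    return found


doc = {
    "personId": 11,
    "person": {
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        ],
        "personEmails": [
            {"email": {"emailPurpose": ["work"], "emailAddress": "mike@somework.com"}},
        ],
    },
}

for p in sorted(array_paths(doc)):
    print(p)
```

Even this small fragment yields four array paths, each of which would become a separate table alongside the person table itself.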
Document PROs: Performance
• One JSON document requires one I/O
• Relational requires dozens of I/Os for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row
• 4-to-6 I/Os x 13 joins = 52-to-78 I/Os
• MarkLogic indexes are in RAM, so getting a doc is 1 I/O
• MongoDB indexes are B-tree indexes and may require 4 I/Os
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables
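The I/O arithmetic above can be checked in a few lines. This is a minimal Python sketch (an illustrative choice; the talk shows no code) that encodes only the slide's own per-join figures and, like the slide, counts one join per table. The constant names are illustrative, and real costs depend on caching, index depth, and the query plan.

```python
# Rough I/O comparison using the per-join figures quoted on the slide.
INDEX_IOS = (3, 4)  # min/max I/Os to traverse the indexes for one join
ROW_IOS = (1, 2)    # min/max I/Os to fetch the joined row
TABLES = 13         # tables needed to hold one person (one join counted per table)


def join_io_range(tables: int) -> tuple:
    """Return (min, max) I/Os to assemble one entity from `tables` joins."""
    per_join_min = INDEX_IOS[0] + ROW_IOS[0]  # 3 + 1 = 4
    per_join_max = INDEX_IOS[1] + ROW_IOS[1]  # 4 + 2 = 6
    return per_join_min * tables, per_join_max * tables


def document_ios() -> int:
    """One whole JSON document with indexes held in RAM: a single I/O, per the slide."""
    return 1


if __name__ == "__main__":
    low, high = join_io_range(TABLES)
    print(f"relational: {low}-to-{high} I/Os; document: {document_ios()} I/O")
```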
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued.
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.
"personAddresses": [
  { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ], "addressLocality": "Clinton", "addressRegion": "UT", "addressPostalCode": "84015", "addressCountry": "United States" } }
]
Relational equivalent:
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
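To make the many-to-many point concrete, here is a Python sketch that shreds the personAddresses array into rows for the three relational tables. The numbered Address IDs are hypothetical surrogate keys, and emitting one Persons Addresses row per purpose is an assumption of this sketch rather than the talk's schema.

```python
# Shred one person's multi-valued addresses into three relational tables.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }},
        ],
    },
}


def shred(doc):
    """Return (persons, persons_addresses, addresses) rows for one person document."""
    person_id = doc["personId"]
    persons = [{"Person ID": person_id, "Name": doc["person"]["personName"]}]
    persons_addresses, addresses = [], []
    # Hypothetical surrogate keys: Address IDs numbered from 1 within this document
    for address_id, entry in enumerate(doc["person"]["personAddresses"], start=1):
        addr = entry["address"]
        addresses.append({
            "Address ID": address_id,
            "Street": "\n".join(addr["addressStreet"]),
            "Locality": addr["addressLocality"],
            "Region": addr["addressRegion"],
            "Postal Code": addr["addressPostalCode"],
            "Country": addr["addressCountry"],
        })
        # The purpose array is multi-valued too, so it fans out into link rows
        for purpose in addr["addressPurpose"]:
            persons_addresses.append(
                {"Person ID": person_id, "Address ID": address_id, "Address Purpose": purpose}
            )
    return persons, persons_addresses, addresses


persons, persons_addresses, addresses = shred(person_doc)
```

One document becomes four rows across three tables, and reading it back requires joining them again.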
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed.
• Storing different types of records in the same array is natural in programming but very difficult in relational databases.
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified", "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11", "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified", "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111", "nameOnBankAccount": "Michael Bowers", "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
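Application code usually handles a multi-typed array by dispatching on a discriminator property. This Python sketch assumes paymentMethodType plays that role; the describe function and its output strings are hypothetical.

```python
payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS",
                       "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers",
                       "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]


def describe(method: dict) -> str:
    """Read type-specific properties after checking the (assumed) discriminator."""
    pm = method["paymentMethod"]
    kind = pm["paymentMethodType"]
    if kind == "Visa":
        return f"card ending {pm['creditCardNumber'][-4:]} for {pm['nameOnCreditCard']}"
    if kind == "Checking":
        return f"checking account {pm['bankAccountNumber'][-4:]} for {pm['nameOnBankAccount']}"
    return f"unrecognized payment method type {kind!r}"


# Properties shared by every type (here, the status) can be queried uniformly
verified = [describe(m) for m in payment_methods
            if m["paymentMethod"]["paymentMethodStatus"] == "Verified"]
```

In a relational model the same array needs one table per payment type plus a link table per type, as the Bank and Visa payment-method tables in the person ERD show.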
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query
• Structure can be queried:
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
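Because keys are data, a program can test for a key anywhere in the hierarchy, or change the structure by rewriting the document. This Python sketch works on the document above in memory. It illustrates the idea only and is not MarkLogic's query or update API; the replacement key name personHeightFeet is hypothetical.

```python
from typing import Any, Iterator

doc = {
    "personId": 11,
    "person": {
        "personName": "Some very long name",  # shortened for the sketch
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def iter_key_paths(node: Any, path: tuple = ()) -> Iterator[tuple]:
    """Yield the path to every key at any depth (list positions appear as ints)."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield path + (key,)
            yield from iter_key_paths(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_key_paths(item, path + (index,))


def has_key(node: Any, key: str) -> bool:
    """Query structure: is `key` present anywhere in the document?"""
    return any(p[-1] == key for p in iter_key_paths(node))


def rename_key(node: Any, old: str, new: str) -> Any:
    """Change structure by rewriting data: return a copy with `old` renamed to `new`."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


# Sparse property: only the second phone carries hidePhoneNumber
hidden = [p["phone"]["phoneNumber"] for p in doc["person"]["personPhones"]
          if p["phone"].get("hidePhoneNumber")]
renamed = rename_key(doc, "personHeightInFeet", "personHeightFeet")
```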
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
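The JSON types listed above map directly onto built-in types when parsed with Python's standard json module. The sample document below is invented for illustration.

```python
import json

# Invented sample covering each JSON type listed above.
raw = """{
  "an attribute name: any characters, even spaces & punctuation!": "caf\u00e9",
  "count": 42,
  "height": 5.75,
  "verified": true,
  "mixed": [1, "two", false, null, {"nested": {}}],
  "sparse": {"onlyAttributeSet": "yes"}
}"""

doc = json.loads(raw)

# Python type produced for each top-level value
top_level_types = {key: type(value).__name__ for key, value in doc.items()}
# Arrays may mix types freely, including null (None) and nested objects
mixed_types = [type(v).__name__ for v in doc["mixed"]]
```

If exact decimals matter (prices, for example), `json.loads` accepts `parse_float=decimal.Decimal`, so values such as 5.75 are not converted to binary floats.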
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence …
1.2.1. Ordering Dependence. … Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence. … Can application programs … remain invariant as indices come and go? …
1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.
[Figure: the excerpt linked into an RDF graph of Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E), with edges such as "author of", "publisher of", "published article in journal", "printed on", "geo-located in", "problem", "solution", and "purpose"]
[Figure: document graph linking JSON Customers, Orders, Products, and Vendors documents with XML documentation through relationships such as "product that was ordered", "vendor who sold the product", "customer liked product", and "vendor received RMA on product"]
Inside and Out
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document/Graph?
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling: Normalize
- DocGraph Modeling: Normalize
- DocGraph Modeling: Denormalize
- Relational Modeling: Orthogonalize
- Relational Modeling: Generalize
- DocGraph Modeling: Generalize
- Relational Modeling: Tune
- DocGraph Modeling: Tune
- Document PROs: Simple, Easy Data Types
Hospital Name | Johns Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
[Figure: Six Data Paradigms (Graph/RDF, Dimensional, Relational, Document, Wide Column, Key Value), plotted against low-latency operational velocity (500 tps to 100K tps; hours down to microseconds) and high-bandwidth analytical volume (GB to PB). The same hospital operation is drawn in each paradigm: a Graph/RDF network (a surgeon performed an operation at a hospital; "works at", "friend of", "likes", "has poor success at", "fails at"), an XML document, a simple key-value key, a complex wide-column key, SQL and newSQL entity-relationship diagrams (Surgeon, Operation Codes, Operation, Hospital), and data-warehouse and live-analytics star schemas (Drug Dose Facts with Hospital, Surgeon, Operation, and Drug dimensions). MarkLogic is labeled "1 Multi-model NoSQL Database."]
Structure axis: fixed structure (code has more dependence on DB structures) to flexible structure (code has less dependence on DB structures).
Multi-Model Enterprise NoSQL Databases
Power of Combining XML Document and RDF Graph
[Figure: the Codd (1970) excerpt shown as an XML document, with topic nodes labeled "problem", "solution", and "purpose", linked by an RDF graph to Persons, Locations, Publications, References, Organizations, and Events through relationships such as "author of", "publisher of", "published article in journal", "printed on", "geo-located in", and "related topic"]
Data/Narrative + Relationships = Contextual Information
(Semantic & Structural) = Meaningful Knowledge
WAIT A MINUTE: Aren't JSON, XML, and RDF databases hierarchical and graph databases?
Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?
1. They break programs when the data structure's hierarchy changes.
   – New APIs and indexes flatten out the data structure, so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   – Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent, because it is easy to fail to update redundant data.
   – Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
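The first answer, that indexes flatten the hierarchy, can be illustrated with a toy index that records each leaf value under its property name regardless of nesting depth. This Python sketch shows the concept only; it is not how MarkLogic's universal index is built. The document ids v1 and v2 and the restructured second document are hypothetical.

```python
from collections import defaultdict


def flatten(node, path=()):
    """Yield (path, value) for every scalar, dropping array positions."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, path + (key,))
    elif isinstance(node, list):
        for item in node:
            yield from flatten(item, path)
    else:
        yield path, node


def build_property_index(docs):
    """Map property name -> value -> set of doc ids, ignoring where the property is nested."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in docs.items():
        for path, value in flatten(doc):
            if path:  # skip a bare scalar document with no property name
                index[path[-1]][value].add(doc_id)
    return index


docs = {
    # Address nested inside person, as in the talk's sample document
    "v1": {"person": {"personAddresses": [{"address": {"addressLocality": "Clinton"}}]}},
    # Hypothetical restructured version: addresses hoisted to the top level
    "v2": {"addresses": [{"addressLocality": "Clinton", "addressRegion": "UT"}]},
}

index = build_property_index(docs)
```

A lookup such as `index["addressLocality"]["Clinton"]` finds both documents even though their hierarchies differ, which is the property that keeps programs from breaking when the hierarchy changes.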
RDF Graph and XML enable us to turn content into meaningful knowledge.
What can RDF Graph and JSON do for data?
Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity
  – A document is like a row in a table, and more
  – A document contains all related tables required by a relational model to represent one business entity
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries
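A minimal Python sketch of documents as business entities plus triples as relationships: documents are looked up by id, and relationships are (subject, predicate, object) tuples that can be added at run time without changing any document. The likesProduct and hasEmployer predicates come from the sample person document; soldBy is a hypothetical predicate, and the in-memory lists stand in for a real triple store.

```python
documents = {
    11: {"personId": 11, "person": {"personName": "Mike Bowers"}},
    1111: {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    555: {"companyId": 555, "company": {"companyName": "Nabisco"}},
}

triples = [
    (11, "likesProduct", 1111),
    (11, "hasEmployer", 666),  # object document not loaded in this sketch
    (1111, "soldBy", 555),     # hypothetical predicate name
]


def linked(subject, predicate):
    """Return the loaded documents reachable from `subject` over `predicate` edges."""
    return [documents[o] for s, p, o in triples
            if s == subject and p == predicate and o in documents]


def follow(subject, *predicates):
    """Walk a chain of predicates and return the ids reached at the end."""
    frontier = {subject}
    for predicate in predicates:
        frontier = {o for s, p, o in triples if s in frontier and p == predicate}
    return frontier
```

For example, `follow(11, "likesProduct", "soldBy")` walks from the person to the product to its seller without any table or foreign-key column having been designed for that path.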
[Figure: a document graph linking JSON Customers, Orders, Products, and Vendors documents with XML documentation (customer order documents, invoices, annual reports, shipping instructions, product manuals, vendor order forms); relationships include "product that was ordered", "vendor who sold the product", "customer liked product", and "vendor received RMA on product"]
RDF = Standard Meaningful Graphs
Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies
TIP: Search for ontologies at Linked Open Vocabularies (LOV)
• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation
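Reusing an existing vocabulary mostly means using its IRIs as predicates. This Python sketch writes N-Triples lines with FOAF (http://xmlns.com/foaf/0.1/) and Dublin Core terms (http://purl.org/dc/terms/) predicates. The http://example.org/ subject IRIs are placeholders, and the hand-rolled serializer covers only the escaping needed here; a library such as rdflib handles the full grammar.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # placeholder base IRI for this sketch


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslash, quote, and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


triples = [
    (iri(EX + "person/11"), iri(FOAF + "name"), literal("Mike Bowers")),
    (iri(EX + "person/11"), iri(FOAF + "knows"), iri(EX + "person/22")),
    (iri(EX + "paper/codd-1970"), iri(DCTERMS + "title"),
     literal("A Relational Model of Data for Large Shared Data Banks")),
    (iri(EX + "paper/codd-1970"), iri(DCTERMS + "creator"), iri(EX + "person/codd")),
]


def to_ntriples(rows) -> str:
    """One statement per line: subject predicate object, terminated by ' .'."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in rows)
```

Any consumer that knows FOAF or Dublin Core can interpret these predicates without reading an application-specific data dictionary.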
New Data Structures for Variety and Variability
New Data Structures for Variety and Variability
Relational Database | NoSQL Database
Fixed and dense structures (mostly) | Sparse and variable structures (mostly)
Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable). | Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.
Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure. | Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.
Each table contains a sparse number of rows and requires each row to have the same fixed structure. | Each collection contains a sparse number of documents and has flexible requirements for document types: it may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.
Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data. | Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.
Each schema must have a fixed number of predefined tables and constraints before data can be loaded. | Each database may contain documents without first defining structures, document types, relationships, collections, etc.
Why is relational fixed and dense?
Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Operation ID | Drug ID | Drug Dose | Drug Dose UOM | Sparse
13 | 100 | 200 | mg | NULL
13 | 101 | NULL | NULL
13 | 100 | 150 | mg | NULL

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.
Callouts: fixed field types; fixed table structure; fixed table relationships; each row has the same structure; fixed set of tables.
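Going the other way, the tables above can be folded back into a single operation document. This Python sketch assumes a four-column reading of the dose table (Operation ID, Drug ID, Drug Dose, Drug Dose UOM) and borrows property names such as drugDoseSize from the operation document on the next slide; hospitalId and surgeonSpecialty are hypothetical names. NULL columns are simply omitted, which is what makes the document sparse.

```python
# Rows from the tables above (None stands for NULL).
surgeons = {1: ("Dorothy Oz", "Cardiothoracic"), 2: ("Martin Fields", "Neurosurgery")}
operations = {13: {"surgeon_id": 1, "hospital_id": 7}}
drugs = {100: "Minocycline", 101: "Minomycin"}
# Assumed four-column reading: (operation_id, drug_id, dose, dose_uom)
operation_drugs = [(13, 100, 200, "mg"), (13, 101, None, None), (13, 100, 150, "mg")]


def operation_document(operation_id: int) -> dict:
    """Join the rows for one operation into a single nested document."""
    op = operations[operation_id]
    name, specialty = surgeons[op["surgeon_id"]]
    administered = []
    for op_id, drug_id, dose, uom in operation_drugs:
        if op_id != operation_id:
            continue
        entry = {"drugName": drugs[drug_id], "drugDoseSize": dose, "drugDoseUOM": uom}
        # Omit NULL columns instead of storing them: the document stays sparse
        administered.append({k: v for k, v in entry.items() if v is not None})
    return {
        "operationNumber": operation_id,
        "hospitalId": op["hospital_id"],
        "surgeon": {"surgeonName": name, "surgeonSpecialty": specialty},
        "administeredDrugs": administered,
    }


doc = operation_document(13)
```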
How does NoSQL support variety and variability?
NoSQL documents have any variety of rapidly changing structures because structure is data.
{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}
Callouts: Variable Data Types; Variable Document Types; Variable Relationships; Variable and Sparse Collections; Variable Document Structures; Sparse Properties (the Minomycin entry has no dose properties); Sparse Denormalized Properties.
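Queries against a document like this can read its embedded relations and tolerate missing properties. In this Python sketch the document is trimmed to the fields used, and the recall threshold (more than one recall) is an arbitrary example.

```python
operation_doc = {
    "_id": 1,
    "_type": "operation",
    "collections": ["operation", "transplants"],
    "operation": {
        "hospitalName": "Johns Hopkins",
        "surgeonName": "Dorothy Oz",
        "administeredDrugs": [
            {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
             "drugDoseSize": 200, "drugDoseUOM": "mg"},
            {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},  # sparse
            {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
             "drugDoseSize": 150, "drugDoseUOM": "mg"},
        ],
    },
    "relations": {"values": [
        {"subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87},
        {"subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1},
        {"subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3},
        {"subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1},
    ]},
}


def relations(doc, predicate):
    """Embedded relationships are plain data, so they filter like any other array."""
    return [r for r in doc["relations"]["values"] if r["predicate"] == predicate]


def doses(doc):
    """Describe each administered drug, tolerating missing (sparse) dose properties."""
    out = []
    for drug in doc["operation"]["administeredDrugs"]:
        size, uom = drug.get("drugDoseSize"), drug.get("drugDoseUOM")
        dose = f"{size} {uom}" if size is not None else "dose not recorded"
        out.append(f"{drug['drugName']}: {dose}")
    return out


# Drugs administered in this operation with more than one recall
frequently_recalled = [r["object"] for r in relations(operation_doc, "opDrug")
                       if r["drugRecalls"] > 1]
```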
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph></section>
{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
      { "type": "text", "value": "Text exists only in objects" },
      { "type": "text", "value": "Data exists in separate objects" },
      { "type": "date", "value": "2017-04-02Z" },
      { "type": "break", "value": null },
      { "type": "text", "value": "JSON is designed for objects to…" },
      { "type": "text", "value": "be sprinkled with strings" }
    ] }
  ]
}
JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
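The content-versus-data split shows up directly in parsing code. This sketch uses Python's standard `xml.etree.ElementTree` and `json` modules on simplified versions of the two examples (namespaces, attributes, and the comment are dropped). Mixed XML content is read back as running text with `itertext()`, while the JSON version is read by filtering typed objects.

```python
import json
import xml.etree.ElementTree as ET

xml_text = """<section><paragraph>XML allows data such as this date <date>2017-04-02Z</date> to be
freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph></section>"""

json_text = """{"paragraphs": [{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects"},
  {"type": "date", "value": "2017-04-02Z"},
  {"type": "break", "value": null},
  {"type": "text", "value": "JSON is designed for objects"}]}]}"""


def xml_plain_text(text: str) -> str:
    """Concatenate all text and tails: tags sit on top of the text."""
    return "".join(ET.fromstring(text).itertext())


def xml_dates(text: str) -> list:
    """Data embedded in the prose is still addressable by element name."""
    return [d.text for d in ET.fromstring(text).iter("date")]


def json_values(text: str, kind: str) -> list:
    """In JSON, text and data live in separate typed objects."""
    doc = json.loads(text)
    return [part["value"] for p in doc["paragraphs"]
            for part in p["paragraph"] if part["type"] == kind]
```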
Choosing Between XML and JSON
Agenda
1. Database Modeling Paradigms
2. From Relational to Document/Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Customer Model
The person ERD is 13 tables because each multivalued
property requires a separate table in relational
All of this should be represented as one JSON
document because it is one business entity
Person ERD
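To make the table count concrete, here is a minimal Python sketch, assuming a trimmed-down person document and a hypothetical `shred` helper (neither comes from the talk or from MarkLogic). It treats every array of objects as a multivalued property and moves it into its own child table keyed by `personId`, so each additional multivalued property adds another table.

```python
# Hypothetical sketch: shred one nested person document into relational-style
# tables, creating one child table per multivalued (array-of-objects) property.
from collections import defaultdict

person_doc = {
    "personId": "11",
    "personName": "Mike Bowers",
    "personPhones": [
        {"phoneNumber": "+1 (111) 111-1111", "phonePurpose": ["mobile", "primary"]},
        {"phoneNumber": "+1 (111) 111-1112", "phonePurpose": ["work"]},
    ],
    "personEmails": [
        {"emailAddress": "mike@somework.com", "emailPurpose": ["work", "primary"]},
    ],
}


def shred(doc, key="personId", parent_table="persons"):
    """Return {table_name: [row, ...]}. Scalars stay in the parent table;
    each array of objects becomes its own child table keyed by `key`."""
    tables = defaultdict(list)
    parent_row = {}
    for name, value in doc.items():
        if isinstance(value, list) and all(isinstance(v, dict) for v in value):
            for item in value:
                # Each child row carries the parent's key as a foreign key.
                row = {key: doc[key]}
                for k, v in item.items():
                    # Nested arrays (e.g. purposes) are flattened to text here.
                    row[k] = ",".join(v) if isinstance(v, list) else v
                tables[name].append(row)
        else:
            parent_row[name] = value
    tables[parent_table].append(parent_row)
    return dict(tables)


tables = shred(person_doc)
print(sorted(tables))  # ['personEmails', 'personPhones', 'persons']
```

Nested arrays such as `phonePurpose` are simply joined into a string here; a strictly single-valued design would need more rows, columns, or tables for them, which is how the ERD keeps growing.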
One Person JSON Doc = 13 Tables
{
  "personId": "11",
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } } ],
  "triples": [
    { "triple": { "subject": "11", "predicate": "hasSpouse", "object": "22" } },
    { "triple": { "subject": "11", "predicate": "hasChild", "object": "33" } },
    { "triple": { "subject": "11", "predicate": "hasChild", "object": "44" } },
    { "triple": { "subject": "11", "predicate": "likesProduct", "object": "1111" } },
    { "triple": { "subject": "11", "predicate": "hasEmployer", "object": "666" },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } ],
  "person": {
    "personStatus": "active", "personName": "Mike Bowers", "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01", "personGender": "male",
    "personEthnicity": "caucasian", "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": "555", "companyName": "Nabisco",
      "personCompanyRelationship": ["employeeOf", "accountRepFor"],
      "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
      "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
      "accountRepSupportedProducts": [ { "product": { "productIdFk": "1111", "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": "666", "companyName": "Goliath Dairies",
      "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
      "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
      "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
      "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ]
}
More Realistic Customer Order Model
[Figure: a larger relational ERD for the customer order model, with tables including Order, Order Items, Fact, and Lookup Lists]
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
[Figure: JSON document collections for Customers, Orders, Products, and Vendors, linked by relationships labeled "Customer Order", "Product that was ordered", and "Vendor who sold the product"]
{
  "personId": "11",
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
        "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ] },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 }
}
{
  "orderId": "1",
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": "11",
      "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": "777", "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": ... },
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
        "productIdFk": "1111",
        "productName": "Oreo Thins Sandwich Cookies",
        "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
        "productWeightOz": 11,
        "productOrderStatus": "Shipped",
        "productShippedDate": "2001-01-01+01:01",
        "seller": { "sellerId": "555", "sellerName": "Nabisco" },
        "supplier": { "supplierId": "555", "supplierName": "Nabisco" } } },
      { "product": {
        "productIdFk": "2222",
        "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
        "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New",
        ...
{
  "productId": "1111",
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": ["cookies", "chocolate cookies", "desserts", "snacks", ...],
    "productTags": ["cookie", "chocolate", "snack", "treat"],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": "555", "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": "555", "companyName": "Nabisco",
        "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2...,
        "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu... } } ],
    "productMeasures": [
      { "productMeasure": {
        "productMeasurePurpose": "productPackage", "productWeight": 10.1,
        "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
        "productMeasurePurpose": "shippingPackage", "productWeight": 1,
        "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    ...
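{
  "companyId": "555",
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", ... } },
      { "phone": { "phonePurpose": "support", ... } },
      { "phone": { "phonePurpose": "shipping", ... } } ],
    "companyAddresses": [
      { "address": {
        "addressPurpose": ["shipping", ...],
        "addressLocality": "East Hanove...",
        "addressPostalCode": "07936", ... } } ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", ... } } ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", ... } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
        "paymentMethodType": "Che...",
        "paymentMethodAccountNumber": "555",
        "paymentMethodVerified": true } } ] },
  "supplier": { "supplierJoinDate": "2005-05..." },
  "seller": { "sellerJoinDate": "2005-05..." }
}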
{
  "orderLookupId": "111111111",
  "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
  "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"]
}
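The second bullet, that graph relationships connect business entities, can be sketched in Python using the triples embedded in the person document. This is an in-memory illustration under stated assumptions: the triples are plain tuples, the `documents` dictionary stands in for looking documents up by id, and `objects` and `linked_documents` are hypothetical helpers, not MarkLogic's semantics API.

```python
# Hypothetical sketch: (subject, predicate, object) triples link business-entity
# documents by id. The triples are those embedded in the person document; the
# `documents` dict is a stand-in for fetching a document by its id.
triples = [
    ("11", "hasSpouse", "22"),
    ("11", "hasChild", "33"),
    ("11", "hasChild", "44"),
    ("11", "likesProduct", "1111"),
    ("11", "hasEmployer", "666"),
]

documents = {
    "1111": {"productId": "1111", "product": {"productName": "Oreo Thins Sandwich Cookies"}},
}


def objects(subject, predicate):
    """All objects linked from `subject` by `predicate`, in triple order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def linked_documents(subject, predicate):
    """Follow a relationship and return the linked documents that are available."""
    return [documents[o] for o in objects(subject, predicate) if o in documents]


print(objects("11", "hasChild"))  # ['33', '44']
print(linked_documents("11", "likesProduct")[0]["product"]["productName"])
```

Because each relationship carries a named predicate, a new kind of link (for example another `hasChild`) is just another triple, not a schema change.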
Relational Modeling: Normalize
1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Figure: one-to-one, one-to-many, and many-to-many relationships, and reference tables]
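As a sketch of the normalization steps above, the following hypothetical Python function turns the person's embedded emails into rows shaped like the Person Emails table (Email ID, Person ID, Email Purpose, Email Address, Is Primary). It assumes each email has exactly one purpose besides "primary"; anything else would need another table to keep every column single valued.

```python
# Hypothetical sketch of relational normalization for the person's emails:
# each column holds one value, "primary" becomes an Is Primary flag, and
# Email ID is a surrogate primary key (assumed to be sequential integers).
emails = [
    {"emailAddress": "mike@somework.com", "emailPurpose": ["work", "primary"]},
    {"emailAddress": "mike@somewhere.com", "emailPurpose": ["personal"]},
]


def normalize_emails(person_id, emails, first_id=1):
    rows = []
    for email_id, email in enumerate(emails, start=first_id):
        purposes = [p for p in email["emailPurpose"] if p != "primary"]
        # Assumption: one purpose per email besides "primary"; more purposes
        # would need a separate Email Purposes table to stay single valued.
        if len(purposes) != 1:
            raise ValueError(f"expected one purpose besides 'primary', got {purposes}")
        rows.append({
            "Email ID": email_id,
            "Person ID": person_id,  # foreign key back to Persons
            "Email Purpose": purposes[0],
            "Email Address": email["emailAddress"],
            "Is Primary": "primary" in email["emailPurpose"],
        })
    return rows


rows = normalize_emails("11", emails)
for row in rows:
    print(row)
```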
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
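Because embedded arrays keep related data inside the entity, a question such as "what is this person's primary email?" is answered by filtering within one document, with no join. A minimal Python sketch, assuming the person document shape shown earlier (trimmed) and a hypothetical `values_with_purpose` helper:

```python
# Hypothetical sketch: query an embedded, normalized structure inside a single
# document. No other table or document is consulted.
person = {
    "personId": "11",
    "person": {
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"}},
            {"email": {"emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com"}},
        ],
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        ],
    },
}


def values_with_purpose(doc, array_name, item_name, value_name, purpose):
    """Values from an embedded array whose `<item_name>Purpose` list contains `purpose`."""
    return [
        entry[item_name][value_name]
        for entry in doc["person"].get(array_name, [])
        if purpose in entry[item_name][f"{item_name}Purpose"]
    ]


print(values_with_purpose(person, "personEmails", "email", "emailAddress", "primary"))
```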
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
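The idea can be approximated in plain Python, as an in-memory stand-in rather than the TDE or Optic API: an extractor copies selected person fields (name and first email) into an index, and a query attaches those fields to each order at read time while the order document itself stays unchanged. All names here are hypothetical.

```python
# Hypothetical, in-memory stand-in for "denormalizing without denormalizing".
# Nothing here is the MarkLogic TDE or Optic API; names are illustrative.
persons = [
    {"personId": "11", "person": {
        "personName": "Mike Bowers",
        "personEmails": [{"email": {"emailAddress": "mike@somework.com"}}]}},
]
orders = [
    {"orderId": "1", "order": {"orderNumber": "111-11-111", "customer": {"customerId": "11"}}},
]


def extract_person_rows(docs):
    """Template-like extraction: one index entry per person document."""
    index = {}
    for doc in docs:
        emails = doc["person"].get("personEmails", [])
        index[doc["personId"]] = {
            "personName": doc["person"]["personName"],
            "emailAddress": emails[0]["email"]["emailAddress"] if emails else None,
        }
    return index


def orders_with_customer_info(orders, person_index):
    """Join each order to indexed person data at query time; sources stay unchanged."""
    for doc in orders:
        customer_id = doc["order"]["customer"]["customerId"]
        yield {"orderNumber": doc["order"]["orderNumber"], **person_index.get(customer_id, {})}


person_index = extract_person_rows(persons)
print(list(orders_with_customer_info(orders, person_index)))
```

The copied values live only in the index, so updating the person document and re-extracting refreshes every order view without rewriting any order.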
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Figure: business tables, transaction tables, and reference tables]
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one document
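The last point can be sketched with the lookup document from the customer order model: its `orderStatus` and `productOrderStatus` lists live in one document and can validate an order transaction. The `invalid_statuses` helper and the bad status value "Lost" are hypothetical, added only for illustration.

```python
# Hypothetical sketch: two lookup lists combined in one document (values copied
# from the lookup document above) used to validate an order transaction.
lookups = {
    "orderLookupId": "111111111",
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}


def invalid_statuses(order_doc, lookups):
    """Return (path, value) pairs whose status is not in the matching lookup list."""
    problems = []
    order = order_doc["order"]
    if order["orderStatus"] not in lookups["orderStatus"]:
        problems.append(("order.orderStatus", order["orderStatus"]))
    for i, entry in enumerate(order.get("orderedProducts", [])):
        status = entry["product"].get("productOrderStatus")
        if status is not None and status not in lookups["productOrderStatus"]:
            problems.append((f"order.orderedProducts[{i}].productOrderStatus", status))
    return problems


order_doc = {
    "orderId": "1",
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": "1111", "productOrderStatus": "Shipped"}},
            # Deliberately invalid status, for illustration only.
            {"product": {"productIdFk": "2222", "productOrderStatus": "Lost"}},
        ],
    },
}
print(invalid_statuses(order_doc, lookups))
```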
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational, because it hides the purpose of the model
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because a JSON object is well suited to subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike in relational, it does not hide the purpose of the model
• See the subclassing in the person document's customer and companyRelationships sections
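A hedged Python sketch of this subclassing, assuming the person document shape shown earlier: the generic `person` section is always present, and the `schemas` array declares which role sections (`customer`, `companyRelationships`) the document also carries. The helper functions are hypothetical.

```python
# Hypothetical sketch of subclassing in JSON: one person document carries a
# generic "person" section plus optional role sections, and its "schemas"
# array declares which sections apply.
doc = {
    "personId": "11",
    "schemas": [
        {"schema": {"type": "person", "schemaVersion": "1.0"}},
        {"schema": {"type": "customer", "schemaVersion": "1.0"}},
        {"schema": {"type": "companyRelationships", "schemaVersion": "1.0"}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": "555",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
    ],
}


def declared_types(doc):
    """Section types the document says it contains."""
    return [s["schema"]["type"] for s in doc.get("schemas", [])]


def missing_sections(doc):
    """Declared types that have no matching top-level section."""
    return [t for t in declared_types(doc) if t not in doc]


def company_roles(doc):
    """Roles the person plays, read from the companyRelationships section."""
    return sorted({role
                   for rel in doc.get("companyRelationships", [])
                   for role in rel["company"]["personCompanyRelationship"]})


print(declared_types(doc))    # ['person', 'customer', 'companyRelationships']
print(missing_sections(doc))  # []
print(company_roles(doc))     # ['accountRepFor', 'employeeOf']
```

Adding a new role means adding a section and a schema entry to the same document; the generic `person` data is never copied into a separate subclass table.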
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Move sparse data out of a table and into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables, reducing the need for joins, which slow performance
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
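The unique-name rule can be checked mechanically. The sketch below is a hypothetical lint, not a MarkLogic tool: it walks JSON and reports property names that appear under more than one hierarchical path, across one or several documents (array positions collapse to `[]`). Run over a trimmed customer document and order document, it flags `customer`, which names both the top-level customer section and the customer reference inside the order; whether that matters depends on whether queries need to tell them apart.

```python
# Hypothetical lint: report property names used at more than one path, since a
# purely name-based index would not distinguish them.
from collections import defaultdict


def property_paths(value, path="", found=None):
    """Map each property name to the set of paths where it occurs."""
    found = defaultdict(set) if found is None else found
    if isinstance(value, dict):
        for name, child in value.items():
            child_path = f"{path}/{name}"
            found[name].add(child_path)
            property_paths(child, child_path, found)
    elif isinstance(value, list):
        for item in value:
            property_paths(item, f"{path}[]", found)
    return found


def ambiguous_names(docs):
    """Names that appear under more than one distinct path."""
    return {name: sorted(paths)
            for name, paths in property_paths(docs).items() if len(paths) > 1}


order = {
    "orderId": "1",
    "order": {
        "customer": {"customerId": "11", "customerName": "Mike Bowers"},
        "orderedProducts": [{"product": {"productIdFk": "1111"}}],
    },
}
customer_doc = {"personId": "11", "customer": {"customerStatus": "active"}}

print(ambiguous_names(order))                  # {}
print(ambiguous_names([customer_doc, order]))  # flags "customer"
```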
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": {
        "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
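Because the relationships in the `triples` array are ordinary data, they can be extracted and queried like a tiny graph. A minimal Python sketch, assuming the document has been parsed into a plain dict; `extract_triples` and `objects_of` are hypothetical helpers, not MarkLogic's SPARQL or Optic APIs.

```python
from __future__ import annotations

from typing import Any

person_doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666},
         "tripleStatus": "active", "tripleEffectivityEndDate": None},
    ],
}


def extract_triples(doc: dict[str, Any]) -> list[tuple[Any, str, Any]]:
    """Return (subject, predicate, object) tuples; triple metadata is ignored."""
    return [
        (t["triple"]["subject"], t["triple"]["predicate"], t["triple"]["object"])
        for t in doc.get("triples", [])
    ]


def objects_of(triples: list[tuple[Any, str, Any]], subject: Any, predicate: str) -> list[Any]:
    """All objects linked to `subject` by `predicate` (a one-hop graph lookup)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


triples = extract_triples(person_doc)
print(objects_of(triples, 11, "hasChild"))     # [33, 44]
print(objects_of(triples, 11, "hasEmployer"))  # [666]
```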
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+0101",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", …
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    "productWebsites": [ …
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", …
          "addressLocality": "East Hanove…",
          "addressPostalCode": "07936", … } } ],
    "companyEmails": [ { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [ { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": 555,
          "paymentMethodVerified": true } } ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents represent what would take more than 30 tables in a relational model.
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.
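A lookup list is its own small document that transactions refer to. As a minimal Python sketch (documents trimmed to the relevant properties; `invalid_statuses` is a hypothetical helper and `"Backordered"` a made-up bad value), an order transaction can be checked against the `orderLookupId` document's lists:

```python
order_lookup = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

order = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
            # Hypothetical invalid value, included to show a failed check.
            {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},
        ],
    },
}


def invalid_statuses(order_doc, lookup):
    """Return (location, value) pairs whose status is not in the lookup list."""
    problems = []
    body = order_doc["order"]
    if body["orderStatus"] not in lookup["orderStatus"]:
        problems.append(("orderStatus", body["orderStatus"]))
    for item in body["orderedProducts"]:
        status = item["product"]["productOrderStatus"]
        if status not in lookup["productOrderStatus"]:
            problems.append((item["product"]["productIdFk"], status))
    return problems


print(invalid_statuses(order, order_lookup))  # [(2222, 'Backordered')]
```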
Document PROs: Performance
One JSON document requires one I/O.
• Relational requires dozens of I/Os for the same data.
• For example, the person document is 45K and requires 13 tables to represent it in a relational database.
• You have to join 13 tables to retrieve all of a person's data.
• Each join requires 4 to 6 I/Os: 3 to 4 I/Os for indexes and 1 to 2 I/Os for a row.
• 4 to 6 I/Os × 13 joins = 52 to 78 I/Os.
• MarkLogic indexes are in RAM, so getting a document is 1 I/O.
• MongoDB indexes are B-tree indexes and may require 4 I/Os.
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables.
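The I/O estimate above is simple arithmetic; this short Python sketch reproduces it from the slide's own per-join figures. The constants are the slide's estimates, not measurements, and `relational_io_range` is a hypothetical helper.

```python
from __future__ import annotations

# Per-join I/O estimates from the slide: 3-4 index I/Os plus 1-2 row I/Os.
INDEX_IOS = (3, 4)
ROW_IOS = (1, 2)
DOCUMENT_IOS = 1  # the slide's figure for fetching one whole JSON document


def relational_io_range(joins: int) -> tuple[int, int]:
    """Return the (best, worst) I/O count for assembling an entity with `joins` joins."""
    per_join_min = INDEX_IOS[0] + ROW_IOS[0]  # 4
    per_join_max = INDEX_IOS[1] + ROW_IOS[1]  # 6
    return joins * per_join_min, joins * per_join_max


low, high = relational_io_range(13)
print(f"relational: {low}-{high} I/Os, document: {DOCUMENT_IOS} I/O")
# relational: 52-78 I/Os, document: 1 I/O
```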
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties; MarkLogic does this with its new Optic API and Template Driven Extraction (TDE).
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued.
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.
"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]
The same data in a relational model needs three tables:
Addresses: Address ID | Street | Locality | Region | Postal Code | Country
Persons Addresses: Person ID | Address ID | Address Purpose | Is Primary
Persons: Person ID | Status | Shipping Preference | Name | Online Name | Birth Date | Gender | Ethnicity | Customer Status | Customer Join Date | Customer Reviewer Rank
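To make the contrast concrete, here is a minimal Python sketch that shreds the one multi-valued address into rows for the Addresses and Persons Addresses tables. The surrogate Address IDs are generated here for illustration, `shred_addresses` is hypothetical, and the Is Primary column is left out to keep it short.

```python
from __future__ import annotations

person_doc = {
    "personId": 11,
    "person": {
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton", "addressRegion": "UT",
                "addressPostalCode": "84015", "addressCountry": "United States",
            }}
        ]
    },
}


def shred_addresses(doc: dict) -> tuple[list[tuple], list[tuple]]:
    """Split a person's addresses into Addresses rows and Persons Addresses rows."""
    addresses: list[tuple] = []
    person_addresses: list[tuple] = []
    person_id = doc["personId"]
    for address_id, item in enumerate(doc["person"]["personAddresses"], start=1):
        a = item["address"]
        # Addresses: Address ID | Street | Locality | Region | Postal Code | Country
        addresses.append((address_id, ", ".join(a["addressStreet"]), a["addressLocality"],
                          a["addressRegion"], a["addressPostalCode"], a["addressCountry"]))
        # Persons Addresses: one junction row per value of the multi-valued addressPurpose.
        for purpose in a["addressPurpose"]:
            person_addresses.append((person_id, address_id, purpose))
    return addresses, person_addresses


addresses, person_addresses = shred_addresses(person_doc)
print(addresses)         # [(1, '99 Smith Dr', 'Clinton', 'UT', '84015', 'United States')]
print(person_addresses)  # [(11, 1, 'shipping'), (11, 1, 'mailing')]
```

One array value in the document becomes two junction rows, which is exactly the many-to-many bookkeeping the document model absorbs.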
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed.
• Putting different types of records in the same array is natural in programming but very difficult in relational databases.
"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
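In application code, a multi-typed array is usually handled by branching on a type discriminator. A minimal Python sketch, assuming `paymentMethodType` serves as that discriminator; `describe` and `CARD_TYPES` are hypothetical names.

```python
payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111"}},
]

CARD_TYPES = {"Visa"}  # hypothetical: which type values denote credit cards


def describe(method: dict) -> str:
    """Summarize a payment method using only the fields its type carries."""
    pm = method["paymentMethod"]
    if pm["paymentMethodType"] in CARD_TYPES:
        return f"card ending {pm['creditCardNumber'][-4:]}, expires {pm['creditCardExpirationDate']}"
    if pm["paymentMethodType"] == "Checking":
        return f"checking account ending {pm['bankAccountNumber'][-4:]}"
    return "unknown payment method"


for m in payment_methods:
    print(describe(m))
# card ending 1234, expires 2011-11-11
# checking account ending 1111
```

A relational schema would instead need one table per payment type, or one wide table with many nullable columns.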
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query
• Structure can be queried:
  - for the presence of keys
  - for nested relationships
  - for any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224",
          "hidePhoneNumber": true } }
    ]
  }
}
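Querying structure, such as the presence of the optional `hidePhoneNumber` key, can be sketched in a few lines of Python. This assumes the document is a parsed dict; `has_key_anywhere` and `visible_phones` are hypothetical helpers, not a database API.

```python
from __future__ import annotations

from typing import Any

person = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def has_key_anywhere(node: Any, key: str) -> bool:
    """Structural query: does `key` appear as a property name at any depth?"""
    if isinstance(node, dict):
        return key in node or any(has_key_anywhere(v, key) for v in node.values())
    if isinstance(node, list):
        return any(has_key_anywhere(v, key) for v in node)
    return False


def visible_phones(doc: dict) -> list[str]:
    """Phone numbers whose phone object lacks hidePhoneNumber or has it set to false."""
    return [
        p["phone"]["phoneNumber"]
        for p in doc["person"]["personPhones"]
        if not p["phone"].get("hidePhoneNumber", False)
    ]


print(has_key_anywhere(person, "hidePhoneNumber"))  # True
print(has_key_anywhere(person, "personEmails"))     # False
print(visible_phones(person))                       # ['+1 (222) 222-2222']
```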
Document PROs: Simple, Easy Data Types
- Attribute names have unlimited length and can contain any characters
- Strings have unlimited size and can contain any internationalized UTF-8 characters
- Numbers contain integers or decimals
- Booleans are true or false
- Arrays can have values of any type and can mix types
- Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
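A quick way to see these types in practice is to parse a small JSON document with Python's standard `json` module. The keys below (for example `personPhoneCount` and `mixedArray`) are hypothetical. JSON itself has a single number type; Python's parser maps it to `int` or `float`, and maps `null` to `None`.

```python
import json

text = """{
  "personName": "José Müller 山田太郎",
  "personHeightInFeet": 5.75,
  "personPhoneCount": 2,
  "hidePhoneNumber": true,
  "mixedArray": ["text", 42, 3.5, false, null, {"nested": [1, 2]}],
  "personMaidenName": null
}"""

doc = json.loads(text)
for key, value in doc.items():
    print(f"{key}: {type(value).__name__}")
# personName: str
# personHeightInFeet: float
# personPhoneCount: int
# hidePhoneNumber: bool
# mixedArray: list
# personMaidenName: NoneType

# One array, six different element types.
print([type(v).__name__ for v in doc["mixedArray"]])
# ['str', 'int', 'float', 'bool', 'NoneType', 'dict']
```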
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Communications of the ACM, Volume 13, Number 6, June 1970 (Information Retrieval)
"Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency. …"
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of …
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence. …
1.2.1 Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence
… Can application programs … remain invariant as indices come and go? …
1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.
[Diagram: an RDF graph overlaid on the paper, linking topics (T), persons (P), locations (L), publications (P), references (R), organizations (O), and events (E) through relationships such as "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "purpose", "problem", and "solution".]
[Diagram, "Inside and Out": a Customer Order graph connecting JSON Customers, JSON Orders, JSON Products, and JSON Vendors documents, an XML operation record (Johns Hopkins, operation 13, heart transplant, surgeon Dorothy Oz, with a table of administered drugs), and Documentation, through relationships such as "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", and "Vendor received RMA on product".]
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | Johns Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
Power of Combining XML Document and RDF Graph
[Slide image: the same excerpt from E. F. Codd's 1970 paper, overlaid with an RDF graph linking topics (T), persons (P), locations (L), publications (P), references (R), organizations (O), and events (E) through relationships such as "geo-located in", "printed on", "related topic", "author of", "published article in journal", "publisher of", "purpose", "problem", and "solution".]
Narrative + Data = Contextual Information
Contextual Information + Relationships (Semantic & Structural) = Meaningful Knowledge
WAIT A MINUTE: Aren't JSON, XML, and RDF databases hierarchical and graph databases?
Didn't E. F. Codd prove that hierarchical and graph databases are inferior to relational ones?
1. They break programs when the data structure's hierarchy changes.
   - New APIs and indexes flatten out the data structure, so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   - Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent, because it is easy to fail to update redundant data.
   - Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
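Point 3 can be made concrete with a small consistency routine. This minimal Python sketch assumes each order keeps a denormalized copy of the customer's name (as the order document's `customerName` does) and that the person has since been renamed; the rename and `sync_customer_names` are hypothetical, standing in for whatever trigger or update job a real system would run.

```python
persons = {11: {"personId": 11, "person": {"personName": "Michael Bowers"}}}  # hypothetical rename
orders = [
    {"orderId": 1, "order": {"customer": {"customerId": 11, "customerName": "Mike Bowers"}}},
    {"orderId": 2, "order": {"customer": {"customerId": 11, "customerName": "Michael Bowers"}}},
]


def sync_customer_names(persons, orders):
    """Copy the current personName into every order that denormalized it.

    Returns the orderIds whose copy was stale and has now been updated.
    """
    updated = []
    for o in orders:
        cust = o["order"]["customer"]
        current = persons[cust["customerId"]]["person"]["personName"]
        if cust["customerName"] != current:
            cust["customerName"] = current
            updated.append(o["orderId"])
    return updated


print(sync_customer_names(persons, orders))  # [1]
```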
RDF Graph and XML enable us to turn content into meaningful knowledge.
What can RDF Graph and JSON do for data?
Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity.
  - A document is like a row in a table, and more.
  - A document contains all the related tables that a relational model requires to represent one business entity.
• Meaningful graph relationships connect any business entity to any other entity (or any content), in any way and in as many ways as you want, at run time, across all table boundaries.
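To illustrate relationships that are data and can be traversed at run time, here is a minimal Python sketch over an in-memory list of triples. The identifiers (`customer:11`, `vendor:555`) and predicates (`placedOrder`, `orderedProduct`, `soldProduct`, `likedProduct`, `receivedRmaOn`) are hypothetical names loosely following the diagram's edge labels; a real deployment would store RDF triples and query them with SPARQL rather than this hand-rolled lookup.

```python
triples = [
    ("customer:11", "placedOrder", "order:1"),
    ("order:1", "orderedProduct", "product:1111"),
    ("order:1", "orderedProduct", "product:2222"),
    ("vendor:555", "soldProduct", "product:1111"),
    ("customer:11", "likedProduct", "product:1111"),
    ("vendor:555", "receivedRmaOn", "product:1111"),
]


def objects(subject, predicate):
    """Follow an edge forward: subject --predicate--> ?"""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    """Follow an edge backward: ? --predicate--> obj"""
    return {s for s, p, o in triples if p == predicate and o == obj}


# Multi-hop "join": which vendors sold products that customer 11 ordered?
products = {prod for order in objects("customer:11", "placedOrder")
            for prod in objects(order, "orderedProduct")}
vendors = {v for prod in products for v in subjects("soldProduct", prod)}
print(sorted(products))  # ['product:1111', 'product:2222']
print(sorted(vendors))   # ['vendor:555']
```

Adding a new kind of relationship here means appending another triple; no table or foreign-key column has to change.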
[Diagram: a Customer Order graph connecting JSON Customers, JSON Orders, JSON Products, and JSON Vendors documents, an XML operation record, and Documentation, through relationships such as "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents, invoices, etc.", "Product manual", "Vendor order forms, invoices, etc.", "Customer liked product", and "Vendor received RMA on product".]
RDF = Standard Meaningful Graphs
Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies.
TIP: Search for ontologies at Linked Open Vocabularies (LOV).
• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation
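Reusing an ontology mostly means using its full IRIs as predicates. A minimal Python sketch that expands prefixed names for two of the vocabularies listed above, FOAF (`http://xmlns.com/foaf/0.1/`) and Dublin Core terms (`http://purl.org/dc/terms/`); `expand`, the subject identifiers, and the choice of `foaf:knows` and `dcterms:creator` for these particular links are illustrative assumptions.

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'foaf:knows' into a full predicate IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


# Hypothetical triples that reuse existing vocabulary terms instead of home-grown predicates.
triples = [
    ("person:11", expand("foaf:knows"), "person:22"),
    ("publication:codd1970", expand("dcterms:creator"), "person:codd"),
]
for s, p, o in triples:
    print(s, p, o)
```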
New Data Structures for Variety and Variability
Relational Database | NoSQL Database
Fixed and dense structures (mostly) | Sparse and variable structures (mostly)
Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable). | Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.
Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure. | Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.
Each table contains a sparse number of rows and requires each row to have the same fixed structure. | Each collection contains a sparse number of documents and has flexible requirements for document types: it may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.
Each table may have zero to a few fixed relationships to other tables. Constraints are not data. | Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.
Each schema must have a fixed number of predefined tables and constraints before data can be loaded. | Each database may contain documents without first defining structures, document types, relationships, collections, etc.
Why is relational fixed and dense?
Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Operation ID, Drug ID, Drug Dose, Drug Dose UOM (sparse):
13 100 200 mg NULL
13 101 NULL NULL
13 100 150 mg NULL

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

Fixed field types
Fixed table structure
Fixed table relationships
Each row has the same structure
Fixed set of tables
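The following rough Python sketch shows how the same data becomes sparse once structure is data. It folds the operation-drug rows into a single operation document and omits any property whose column is NULL. It uses only the Operation ID, Drug ID, Drug Dose, and Drug Dose UOM columns shown above. The function name `to_documents` and the snake_case row keys are hypothetical. This is not how MarkLogic ingests data.

```python
from collections import defaultdict

# Rows from the operation-drug table above; None stands in for SQL NULL.
# Row keys are hypothetical snake_case versions of the slide's column names.
operation_drugs = [
    {"operation_id": 13, "drug_id": 100, "drug_dose": 200, "drug_dose_uom": "mg"},
    {"operation_id": 13, "drug_id": 101, "drug_dose": None, "drug_dose_uom": None},
    {"operation_id": 13, "drug_id": 100, "drug_dose": 150, "drug_dose_uom": "mg"},
]
drug_names = {100: "Minocycline", 101: "Minomycin"}

def to_documents(rows, names):
    """Group child rows under their parent operation and drop NULL columns."""
    grouped = defaultdict(list)
    for row in rows:
        entry = {"drugName": names[row["drug_id"]]}
        # Sparse properties: only emit a property when the column has a value
        if row["drug_dose"] is not None:
            entry["drugDoseSize"] = row["drug_dose"]
        if row["drug_dose_uom"] is not None:
            entry["drugDoseUOM"] = row["drug_dose_uom"]
        grouped[row["operation_id"]].append(entry)
    return [{"operationNumber": op, "administeredDrugs": items}
            for op, items in grouped.items()]

docs = to_documents(operation_drugs, drug_names)
print(docs[0]["administeredDrugs"][1])  # {'drugName': 'Minomycin'}
```

Nothing is stored for the missing dose. The Minomycin entry simply has fewer properties.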
How does NoSQL support variety and variability?
NoSQL documents can have any variety of rapidly changing structures because structure is data.
{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}
(The Minomycin entry is a sparse property example: it has no dose properties.)
Variable Data Types
Variable Document Types
Variable Relationships
Variable and Sparse Collections
Variable Document Structures
Sparse Properties
Sparse Denormalized Properties
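Relationships are stored as data, so they can be filtered like any other property. The Python sketch below transcribes the `relations` values from the operation document and leaves out the truncated hospital address. It queries them with two small hypothetical helpers, `objects` and `related_where`. It only illustrates the idea and does not use any MarkLogic API.

```python
# The "relations" values from the operation document, as Python data.
# Each relationship is a plain object, so it can carry its own properties.
relations = [
    {"subject": 1, "predicate": "opHospital", "object": 10},
    {"subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187},
    {"subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87},
    {"subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1},
    {"subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3},
    {"subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1},
]

def objects(rels, subject, predicate):
    """All objects linked to `subject` by `predicate`."""
    return [r["object"] for r in rels
            if r["subject"] == subject and r["predicate"] == predicate]

def related_where(rels, predicate, prop, test):
    """Objects of `predicate` whose relationship property `prop` passes `test`."""
    return [r["object"] for r in rels
            if r["predicate"] == predicate and prop in r and test(r[prop])]

drug_ids = objects(relations, 1, "opDrug")
recalled_often = related_where(relations, "opDrug", "drugRecalls", lambda n: n > 1)
print(drug_ids, recalled_often)  # [10000, 20000, 30000] [20000]
```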
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
  <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
  freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph>
</section>

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" }
    ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
Choosing Between XML and JSON
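The practical difference shows up when code reads the two examples above. The Python sketch below uses only the standard library on trimmed copies of both snippets. In XML the typed date sits inside running text, and the words that follow it hang off the element's `tail`. In JSON the date is its own object that you find by its `type`. The trimming of the snippets is the only assumption.

```python
import json
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

xml_text = """<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                       xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <paragraph>XML allows data such as this date
    <date xsi:type="xsd:date">2017-04-02Z</date> to be freely mixed into the text</paragraph>
</section>"""

json_text = """{"paragraphs": [{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects"},
  {"type": "date", "value": "2017-04-02Z"}]}]}"""

# XML: the date element is embedded in mixed content
root = ET.fromstring(xml_text)
date_el = root.find(".//date")
xml_date = date_el.text
xml_type = date_el.get(f"{{{XSI}}}type")  # namespaced attribute lookup
xml_tail = date_el.tail.strip()           # text that follows <date> in the paragraph

# JSON: the date is a separate object, found by its "type" property
doc = json.loads(json_text)
json_date = next(item["value"]
                 for para in doc["paragraphs"]
                 for item in para["paragraph"]
                 if item["type"] == "date")

print(xml_date, xml_type, json_date)
```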
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Customer Model
The person ERD has 13 tables because each multivalued property requires a separate table in relational.
All of this should be represented as one JSON document because it is one business entity.
Person ERD
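The step from many tables to one document can be sketched in Python. Each one-to-many child table becomes an array inside the parent document, and the relational "Is Primary" flag becomes one more purpose value, as in the JSON document that follows. This is a minimal sketch under assumptions: the sample rows, the snake_case keys, and the function `build_person_doc` are hypothetical, and only three of the 13 tables are shown.

```python
import json

# Hypothetical sample rows for three of the person ERD's tables.
persons = [{"person_id": 11, "name": "Mike Bowers", "status": "active"}]
person_phones = [
    {"phone_id": 1, "person_id": 11, "purpose": "mobile", "number": "+1 (111) 111-1111"},
    {"phone_id": 2, "person_id": 11, "purpose": "work", "number": "+1 (111) 111-1112"},
]
person_emails = [
    {"email_id": 7, "person_id": 11, "purpose": "work", "address": "mike@somework.com", "is_primary": True},
    {"email_id": 8, "person_id": 11, "purpose": "personal", "address": "mike@somewhere.com", "is_primary": False},
]

def build_person_doc(person_id: int) -> dict:
    """Fold each child table into an array inside one person document."""
    person = next(p for p in persons if p["person_id"] == person_id)
    phones = [
        {"phone": {"phonePurpose": [r["purpose"]], "phoneNumber": r["number"]}}
        for r in person_phones if r["person_id"] == person_id
    ]
    emails = [
        {"email": {
            # The relational "Is Primary" column becomes one more purpose value
            "emailPurpose": [r["purpose"]] + (["primary"] if r["is_primary"] else []),
            "emailAddress": r["address"],
        }}
        for r in person_emails if r["person_id"] == person_id
    ]
    return {"personId": person_id,
            "person": {"personStatus": person["status"], "personName": person["name"],
                       "personPhones": phones, "personEmails": emails}}

doc = build_person_doc(11)
print(json.dumps(doc, indent=2))
```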
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
                  "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
                  "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": {
        "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
[Diagram: a much larger relational ERD that adds tables such as Order, Order Items, Lookup Lists, and Fact to the person tables]
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
[Diagram: Customers, Orders, Products, and Vendors stored as JSON documents and connected by the relationships "Customer Order", "Product that was ordered", and "Vendor who sold the product"]
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": ""
    },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States"
    },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", … } }
    ]
  }
}
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks", … ],
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": {
          "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu… } }
    ],
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } }
    ],
    …
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } }
    ],
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", … ],
          "addressLocality": "East Hanove…",
          "addressPostalCode": "07936", … } }
    ],
    "companyEmails": [ { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [ { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": "555",
          "paymentMethodVerified": true } }
    ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
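Keeping each business entity in its own document means a query follows id references instead of joining rows. The Python sketch below uses abbreviated copies of the customer, order, product, and company documents above, stored in plain dicts keyed by id. The dicts and the function `describe_order` are hypothetical in-memory stand-ins for documents in a database, not a MarkLogic API.

```python
# Abbreviated copies of the documents above, keyed by id
# (hypothetical in-memory stand-ins for stored documents).
customers = {11: {"personId": 11, "person": {"personName": "Mike Bowers"}}}
orders = {
    1: {"orderId": 1, "order": {
        "orderNumber": "111-11-111",
        "customer": {"customerId": 11},
        "orderedProducts": [{"product": {"productIdFk": 1111, "productOrderQty": 3}}],
    }},
}
products = {1111: {"productId": 1111, "product": {
    "productName": "Oreo Thins Sandwich Cookies",
    "productVendor": {"vendorId": 555},
}}}
companies = {555: {"companyId": 555, "company": {"companyName": "Nabisco"}}}

def describe_order(order_id: int) -> dict:
    """Follow id references from an order to its customer, products, and vendors."""
    order = orders[order_id]["order"]
    customer = customers[order["customer"]["customerId"]]["person"]
    lines = []
    for item in order["orderedProducts"]:
        ordered = item["product"]
        product = products[ordered["productIdFk"]]["product"]
        vendor = companies[product["productVendor"]["vendorId"]]["company"]
        lines.append((product["productName"], ordered["productOrderQty"], vendor["companyName"]))
    return {"orderNumber": order["orderNumber"], "customer": customer["personName"], "lines": lines}

print(describe_order(1))
```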
Relational Modeling: Normalize
1. Normalize
• Make each attribute single-valued
  - Create one column per attribute
  - Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Diagram: one-to-one, one-to-many, and many-to-many relationships, and reference tables]
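In relational normalization, a multivalued attribute is flattened into a one-to-many child table and reassembled with a join. The Python sketch below does this with the standard `sqlite3` module. The table and column names are hypothetical, modeled on the Persons and Person Phones tables in the ERD above.

```python
import sqlite3

# A person with a multivalued "phones" attribute, as an application might hold it
# (hypothetical sample data based on the deck's person example).
person = {
    "person_id": 11,
    "name": "Mike Bowers",
    "phones": [("mobile", "+1 (111) 111-1111"), ("work", "+1 (111) 111-1112")],
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One-to-many child table: one row per value of the multivalued attribute
CREATE TABLE person_phones (
    phone_id      INTEGER PRIMARY KEY,
    person_id     INTEGER NOT NULL REFERENCES persons(person_id),
    phone_purpose TEXT,
    phone_number  TEXT NOT NULL
);
""")

conn.execute("INSERT INTO persons (person_id, name) VALUES (?, ?)",
             (person["person_id"], person["name"]))
conn.executemany(
    "INSERT INTO person_phones (person_id, phone_purpose, phone_number) VALUES (?, ?, ?)",
    [(person["person_id"], purpose, number) for purpose, number in person["phones"]],
)

# Reassembling the person's phones now requires a join
rows = conn.execute("""
SELECT p.name, ph.phone_purpose, ph.phone_number
FROM persons AS p
JOIN person_phones AS ph ON ph.person_id = p.person_id
ORDER BY ph.phone_id
""").fetchall()
print(rows)
```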
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
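As a conceptual sketch of what populating the triple index means, the Python function below walks a person document. It emits (subject, predicate, object) triples for a few properties, plus the relationship triples the document already carries. This is a hand-written, hypothetical stand-in written for illustration. Real TDE templates are declared in JSON or XML and applied by the server, and nothing here reproduces the TDE template format or a MarkLogic API.

```python
from typing import Any

Triple = tuple[Any, str, Any]

def extract_triples(doc: dict) -> list[Triple]:
    """Emit (subject, predicate, object) triples from a person document.

    A hypothetical stand-in for what a TDE triples template declares;
    it is not the TDE template format or a MarkLogic API.
    """
    subject = doc["personId"]
    person = doc.get("person", {})
    triples: list[Triple] = []
    # Properties copied out of the document into the index
    if "personName" in person:
        triples.append((subject, "personName", person["personName"]))
    for entry in person.get("personPhones", []):
        triples.append((subject, "phoneNumber", entry["phone"]["phoneNumber"]))
    # Relationship triples the document already carries explicitly
    for entry in doc.get("triples", []):
        t = entry["triple"]
        triples.append((t["subject"], t["predicate"], t["object"]))
    return triples

person_doc = {
    "personId": 11,
    "triples": [{"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}}],
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [{"phone": {"phonePurpose": ["mobile", "primary"],
                                    "phoneNumber": "+1 (111) 111-1111"}}],
    },
}
print(extract_triples(person_doc))
```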
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
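"Denormalizing without denormalizing" means the customer name is pulled in when the query runs instead of being copied into every order. The minimal Python sketch below shows the idea. The list `triple_index`, the order document that stores only a customer id, and the helpers `lookup` and `order_view` are all hypothetical. They imitate the effect of an Optic join, not its API. Note that the order document earlier in this deck does store a copy of `customerName`; the sketch deliberately omits it.

```python
# Triples that a template might have extracted from person documents.
triple_index = [
    (11, "personName", "Mike Bowers"),
    (11, "phoneNumber", "+1 (111) 111-1111"),
]

# A hypothetical order document that stores only the customer's id.
order_doc = {"orderId": 1,
             "order": {"orderNumber": "111-11-111", "customer": {"customerId": 11}}}

def lookup(subject, predicate):
    """All objects for a subject/predicate pair in the triple index."""
    return [o for s, p, o in triple_index if s == subject and p == predicate]

def order_view(doc: dict) -> dict:
    """Project an order row, joining in the customer name at query time."""
    customer_id = doc["order"]["customer"]["customerId"]
    names = lookup(customer_id, "personName")
    return {"orderNumber": doc["order"]["orderNumber"],
            "customerName": names[0] if names else None}

before = order_view(order_doc)
# Change the name once, in the index; the order document is never rewritten
triple_index[0] = (11, "personName", "Michael Bowers")
after = order_view(order_doc)
print(before, after)
```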
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Diagram: business tables, transaction tables, and reference tables]
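The three kinds of tables can be sketched with Python's standard `sqlite3` module. Business tables (persons, products) stand alone. A reference table (order statuses) standardizes allowed states. A transaction table (orders) joins the business tables in one context. The schema is a hypothetical simplification of the customer order model. Foreign keys are switched on explicitly because SQLite does not enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- Business tables: independent of any context
CREATE TABLE persons  (person_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Reference table: standardizes the allowed order states
CREATE TABLE order_statuses (status TEXT PRIMARY KEY);
-- Transaction table: joins business tables together in one context
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    person_id  INTEGER NOT NULL REFERENCES persons(person_id),
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    qty        INTEGER NOT NULL,
    status     TEXT NOT NULL REFERENCES order_statuses(status)
);
""")
conn.executemany("INSERT INTO order_statuses VALUES (?)",
                 [("New",), ("Invoiced",), ("Shipped",), ("Closed",)])
conn.execute("INSERT INTO persons VALUES (11, 'Mike Bowers')")
conn.execute("INSERT INTO products VALUES (1111, 'Oreo Thins Sandwich Cookies')")
conn.execute("INSERT INTO orders VALUES (1, 11, 1111, 3, 'Shipped')")

# A status missing from the reference table is rejected
try:
    conn.execute("INSERT INTO orders VALUES (2, 11, 1111, 1, 'Lost')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("""
SELECT p.name, pr.name, o.qty, o.status
FROM orders AS o
JOIN persons  AS p  ON p.person_id   = o.person_id
JOIN products AS pr ON pr.product_id = o.product_id
""").fetchall()
print(rows, rejected)
```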
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New Invoiced Shipped Closed ]
productOrderStatus [ None Allocated Invoiced Shipped On Order No Stock ]
DocGraph Modeling Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
Relational Modeling Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational, because it hides the purpose of the model
DocGraph Modeling Generalize
• Generalizing is easy in JSON because a JSON document is an object, and objects are designed for subclassing
• This example shows a person being subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See the subclassing below
personId 11
schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]
triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111
triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free
personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]
personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]
personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]
personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]
customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111
companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
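The subclassing idea can be sketched in Python. This is a minimal illustration, assuming the person document is held as a plain dict with a simplified shape (the `schemas` array is reduced to a list of type names and values are abbreviated); `roles_of` and `has_declared_schemas` are hypothetical helpers, not part of any MarkLogic API.

```python
from typing import Any

# Simplified version of the person document above (shape assumed; values abbreviated).
person_doc: dict[str, Any] = {
    "personId": 11,
    "schemas": ["person", "customer", "companyRelationships"],
    "person": {"personName": "Mike Bowers", "personStatus": "active"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"companyIdFk": 555, "companyName": "Nabisco",
         "personCompanyRelationship": ["employeeOf", "accountRepFor"]},
        {"companyIdFk": 666, "companyName": "Goliath Dairies",
         "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]},
    ],
}

# Sub-objects that act as "subclasses" of the generalized person.
ROLE_SECTIONS = ("person", "customer", "companyRelationships")


def roles_of(doc: dict[str, Any]) -> set[str]:
    """Collect every role the generalized person plays (hypothetical helper)."""
    roles: set[str] = set()
    if "person" in doc:
        roles.add("person")
    if "customer" in doc:
        roles.add("customer")
    # Company relationships subclass the person further (employee, contractor, ...).
    for rel in doc.get("companyRelationships", []):
        roles.update(rel.get("personCompanyRelationship", []))
    return roles


def has_declared_schemas(doc: dict[str, Any]) -> bool:
    """True if every role section present in the doc is listed in `schemas`."""
    present = {name for name in ROLE_SECTIONS if name in doc}
    return present <= set(doc.get("schemas", []))


print(sorted(roles_of(person_doc)))
print(has_declared_schemas(person_doc))
```

In this sketch, adding a new role means adding one sub-object and one `schemas` entry; readers that ignore keys they do not know are unaffected, which is the resilience to change the relational slide asks for.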
Relational Modeling Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
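The denormalization bullet can be sketched in Python by treating tables as lists of row dicts. The tables, rows, and function names below are hypothetical; the point is only that copying the commonly joined `customerName` into each order row turns a two-table read into a single-row read.

```python
# Hypothetical normalized tables, modeled as lists of row dicts.
customers = [{"customerId": 11, "customerName": "Mike Bowers"}]
orders = [{"orderId": 1, "orderNumber": "111-11-111", "customerId": 11}]


def read_order_with_join(order_id: int) -> dict:
    """Normalized read: fetch the order row, then join to the customer row."""
    order = next(o for o in orders if o["orderId"] == order_id)
    customer = next(c for c in customers if c["customerId"] == order["customerId"])
    return {**order, "customerName": customer["customerName"]}


def denormalize_customer_names() -> None:
    """Copy the commonly joined customerName into every order row."""
    names = {c["customerId"]: c["customerName"] for c in customers}
    for order in orders:
        order["customerName"] = names[order["customerId"]]


def read_order_denormalized(order_id: int) -> dict:
    """Denormalized read: one row lookup, no join."""
    return next(o for o in orders if o["orderId"] == order_id)


denormalize_customer_names()
print(read_order_denormalized(1))
```

The trade-off is that renaming a customer must now update every copy, which is the redundancy the normalized model was designed to avoid.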
DocGraph Modeling Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
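Why unique property names matter can be shown with a toy model in Python. This is not MarkLogic's index implementation; it only assumes, as the bullet states, an equality index keyed by property name that ignores where the property sits in the hierarchy. The documents, URIs, and `build_name_index` are hypothetical.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Optional

# Toy index: (property name, scalar value) -> document URIs, ignoring nesting.
Index = DefaultDict[tuple[str, Any], set[str]]


def build_name_index(docs: dict[str, Any]) -> Index:
    """Index every scalar by the name of its enclosing property, at any depth."""
    index: Index = defaultdict(set)

    def walk(node: Any, uri: str, name: Optional[str]) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, uri, key)
        elif isinstance(node, list):
            for item in node:  # array items are indexed under the array's name
                walk(item, uri, name)
        elif name is not None:
            index[(name, node)].add(uri)

    for uri, doc in docs.items():
        walk(doc, uri, None)
    return index


generic_docs = {
    "/person/11.json": {"person": {"name": "Mike Bowers", "status": "active"}},
    "/company/555.json": {"company": {"name": "Nabisco", "status": "active"}},
}
unique_docs = {
    "/person/11.json": {"person": {"personName": "Mike Bowers", "personStatus": "active"}},
    "/company/555.json": {"company": {"companyName": "Nabisco", "companyStatus": "active"}},
}

generic = build_name_index(generic_docs)
unique = build_name_index(unique_docs)
print(sorted(generic[("status", "active")]))       # matches the person AND the company
print(sorted(unique[("personStatus", "active")]))  # matches only the person
```

With generic names, a query for active persons also returns the active company; with `personStatus` and `companyStatus`, the name alone identifies the intended property.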
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New Invoiced Shipped Closed ]
productOrderStatus [ None Allocated Invoiced Shipped On Order No Stock ]
Document PROs Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
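To illustrate how the five documents stay orthogonal yet connected, here is a minimal Python sketch. It assumes abbreviated, simplified shapes for each document, in-memory storage keyed by hypothetical URIs such as `order/1`, and lookup-list values taken from the slide with its fused words split apart. `KIND_BY_ID_KEY`, `kind_of`, and `ordered_product_names` are illustrative names, not a MarkLogic API.

```python
from typing import Any

# Abbreviated versions of the five documents above (shapes simplified).
docs: dict[str, dict[str, Any]] = {
    "person/11": {"personId": 11, "person": {"personName": "Mike Bowers"}},
    "order/1": {
        "orderId": 1,
        "order": {
            "orderNumber": "111-11-111",
            "customer": {"customerId": 11, "customerName": "Mike Bowers"},
            "orderedProducts": [{"product": {"productIdFk": 1111, "productOrderQty": 3}}],
        },
    },
    "product/1111": {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    "company/555": {"companyId": 555, "company": {"companyName": "Nabisco"}},
    "lookup/111111111": {
        "orderLookupId": 111111111,
        "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    },
}

# Hypothetical mapping from a document's top-level id property to its modeling role.
KIND_BY_ID_KEY = {
    "personId": "business entity",
    "productId": "business entity",
    "companyId": "business entity",
    "orderId": "transaction",
    "orderLookupId": "lookup list",
}


def kind_of(doc: dict[str, Any]) -> str:
    """Classify a document by the id property it carries."""
    for id_key, kind in KIND_BY_ID_KEY.items():
        if id_key in doc:
            return kind
    return "unknown"


def ordered_product_names(order_doc: dict[str, Any]) -> list[str]:
    """Follow each productIdFk from the order to its product document."""
    names = []
    for line in order_doc["order"]["orderedProducts"]:
        product_doc = docs[f"product/{line['product']['productIdFk']}"]
        names.append(product_doc["product"]["productName"])
    return names


for uri, doc in docs.items():
    print(uri, "->", kind_of(doc))
print(ordered_product_names(docs["order/1"]))
```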
Document PROs Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4 to 6 IOs (3 to 4 IOs for indexes and 1 to 2 IOs for a row)
• 4 to 6 IOs x 13 joins = 52 to 78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables
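The IO estimate is simple arithmetic. This Python sketch reproduces it under the slide's own assumptions: 13 tables, one index-plus-row lookup per table costing 3 to 4 index IOs plus 1 to 2 row IOs, and 1 IO per document read. The function name is hypothetical, and the figures are the slide's estimates, not measurements.

```python
def relational_read_ios(tables: int,
                        index_ios: tuple[int, int],
                        row_ios: tuple[int, int]) -> tuple[int, int]:
    """(min, max) IOs to assemble one entity, one index-plus-row lookup per table."""
    per_table_min = index_ios[0] + row_ios[0]
    per_table_max = index_ios[1] + row_ios[1]
    return tables * per_table_min, tables * per_table_max


DOCUMENT_READ_IOS = 1  # the slide's figure for one document read with in-RAM indexes

low, high = relational_read_ios(tables=13, index_ios=(3, 4), row_ios=(1, 2))
print(f"relational: {low}-{high} IOs; document: {DOCUMENT_READ_IOS} IO")
# relational: 52-78 IOs; document: 1 IO
```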
Document PROs Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction
Document PROs Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
personAddresses [
address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States
]
Equivalent relational tables:
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
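The mapping from the multi-valued `personAddresses` array to the relational tables can be sketched in Python. It assumes the slide's JSON with punctuation restored, surrogate Address IDs assigned by position, and a multi-line street joined into one column; the Is Primary column is omitted because the example address carries no primary flag. `shred_addresses` is a hypothetical name.

```python
from typing import Any

# The personAddresses example, with JSON punctuation restored (shape assumed).
person_doc: dict[str, Any] = {
    "personId": 11,
    "personAddresses": [
        {"address": {
            "addressPurpose": ["shipping", "mailing"],
            "addressStreet": ["99 Smith Dr"],
            "addressLocality": "Clinton",
            "addressRegion": "UT",
            "addressPostalCode": "84015",
            "addressCountry": "United States",
        }},
    ],
}


def shred_addresses(doc: dict[str, Any]) -> tuple[list[dict], list[dict]]:
    """Split personAddresses into Addresses rows and Persons Addresses link rows."""
    addresses: list[dict] = []
    links: list[dict] = []
    # Surrogate Address IDs are assigned by position (an assumption for this sketch).
    for address_id, entry in enumerate(doc["personAddresses"], start=1):
        a = entry["address"]
        addresses.append({
            "addressId": address_id,
            "street": "\n".join(a["addressStreet"]),  # multi-line street in one column
            "locality": a["addressLocality"],
            "region": a["addressRegion"],
            "postalCode": a["addressPostalCode"],
            "country": a["addressCountry"],
        })
        # One link row per purpose: the multi-valued array becomes many rows.
        for purpose in a["addressPurpose"]:
            links.append({"personId": doc["personId"], "addressId": address_id,
                          "addressPurpose": purpose})
    return addresses, links


addresses, links = shred_addresses(person_doc)
print(len(addresses), "address row,", len(links), "link rows")
```

One JSON array with two purposes becomes one Addresses row plus two Persons Addresses rows, which is the shredding the document model avoids.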
Document PROs Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT
]
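Reading a multi-typed array in application code can be sketched in Python. This assumes the slide's `personPaymentMethods` with punctuation restored and some fields omitted; the dataclasses and `parse_payment_method` are hypothetical. The record type is chosen from the fields present rather than from a fixed table, and a query on a shared field (`paymentMethodStatus`) works across both types.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class CreditCard:
    status: str
    number: str
    expiration: str
    name_on_card: str


@dataclass
class BankAccount:
    status: str
    routing_number: str
    account_number: str
    name_on_account: str


PaymentMethod = Union[CreditCard, BankAccount]

# The personPaymentMethods array (punctuation restored, some fields omitted).
person_payment_methods: list[dict[str, Any]] = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers"}},
]


def parse_payment_method(entry: dict[str, Any]) -> PaymentMethod:
    """Pick the record type from the fields present in each array item."""
    pm = entry["paymentMethod"]
    if "creditCardNumber" in pm:
        return CreditCard(pm["paymentMethodStatus"], pm["creditCardNumber"],
                          pm["creditCardExpirationDate"], pm["nameOnCreditCard"])
    if "bankAccountNumber" in pm:
        return BankAccount(pm["paymentMethodStatus"], pm["bankRoutingNumber"],
                           pm["bankAccountNumber"], pm["nameOnBankAccount"])
    raise ValueError(f"unknown payment method: {pm.get('paymentMethodType')}")


methods = [parse_payment_method(e) for e in person_payment_methods]
verified = [m for m in methods if m.status == "Verified"]  # shared field, mixed types
print([type(m).__name__ for m in methods], len(verified))
```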
Document PROs Everything is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values
personId 11
person
personName Some very long name with many UTF-8 international characters…
personBirthDate 1982-02-02
personHeightInFeet 575
personPhones [
phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
]
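Querying and changing structure as data can be sketched in Python. This assumes the example document with JSON punctuation restored and the long name omitted; `paths_with_key`, `rename_key`, the `$.`-style path notation, and the new key name `phoneNumberUnlisted` are all hypothetical. The first function finds a key at any depth; the second changes structure with a data update, returning a renamed copy.

```python
from typing import Any, Iterator

# The example document, with JSON punctuation restored (shape assumed).
doc: dict[str, Any] = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def paths_with_key(node: Any, key: str, path: str = "$") -> Iterator[str]:
    """Query structure: yield the path of every occurrence of `key`, at any depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}.{k}"
            if k == key:
                yield child
            yield from paths_with_key(v, key, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from paths_with_key(item, key, f"{path}[{i}]")


def rename_key(node: Any, old: str, new: str) -> Any:
    """Change structure with a data update: return a copy with `old` keys renamed."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


print(list(paths_with_key(doc, "hidePhoneNumber")))
# ['$.person.personPhones[1].phone.hidePhoneNumber']
updated = rename_key(doc, "hidePhoneNumber", "phoneNumberUnlisted")  # hypothetical new name
print(list(paths_with_key(updated, "phoneNumberUnlisted")))
```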
Document PROs Simple Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
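A quick way to see these types is to parse a small document with Python's standard `json` module. The document is illustrative, not from the slides. Note that JSON itself has a single number type; splitting numbers into `int` and `float` is how Python's parser maps integers and decimals.

```python
import json

# A small illustrative document exercising each type listed above.
raw = """
{
  "person name with spaces and ü": "Zoë",
  "personId": 11,
  "personHeightInFeet": 5.75,
  "hidePhoneNumber": true,
  "mixedArray": [1, "two", 3.5, false, {"nested": []}],
  "sparseObject": {}
}
"""

doc = json.loads(raw)
for key, value in doc.items():
    print(f"{key!r}: {type(value).__name__}")
```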
Document PROs Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of…
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables… represent a major advance toward the goal of data independence…
1.2.1 Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence
…Can application programs… remain invariant as indices come and go?…
1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure: a knowledge graph connecting Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E) through relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of", with problem, purpose, and solution links between topics]
[Figure: JSON documents for Customers, Orders, Products, and Vendors, plus XML Documentation such as a hospital operation record, connected by meaningful relationships: "Customer Order", "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product"; documentation includes shipping instructions, customer invoices, annual reports, customer order documents, and product manuals]
Inside and Out
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
WAIT A MINUTE! Aren't JSON, XML, and RDF databases hierarchical and graph databases?
Didn't E. F. Codd prove hierarchical and graph databases are inferior to relational?
1. They break programs when the data structure's hierarchy changes.
   - New APIs and indexes flatten out the data structure so queries work regardless of hierarchy.
2. They contain redundant data throughout the hierarchy that is hard to update.
   - Document DBs use foreign keys and/or graphs to connect documents, which are business entities.
3. They are easily inconsistent because it is easy to fail to update redundant data.
   - Relational DBs use constraints and triggers to keep data consistent; this also works for NoSQL.
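Answer 1 relies on queries that do not care where a property sits in the hierarchy. As a loose, hypothetical analogy (not how MarkLogic's indexes are actually implemented), this Python sketch finds a property at any depth, so the same query keeps working when a document is restructured.

```python
from typing import Any, Iterator


def find_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any depth of a nested JSON-like structure."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending so that nested occurrences are also found.
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# The same property at two different depths (hypothetical documents).
flat_doc = {"surgeonName": "Dorothy Oz"}
nested_doc = {"operation": {"team": [{"surgeonName": "Dorothy Oz"}]}}

print(list(find_values(flat_doc, "surgeonName")))    # ['Dorothy Oz']
print(list(find_values(nested_doc, "surgeonName")))  # ['Dorothy Oz']
```

The query code is identical for both shapes; only the traversal absorbs the difference in hierarchy.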
RDF Graph and XML enable us to turn content into meaningful knowledge.
What can RDF Graph and JSON do for data?
Documents + RDF Graphs = NextGen Relational
- A JSON document is a business entity.
  - A document is like a row in a table, and more.
  - A document contains all the related tables a relational model requires to represent one business entity.
- Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries.
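A minimal sketch of the idea above, assuming documents are plain Python dictionaries keyed by hypothetical IDs and relationships are (subject, predicate, object) triples held alongside the documents. It is not a MarkLogic API; it only shows that a new kind of relationship can be added at run time without changing any structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: int
    predicate: str
    obj: int


# Each document holds one whole business entity (hypothetical, trimmed fields).
documents = {
    11: {"personName": "Mike Bowers", "personPhones": ["+1 (111) 111-1111"]},
    1111: {"productName": "Oreo Thins Sandwich Cookies"},
    666: {"companyName": "Goliath Dairies"},
}

# Relationships are data kept alongside the documents, not fixed by a schema.
triples = [
    Triple(11, "likesProduct", 1111),
    Triple(11, "hasEmployer", 666),
]


def related(subject: int, predicate: str) -> list:
    """Return the documents linked to `subject` by `predicate`."""
    return [documents[t.obj] for t in triples
            if t.subject == subject and t.predicate == predicate]


print(related(11, "likesProduct"))

# A new kind of relationship is just another triple; no table changes needed.
triples.append(Triple(1111, "soldBy", 666))  # "soldBy" is a hypothetical predicate
print(related(1111, "soldBy"))
```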
[Figure: Customer Order document graph. JSON documents (Customers, Orders, Products, Vendors) and XML documentation (e.g., a hospital operation report with drug doses) are connected by meaningful relationships: product that was ordered; vendor who sold the product; shipping instructions, etc.; customer invoices, annual reports, etc.; customer order documents, invoices, etc.; product manual, vendor order forms, invoices, etc.; customer liked product; vendor received RMA on product.]
RDF = Standard Meaningful Graphs
Use Existing Ontologies for Predicate Names
- Save time and make your data easier to understand by leveraging existing relationship ontologies.
TIP: Search for ontologies at Linked Open Vocabularies (LOV).
- Dublin Core, FOAF, TrackBack, MetaVocab, Basic Geo Vocabulary, BIO, RSS 1.0, VCard RDF, Creative Commons metadata, WOT
- SIOC, GoodRelations, DOAP, Programmes Ontology, Music Ontology, OpenGUID, Provenance Vocabulary, Pedagogical diagnosis, DILIGENT Argumentation
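Reusing an existing vocabulary mostly means using its published IRIs as predicate names. A minimal sketch, assuming the FOAF namespace `http://xmlns.com/foaf/0.1/` and the Dublin Core terms namespace `http://purl.org/dc/terms/`, with a hypothetical `example.org` namespace for our own entities; it writes plain N-Triples strings without any RDF library.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # hypothetical namespace for our own entity IRIs


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples writes it."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """One N-Triples statement: subject, predicate, object, terminating period."""
    return f"{subject} {predicate} {obj} ."


lines = [
    # foaf:name and foaf:knows come from FOAF; dcterms:creator from Dublin Core.
    ntriple(iri(EX + "person/11"), iri(FOAF + "name"), literal("Mike Bowers")),
    ntriple(iri(EX + "person/11"), iri(FOAF + "knows"), iri(EX + "person/22")),
    ntriple(iri(EX + "doc/manual-1111"), iri(DCTERMS + "creator"), iri(EX + "company/555")),
]
print("\n".join(lines))
```

Because the predicates are shared IRIs, any other system that understands FOAF or Dublin Core can interpret these relationships without a custom mapping.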
New Data Structures for Variety and Variability
Relational Database | NoSQL Database
fixed and dense structures (mostly) | sparse and variable structures (mostly)
Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable). | Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.
Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure. | Each document may belong to zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.
Each table contains a sparse number of rows and requires each row to have the same fixed structure. | Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.
Each table may have zero to a few fixed relationships to other tables. Constraints are not data. | Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.
Each schema must have a fixed number of predefined tables and constraints before data can be loaded. | Each database may contain documents without first defining structures, document types, relationships, collections, etc.
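The NoSQL column above can be made concrete with a small, hypothetical collection in Python: documents of different types share one collection, one document sits in two collections, a property can be absent, and null is stored as a value. The helper names and the document with `_id` 2 are illustrative assumptions.

```python
MISSING = object()  # sentinel that distinguishes "absent" from "present but null"

# A hypothetical collection mixing document types and sparse properties.
documents = [
    {"_id": 1, "_type": "operation", "collections": ["operation", "transplants"],
     "operationNumber": 13, "surgeonName": "Dorothy Oz"},
    {"_id": 2, "_type": "operation", "collections": ["operation"],
     "operationNumber": 14},                            # surgeonName is simply absent
    {"_id": 3, "_type": "drug", "collections": ["drugs"],
     "drugName": "Minomycin", "drugDoseSize": None},     # null stored as a value
]


def in_collection(name: str) -> list:
    """A document may belong to several collections at once."""
    return [d for d in documents if name in d.get("collections", [])]


def describe(doc: dict, key: str) -> str:
    """Report a property's value, or whether it is missing or null."""
    value = doc.get(key, MISSING)
    if value is MISSING:
        return "<missing>"
    if value is None:
        return "<null>"
    return str(value)


print([d["_id"] for d in in_collection("operation")])                    # [1, 2]
print([describe(d, "surgeonName") for d in in_collection("operation")])  # ['Dorothy Oz', '<missing>']
print(describe(documents[2], "drugDoseSize"))                            # <null>
```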
Why is relational fixed and dense?

Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.
Operation ID | Drug ID | Drug Dose | Drug Dose UOM | Sparse
13 | 100 | 200 | mg | NULL
13 | 101 | NULL | NULL |
13 | 100 | 150 | mg | NULL
Drug ID | Drug Name
100 | Minocycline
101 | Minomycin
Fixed field types
Fixed table structure
Fixed table relationships
Each row has same structure
Fixed set of tables
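For contrast, here is the relational side of the operation example as a sketch using Python's built-in `sqlite3` module. Table and column names are simplified assumptions, and only the four named dose columns are modeled; the point is that the missing dose must be stored as NULLs in a fixed row, and a join is needed to bring the drug name back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drug (
        drug_id   INTEGER PRIMARY KEY,
        drug_name TEXT NOT NULL
    );
    -- Every row has the same fixed columns; missing values must be NULL.
    CREATE TABLE operation_drug (
        operation_id  INTEGER NOT NULL,
        drug_id       INTEGER NOT NULL REFERENCES drug(drug_id),
        drug_dose     REAL,
        drug_dose_uom TEXT
    );
    INSERT INTO drug VALUES (100, 'Minocycline'), (101, 'Minomycin');
    INSERT INTO operation_drug VALUES
        (13, 100, 200, 'mg'),
        (13, 101, NULL, NULL),
        (13, 100, 150, 'mg');
""")

# A join is needed to put each drug name back next to its dose.
rows = conn.execute("""
    SELECT d.drug_name, od.drug_dose, od.drug_dose_uom
    FROM operation_drug AS od
    JOIN drug AS d ON d.drug_id = od.drug_id
    WHERE od.operation_id = 13
    ORDER BY od.rowid
""").fetchall()
for row in rows:
    print(row)
```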
How does NoSQL support variety and variability?
NoSQL documents can have any variety of rapidly changing structures because structure is data.

    {
      "_id": 1,
      "_type": "operation",
      "collections": [ "operation", "transplants" ],
      "operation": {
        "hospitalName": "Johns Hopkins",
        "operationTypeName": "Heart Transplant",
        "surgeonName": "Dorothy Oz",
        "operationNumber": 13,
        "administeredDrugs": [
          { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
          { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
          { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
        ]
      },
      "relations": {
        "values": [
          { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
          { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
          { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
          { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
          { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
          { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
        ]
      }
    }

The second administered drug omits its dose properties (a sparse property).
Variable Data Types
Variable Document Types
Variable Relationships
Variable and Sparse Collections
Variable Document Structures
Sparse Properties
Sparse Denormalized Properties
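A sketch of how application code might read the operation document above, assuming it is loaded into a Python dictionary (trimmed to a few fields) with the `relations.values` layout shown. Sparse dose properties are handled with `dict.get`, and the relationships are read like any other data; the helper names are hypothetical.

```python
operation_doc = {
    "_id": 1,
    "_type": "operation",
    "collections": ["operation", "transplants"],
    "operation": {
        "operationNumber": 13,
        "administeredDrugs": [
            {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
             "drugDoseSize": 200, "drugDoseUOM": "mg"},
            {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},  # sparse: no dose
            {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
             "drugDoseSize": 150, "drugDoseUOM": "mg"},
        ],
    },
    "relations": {"values": [
        {"subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87},
        {"subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3},
    ]},
}


def doses(doc: dict) -> list:
    """List (drug, dose) pairs, tolerating drugs that have no dose properties."""
    out = []
    for drug in doc["operation"]["administeredDrugs"]:
        size, uom = drug.get("drugDoseSize"), drug.get("drugDoseUOM")
        out.append((drug["drugName"], f"{size} {uom}" if size is not None else None))
    return out


def relationships(doc: dict) -> list:
    """Relationships are data: read them like any other property."""
    return [(r["subject"], r["predicate"], r["object"]) for r in doc["relations"]["values"]]


print(doses(operation_doc))
# [('Minocycline', '200 mg'), ('Minomycin', None), ('Minocycline', '150 mg')]
print(relationships(operation_doc))
# [(1, 'opSurgeon', 10000), (1, 'opDrug', 20000)]
```

Extra properties on each relationship (such as `surgeonSuccessRate`) stay attached to the edge they describe, which is what makes them "sparse denormalized properties".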
    <section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
      <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
      <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
      <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
      freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph>
    </section>
    {
      "heading": "JSON is best for nested object DATA",
      "paragraphs": [
        { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
        { "paragraph": [
          { "type": "text", "value": "Text exists only in objects" },
          { "type": "text", "value": "Data exists in separate objects" },
          { "type": "date", "value": "2017-04-02Z" },
          { "type": "break", "value": null },
          { "type": "text", "value": "JSON is designed for objects to…" },
          { "type": "text", "value": "be sprinkled with strings" }
        ] }
      ]
    }
JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures
Developers work with data with maximal reliance on predictable structures
XML is ideal for content
Developers work with tagged content with minimal reliance on variable structure
Choosing Between XML and JSON
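To illustrate the mixed-content point in the XML and JSON comparison, a small sketch with Python's standard `xml.etree.ElementTree` and `json` modules. The XML is simplified (no namespaces or `xsi:type`): the date stays inline with the surrounding text, while the JSON version has to split text and data into separate typed objects.

```python
import json
import xml.etree.ElementTree as ET

xml_text = (
    "<paragraph>XML allows data such as this date "
    "<date>2017-04-02Z</date> to be freely mixed into the text</paragraph>"
)
para = ET.fromstring(xml_text)
# Mixed content: text and tagged data stay interleaved in document order.
full_text = "".join(para.itertext())
date_value = para.find("date").text

json_text = """
{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects"},
  {"type": "date", "value": "2017-04-02Z"}
]}
"""
doc = json.loads(json_text)
# In JSON the date has to live in its own typed object.
json_dates = [part["value"] for part in doc["paragraph"] if part["type"] == "date"]

print(full_text)   # XML allows data such as this date 2017-04-02Z to be freely mixed into the text
print(date_value)  # 2017-04-02Z
print(json_dates)  # ['2017-04-02Z']
```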
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
- Addresses: Address ID, Street, Locality, Region, Postal Code, Country
- Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
- Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
- Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
- Websites: Website ID, URL
- Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
- Companies Websites: Company ID, Website ID
- Companies Addresses: Company ID, Address ID
- Companies: Company ID
- Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
- Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
- Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
- Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
- Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
- Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
- Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Customer Model
The person ERD has 13 tables because each multivalued property requires a separate table in a relational model.
All of this should be represented as one JSON document because it is one business entity.
Person ERD
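The claim that the multivalued tables collapse into one document can be sketched by reading a few of the ERD tables and nesting their rows. This uses Python's `sqlite3`; the table and column names are simplified, hypothetical versions of Persons, Person Phones, and Person Emails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_phones (phone_id INTEGER PRIMARY KEY, person_id INTEGER,
                                phone_purpose TEXT, phone_number TEXT);
    CREATE TABLE person_emails (email_id INTEGER PRIMARY KEY, person_id INTEGER,
                                email_purpose TEXT, email_address TEXT);
    INSERT INTO persons VALUES (11, 'Mike Bowers');
    INSERT INTO person_phones VALUES (1, 11, 'mobile', '+1 (111) 111-1111'),
                                     (2, 11, 'work',   '+1 (111) 111-1112');
    INSERT INTO person_emails VALUES (1, 11, 'work', 'mike@somework.com');
""")


def person_document(person_id: int) -> dict:
    """Gather one person's rows from several tables into a single nested document."""
    (name,) = conn.execute(
        "SELECT name FROM persons WHERE person_id = ?", (person_id,)).fetchone()
    phones = conn.execute(
        "SELECT phone_purpose, phone_number FROM person_phones "
        "WHERE person_id = ? ORDER BY phone_id", (person_id,)).fetchall()
    emails = conn.execute(
        "SELECT email_purpose, email_address FROM person_emails "
        "WHERE person_id = ? ORDER BY email_id", (person_id,)).fetchall()
    return {
        "personId": person_id,
        "person": {
            "personName": name,
            # Each multivalued property becomes an array instead of a table.
            "personPhones": [{"phone": {"phonePurpose": [p], "phoneNumber": n}} for p, n in phones],
            "personEmails": [{"email": {"emailPurpose": [p], "emailAddress": a}} for p, a in emails],
        },
    }


print(person_document(11))
```

Three queries against three tables produce one document; in a document database that document would be stored and read as a single unit.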
One Person JSON Doc = 13 Tables

    {
      "personId": 11,
      "schemas": [
        { "schema": { "type": "person", "schemaVersion": "1.0" } },
        { "schema": { "type": "customer", "schemaVersion": "1.0" } },
        { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
      ],
      "triples": [
        { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
        { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
        { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
        { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
        { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
                      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
                      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
      ],
      "person": {
        "personStatus": "active",
        "personName": "Mike Bowers",
        "personOnlineName": "MTB1",
        "personNameVariations": {
          "personFirstName": "Mike",
          "personLastName": "Bowers",
          "middleName": "Thomas",
          "personMaidenName": null,
          "personLegalName": "Michael Thomas Bowers",
          "personSortedName": "Bowers, Mike",
          "personNickname": "Mikey",
          "personInformalLetterName": "Mike",
          "personFormalLetterName": "Mike Bowers"
        },
        "personBirthDate": "1981-01-01",
        "personGender": "male",
        "personEthnicity": "caucasian",
        "personShippingPreference": "free",
        "personPhones": [
          { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
          { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
          { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
        ],
        "personAddresses": [
          { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
                         "addressLocality": "Clinton", "addressRegion": "UT",
                         "addressPostalCode": "84015", "addressCountry": "United States" } }
        ],
        "personEmails": [
          { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
          { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
        ],
        "personWebsites": [
          { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
          { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
        ],
        "personPaymentMethods": [
          { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                               "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                               "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
          { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                               "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                               "nameOnBankAccount": "Michael Bowers",
                               "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
        ]
      },
      "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
      "companyRelationships": [
        { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                       "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
                       "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                       "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                       "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
        { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                       "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
                       "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                       "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                       "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
      ]
    }
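One detail in the document above: the relational Is Primary flag appears to be replaced by a `primary` value inside each purpose array. A small Python sketch, with a hypothetical helper `with_purpose`, shows how such a lookup might work on a trimmed copy of the person data.

```python
person = {
    "personPhones": [
        {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
        {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113"}},
    ],
    "personEmails": [
        {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"}},
        {"email": {"emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com"}},
    ],
}


def with_purpose(items: list, wrapper: str, purpose_key: str, purpose: str) -> list:
    """Return the wrapped objects whose purpose array contains `purpose`."""
    return [item[wrapper] for item in items if purpose in item[wrapper][purpose_key]]


primary_phone = with_purpose(person["personPhones"], "phone", "phonePurpose", "primary")[0]
primary_email = with_purpose(person["personEmails"], "email", "emailPurpose", "primary")[0]
print(primary_phone["phoneNumber"])   # +1 (111) 111-1111
print(primary_email["emailAddress"])  # mike@somework.com
```

A purpose array lets one phone be both "mobile" and "primary" without an extra column or join table.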
[Figure: ERD of a more realistic customer order model. The person and company tables above are repeated alongside additional tables such as Order, Order Items, Fact, and Lookup Lists, plus many tables with placeholder names.]
More Realistic Customer Order Model
Same Realistic Customer Order Model
- A JSON document is a business entity.
- Meaningful graph relationships connect business entities.
[Figure: Customer Order document graph. JSON documents for Customers, Orders, Products, and Vendors are linked by relationships such as "Product that was ordered" and "Vendor who sold the product".]
    personId: 11
    person:
      personName: Mike Bowers
      personBirthDate: 1981-01-01
      personGender: male
      personEthnicity: caucasian
      personShippingPreference: free
      personPhones: [
        phone: { phonePurpose: [ mobile, primary ], phoneNumber: +1 (111) 111-1111 } ]
      personAddresses: [
        address:
          addressPurpose: [ shipping, mailing ], addressStreet: [ 99 Smith Dr ],
          addressLocality: Clinton, addressRegion: UT,
          addressPostalCode: 84015, addressCountry: United States ]
      personEmails: [
        email: { emailPurpose: [ work, primary ], emailAddress: mike@somework.com } ]
      personWebsites: [
        website: { websitePurpose: [ work ], websiteUrl: http://www.mike.com } ]
      personPaymentMethods: [
        paymentMethod: paymentMethodType: Visa, paymentMethodStatus: Verified,
          creditCardNumber: 1234123412341234, creditCardExpirationDate: 2011-11-11,
          nameOnCreditCard: MIKE BOWERS, creditCardValidationAddress: mailing ]
    customer:
      customerStatus: active
      customerJoinDate: 2001-01-01
      customerReviewerRank: 11111

    orderId: 1
    order:
      orderNumber: 111-11-111
      orderDate: 2001-01-01T09:00:00Z
      orderStatus: Shipped
      customer:
        customerId: 11
        customerName: Mike Bowers
      orderShipment:
        shipper: { companyIdFk: 777, companyName: FedEx }
        shipmentMethod: 2nd Day Air, shipmentCost: 5.87, shipmentNotes:
      orderShippingAddress:
        addressStreet: [ 99 Smith Dr ]
        addressLocality: Clinton, addressRegion: UT
        addressPostalCode: 84015, addressCountry: United States
      orderedProducts: [
        product:
          productIdFk: 1111
          productName: Oreo Thins Sandwich Cookies
          productOrderQty: 3, productUnitPrice: 3.50, productCondition: New
          productWeightOz: 11
          productOrderStatus: Shipped
          productShippedDate: 2001-01-01+0101
          seller: { sellerId: 555, sellerName: Nabisco }
          supplier: { supplierId: 555, supplierName: Nabisco }
        product:
          productIdFk: 2222
          productName: Rice Dream Rice Drink - Vanilla 64 Fl Oz
          productOrderQty: 1, productUnitPrice: 2.50, productCondition: New

    productId: 1111
    product:
      productCode: oreo-111
      productStatus: active
      productName: Oreo Thins Sandwich Cookies
      productDescription: YUMMY
      productCategories: [ cookies chocolate cookies desserts snacks
      productTags: [ cookie, chocolate, snack, treat ]
      productDetailDescriptionURL: http://mysalessite.com/cdn/details/oreo-111
      productAvailabilityDate: 2001-01-01
      productListPrice: 4.50
      productDiscountPrice: 3.50
      productStandardCost: 2.50
      productInventoryReorderLevel: 100
      productInventoryTargetLevel: 1000
      productVendor: { vendorId: 555, companyName: Nabisco }
      productSuppliers: [
        productSupplier: supplierId: 555, companyName: Nabisco,
          standardSupplierProductPrice: 2.50, currentSupplierProductPrice: 2
          supplierProductQuantityPerUnit: 1 box of 35 cookies, supplierProdu
      productMeasures: [
        productMeasure:
          productMeasurePurpose: productPackage, productWeight: 10.1
          productWidth: 4.5, productHeight: 1.7, productLength: 6.7
        productMeasure:
          productMeasurePurpose: shippingPackage, productWeight: 1
          productWidth: 5, productHeight: 2, productLength: 8

    companyId: 555
    company:
      companyName: Nabisco
      companyStatus: active
      companyPhones: [
        phone: phonePurpose: sales
        phone: phonePurpose: support
        phone: phonePurpose: shipping
      companyAddresses: [
        address:
          addressPurpose: [ shipping
          addressLocality: East Hanove
          addressPostalCode: 07936
      companyEmails: [
        email: emailPurpose: sales
      companyWebsites: [
        website: websitePurpose: home
      companyPaymentMethods: [
        paymentMethod:
          paymentMethodType: Che
          paymentMethodAccountNumber: 555
          paymentMethodVerified: true
      supplier: supplierJoinDate: 2005-05
      seller: sellerJoinDate: 2005-05

    orderLookupId: 111111111
    orderStatus: [ New, Invoiced, Shipped, Closed ]
    productOrderStatus: [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
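The order document keeps both a foreign key (`productIdFk`) and a copy of the product's name and price at order time. A minimal sketch, using the unit prices from the order above and hypothetical helpers, computes the order total from the embedded data alone and detects embedded copies that no longer match the product documents.

```python
# Product documents, trimmed to the fields used here.
products = {
    1111: {"productName": "Oreo Thins Sandwich Cookies"},
    2222: {"productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz"},
}

# The order keeps a foreign key plus a snapshot of name and price at order time.
order = {
    "orderId": 1,
    "order": {
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productName": "Oreo Thins Sandwich Cookies",
                         "productOrderQty": 3, "productUnitPrice": 3.50}},
            {"product": {"productIdFk": 2222, "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
                         "productOrderQty": 1, "productUnitPrice": 2.50}},
        ],
    },
}


def order_total(doc: dict) -> float:
    """Computed from the embedded snapshot alone; no product lookup needed."""
    return sum(p["product"]["productOrderQty"] * p["product"]["productUnitPrice"]
               for p in doc["order"]["orderedProducts"])


def stale_copies(doc: dict) -> list:
    """Follow each foreign key and report embedded names that no longer match."""
    return [p["product"]["productIdFk"] for p in doc["order"]["orderedProducts"]
            if products[p["product"]["productIdFk"]]["productName"] != p["product"]["productName"]]


print(order_total(order))   # 13.0
print(stale_copies(order))  # []
products[1111]["productName"] = "Oreo Thins"  # the product is renamed later
print(stale_copies(order))  # [1111]
```

Whether a changed name should count as stale is a design choice: an order is a historical transaction, so its snapshot may intentionally keep the name that was current when it was placed.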
Relational Modeling Normalize
1. Normalize
- Make each attribute single-valued:
  - Create one column per attribute.
  - Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins.
- Group attributes into tables, ensuring each table has one coherent context.
- Assign one primary key to each table.
- Eliminate duplicate attributes across tables.
[Figure labels: One-to-one, One-to-many, Many-to-many, Reference Tables]
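The normalization steps above can be sketched in SQLite through Python's `sqlite3` module: the multivalued phone attribute becomes a one-to-many child table, and websites become a many-to-many relationship through a join table. Schema names are hypothetical simplifications of the Person ERD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One-to-many: a multivalued attribute becomes a child table.
    CREATE TABLE person_phones (
        phone_id     INTEGER PRIMARY KEY,
        person_id    INTEGER NOT NULL REFERENCES persons(person_id),
        phone_number TEXT NOT NULL
    );
    -- Many-to-many: a join table between two tables.
    CREATE TABLE websites (website_id INTEGER PRIMARY KEY, url TEXT NOT NULL);
    CREATE TABLE persons_websites (
        person_id  INTEGER REFERENCES persons(person_id),
        website_id INTEGER REFERENCES websites(website_id),
        PRIMARY KEY (person_id, website_id)
    );
    INSERT INTO persons VALUES (11, 'Mike Bowers');
    INSERT INTO person_phones VALUES (1, 11, '+1 (111) 111-1111'), (2, 11, '+1 (111) 111-1112');
    INSERT INTO websites VALUES (1, 'http://www.mike.com');
    INSERT INTO persons_websites VALUES (11, 1);
""")

# Reassembling the person requires joins; the name repeats in every joined row.
phones = conn.execute("""
    SELECT p.name, ph.phone_number
    FROM persons p JOIN person_phones ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
""").fetchall()
urls = conn.execute("""
    SELECT w.url
    FROM persons_websites pw JOIN websites w ON w.website_id = pw.website_id
    WHERE pw.person_id = 11
""").fetchall()
print(phones)
print(urls)
```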
DocGraph Modeling Normalize
- Create one business entity or transaction per JSON document.
- JSON properties can be multivalued (i.e., arrays), which means we can embed structures.
- Each embedded structure should be normalized.
- TDE can automatically populate the triple index to denormalize data, and the Optic API can query it.
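TDE is described here as populating indexes from data inside documents so that the Optic API can query it. As a loose analogy only (this is neither the TDE template format nor the Optic API), the hypothetical `project_rows` helper below shows an embedded, normalized array being projected into flat rows while the document itself stays unchanged.

```python
def project_rows(doc: dict, id_key: str, path: list, wrapper: str, columns: list) -> list:
    """Hypothetical helper: flatten one embedded array into relational-style rows.

    Each row starts with the document's id, followed by the requested columns;
    a column missing from an embedded object becomes None, much like SQL NULL.
    """
    node = doc
    for key in path:  # walk down to the embedded array
        node = node[key]
    return [(doc[id_key], *(item[wrapper].get(c) for c in columns)) for item in node]


person_doc = {
    "personId": 11,
    "person": {
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        ]
    },
}

rows = project_rows(person_doc, "personId", ["person", "personPhones"], "phone", ["phoneNumber"])
print(rows)  # [(11, '+1 (111) 111-1111'), (11, '+1 (111) 111-1112')]
```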
DocGraph Modeling Denormalize
- TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info.
- When triples link documents, the Optic API can query triple data as if it were part of any linked document.
- This is denormalizing without denormalizing.
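A rough Python analogy of "denormalizing without denormalizing": a hypothetical extraction step indexes each person's name as a triple, and order documents that only hold a customer id are joined to that index at query time. It does not use MarkLogic's TDE or Optic APIs; it only shows that the name is never copied into the order, so re-extracting after a rename is enough.

```python
# Source documents stay as they are; the order stores only the customer's id.
people = [{"personId": 11, "person": {"personName": "Mike Bowers"}}]
orders = [{"orderId": 1, "order": {"orderNumber": "111-11-111", "customerId": 11}}]


def extract_triples(person_docs: list) -> set:
    """Hypothetical stand-in for an extraction template: one name triple per person."""
    return {(d["personId"], "personName", d["person"]["personName"]) for d in person_docs}


def orders_with_names(order_docs: list, triples: set) -> list:
    """Join orders to the extracted triples at query time."""
    names = {s: o for (s, p, o) in triples if p == "personName"}
    return [{"orderNumber": d["order"]["orderNumber"],
             "customerName": names.get(d["order"]["customerId"])}
            for d in order_docs]


print(orders_with_names(orders, extract_triples(people)))
# After a rename, only the source document changes; re-extraction updates every join.
people[0]["person"]["personName"] = "Michael Bowers"
print(orders_with_names(orders, extract_triples(people)))
```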
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Figure: Business Tables, Transaction Tables, and Reference Tables]
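The three kinds of relational tables can be illustrated with an in-memory SQLite database. This is a minimal sketch using Python's standard `sqlite3` module. The table and column names are assumptions chosen to mirror the example data; they are not taken from the talk. The `person` and `product` business tables know nothing about each other. The `orders` transaction table joins them into one context, and the `order_status` reference table standardizes the allowed states.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT);      -- business table
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);  -- business table
CREATE TABLE order_status (status_code TEXT PRIMARY KEY);                  -- reference table
CREATE TABLE orders (                                                      -- transaction table
    order_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    product_id INTEGER REFERENCES product(product_id),
    status_code TEXT REFERENCES order_status(status_code),
    qty INTEGER
);
INSERT INTO person VALUES (11, 'Mike Bowers');
INSERT INTO product VALUES (1111, 'Oreo Thins Sandwich Cookies');
INSERT INTO order_status VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');
INSERT INTO orders VALUES (1, 11, 1111, 'Shipped', 3);
""")

# The transaction table combines the independent business tables into one context.
rows = conn.execute("""
    SELECT p.person_name, pr.product_name, o.qty, o.status_code
    FROM orders o
    JOIN person p ON p.person_id = o.person_id
    JOIN product pr ON pr.product_id = o.product_id
    JOIN order_status s ON s.status_code = o.status_code
""").fetchall()
print(rows)
```

The table is named `orders` rather than `order` because `ORDER` is an SQL keyword.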
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mike@somework.com ]
personWebsites [
website websitePurpose [ work ] websiteUrl http://www.mike.com ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T09:00:00Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 5.87 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 3.50 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2.50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
productAvailabilityDate 2001-01-01
productListPrice 4.50
productDiscountPrice 3.50
productStandardCost 2.50
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 2.50 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
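The orthogonalized layout can be sketched in Python with one lookup document that holds several lookup lists, as in the lookup document above, and a self-contained order document checked against it. The check needs no joins because the order already carries everything about the transaction. The function `invalid_statuses` and the idea of validating this way are illustrative assumptions; the talk does not say how lookup values are enforced.

```python
# One lookup document combining several lookup lists (values from the example above).
lookups = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

# A trimmed order document: everything about the transaction lives inside it.
order = {
    "orderId": 1,
    "order": {
        "orderNumber": "111-11-111",
        "orderStatus": "Shipped",
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderQty": 3, "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": 2222, "productOrderQty": 1}},
        ],
    },
}


def invalid_statuses(order_doc, lookup_doc):
    """Return any status values in the order that are missing from the lookup lists."""
    body = order_doc["order"]
    bad = []
    if body["orderStatus"] not in lookup_doc["orderStatus"]:
        bad.append(body["orderStatus"])
    for item in body["orderedProducts"]:
        status = item["product"].get("productOrderStatus")  # sparse: may be absent
        if status is not None and status not in lookup_doc["productOrderStatus"]:
            bad.append(status)
    return bad


print(invalid_statuses(order, lookups))
```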
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace a customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational, because it hides the purpose of the model
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because a JSON object can be extended with nested sub-objects, which makes subclassing natural
• This example shows a person being subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• For the subclassing, see the customer and companyRelationships sub-objects of the person document
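A short Python sketch shows why subclassing stays visible in a generalized JSON document. Each optional sub-object (`customer`, or a `personCompanyRelationship` entry) adds a role to the base person, and the roles can be read straight off the document. The `roles` function is a hypothetical helper, and the document is trimmed to the properties it reads.

```python
def roles(doc):
    """List the 'subclasses' a generalized person document carries (hypothetical helper)."""
    found = ["person"] if "person" in doc else []
    if "customer" in doc:
        found.append("customer")
    for rel in doc.get("companyRelationships", []):
        found.extend(rel["company"]["personCompanyRelationship"])
    return found


# Trimmed version of the person document, keeping only the properties used here.
mike = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}},
    ],
}

print(roles(mike))
```

A relational design would need a subtype table per role. Here, a document without a `customer` sub-object simply has no customer role, and no empty columns are involved.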
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables, reducing the need for joins, which slow performance
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
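The reason for unique property names can be shown with a toy name-based index in Python. This is a deliberately simplified model, not MarkLogic's actual index implementation. Every scalar value is indexed under its property name alone, ignoring its path. With generic names, a lookup on `status` mixes a person's status with a company's. With unique names, as in the examples above, each lookup is unambiguous. The generic `status` documents are hypothetical.

```python
from collections import defaultdict


def name_index(node, index=None, key=None):
    """Index every scalar value under its property name only, ignoring where it sits."""
    if index is None:
        index = defaultdict(set)
    if isinstance(node, dict):
        for k, v in node.items():
            name_index(v, index, k)
    elif isinstance(node, list):
        for item in node:              # array items inherit the enclosing property name
            name_index(item, index, key)
    elif key is not None:
        index[key].add(node)
    return index


# Generic names: the index cannot tell a person's status from a company's.
generic = {"person": {"status": "active"}, "company": {"status": "inactive"}}
# Unique names keep each lookup tied to one kind of entity.
unique = {"person": {"personStatus": "active"}, "company": {"companyStatus": "inactive"}}

print(sorted(name_index(generic)["status"]))
print(sorted(name_index(unique)["personStatus"]))
```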
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables
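The I/O arithmetic above can be written out as a small Python calculation. It uses only the slide's own estimates: 13 joins, 3-to-4 index I/Os and 1-to-2 row I/Os per join, and 1 I/O for a document when indexes are in RAM. These are rough planning figures, not measurements.

```python
# Back-of-the-envelope I/O estimate using the figures quoted on the slide.
JOINS = 13            # tables joined to rebuild one person
INDEX_IOS = (3, 4)    # low/high I/Os to traverse indexes per join
ROW_IOS = (1, 2)      # low/high I/Os to read the row per join
DOCUMENT_IOS = 1      # one document read when indexes are in RAM


def relational_ios(joins: int = JOINS) -> tuple[int, int]:
    """Return (low, high) total I/Os to reassemble an entity from `joins` tables."""
    low = joins * (INDEX_IOS[0] + ROW_IOS[0])
    high = joins * (INDEX_IOS[1] + ROW_IOS[1])
    return low, high


low, high = relational_ios()
print(f"relational: {low}-{high} I/Os; document: {DOCUMENT_IOS} I/O")
```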
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties
JSON databases can query multi-valued and multi-typed properties, using MarkLogic's new Optic API and Template Driven Extraction (TDE)
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
"personAddresses": [
  { "address": { "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT", "addressPostalCode": "84015",
      "addressCountry": "United States" } }
]
Relational equivalent (three tables):
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
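The contrast between one multi-valued property and three tables can be made concrete by "shredding" the document into rows. This is a minimal Python sketch. The Persons row is trimmed to two columns, and the address ID is a made-up surrogate key generated with `enumerate`. The `shred` function is a hypothetical helper. One Persons Addresses junction row is produced per address purpose, which is how the relational model would represent the multi-valued `addressPurpose` array.

```python
# The personAddresses fragment, inside a minimal person document.
doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {"addressPurpose": ["shipping", "mailing"],
                         "addressStreet": ["99 Smith Dr"],
                         "addressLocality": "Clinton", "addressRegion": "UT",
                         "addressPostalCode": "84015", "addressCountry": "United States"}},
        ],
    },
}


def shred(d):
    """Split one document into rows for Persons, Addresses, and Persons Addresses."""
    person_id = d["personId"]
    persons = [(person_id, d["person"]["personName"])]   # trimmed to two columns
    addresses, persons_addresses = [], []
    for address_id, entry in enumerate(d["person"]["personAddresses"], start=1):
        a = entry["address"]                               # address_id: made-up surrogate key
        addresses.append((address_id, "; ".join(a["addressStreet"]), a["addressLocality"],
                          a["addressRegion"], a["addressPostalCode"], a["addressCountry"]))
        is_primary = "primary" in a["addressPurpose"]
        for purpose in a["addressPurpose"]:                # one junction row per purpose
            persons_addresses.append((person_id, address_id, purpose, is_primary))
    return persons, addresses, persons_addresses


for table in shred(doc):
    print(table)
```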
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers", "driversLicenseNumber": "11111111",
      "driversLicenseState": "UT" } }
]
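Consuming a multi-typed array in application code usually means dispatching on which properties each record has. This is a minimal Python sketch over the two payment methods above, trimmed to the fields it reads. The `describe` helper and its output format are illustrative assumptions.

```python
# The two payment methods, trimmed to the fields this sketch uses.
payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111"}},
]


def describe(method):
    """Dispatch on the properties present, since each record type has its own fields."""
    m = method["paymentMethod"]
    if "creditCardNumber" in m:
        return f'{m["paymentMethodType"]} card ending {m["creditCardNumber"][-4:]}'
    if "bankAccountNumber" in m:
        return f'{m["paymentMethodType"]} account ending {m["bankAccountNumber"][-4:]}'
    return m["paymentMethodType"]


print([describe(p) for p in payment_methods])
```

In a relational model, the same array would typically need either one wide table with many nullable columns or a separate table per payment type.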
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query
• Structure can be queried for the presence of keys, for nested relationships, and for any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
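Both halves of "structure is data" can be sketched in Python on the example document. The first function queries structure by testing for the presence of a key. The second changes structure by rewriting data, renaming a property wherever it appears. The helper names and the new property name `phoneIsPrivate` are hypothetical, and this is plain Python rather than a MarkLogic query.

```python
# The example document above, as a Python dict.
doc = {
    "personId": 11,
    "person": {
        "personName": "Some very long name with many UTF-8 international characters…",
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def phones_with_key(d, key):
    """Query structure: return phone numbers whose phone object contains `key`."""
    return [p["phone"]["phoneNumber"] for p in d["person"]["personPhones"] if key in p["phone"]]


def rename_key(node, old, new):
    """Change structure by rewriting data: rename property `old` to `new` at any depth."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


print(phones_with_key(doc, "hidePhoneNumber"))
renamed = rename_key(doc, "hidePhoneNumber", "phoneIsPrivate")
print(phones_with_key(renamed, "phoneIsPrivate"))
```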
Document PROs: Simple, Easy Data Types
• Attribute names have unlimited length and can contain any characters
• Strings have unlimited size and can contain any internationalized UTF-8 characters
• Numbers contain integers or decimals
• Booleans are true or false
• Arrays can have values of any type and can mix types
• Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
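The small set of JSON data types can be seen by parsing a document with Python's standard `json` module. This is a minimal sketch with illustrative values. JSON itself has a single number type; Python's parser happens to return integer literals as `int` and decimal literals as `float`. The document includes a key with spaces and non-ASCII characters, a mixed-type array, nesting, and an empty (sparse) object.

```python
import json

# A small JSON document exercising each JSON data type (values are illustrative).
text = """
{
  "a key with spaces, punctuation & UTF-8: clé": "naïve café",
  "count": 3,
  "price": 3.5,
  "verified": true,
  "nothing": null,
  "mixed": [1, "two", false, null, {"nested": {"deeper": []}}],
  "sparse": {}
}
"""

doc = json.loads(text)

# Map each top-level key to the Python type the json module produced for it.
types = {key: type(value).__name__ for key, value in doc.items()}
for key, name in types.items():
    print(f"{key!r}: {name}")
```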
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. Codd
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
"Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…"
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables… represent a major advance toward the goal of data independence…
1.2.1 Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence
…Can application programs… remain invariant as indices come and go…
1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure: a knowledge graph whose nodes are Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E), connected by labeled relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of", with topics linked to one another as problem, purpose, and solution.]
Inside and Out
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
RDF Graph and XML enable us to turn content into meaningful knowledge.
What can RDF Graph and JSON do for data?
15
Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity
  - A document is like a row in a table, and more
  - A document contains all the related tables a relational model requires to represent one business entity
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, and in as many ways as you want, at run time, across all table boundaries
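As a rough illustration of relationships living outside the entity documents, the plain-Python sketch below (not a MarkLogic API) indexes the triples that appear in the person document later in this deck and walks them in both directions. The extra triple appended at the end is hypothetical; it shows that a new relationship is just more data.

```python
from collections import defaultdict

# (subject, predicate, object) triples taken from the person document in this deck.
triples = [
    (11, "hasSpouse", 22),
    (11, "hasChild", 33),
    (11, "hasChild", 44),
    (11, "likesProduct", 1111),
    (11, "hasEmployer", 666),
]

def build_indexes(triples):
    """Index triples by subject and by object so relationships can be walked both ways."""
    outgoing, incoming = defaultdict(list), defaultdict(list)
    for s, p, o in triples:
        outgoing[s].append((p, o))
        incoming[o].append((p, s))
    return outgoing, incoming

outgoing, incoming = build_indexes(triples)
children = [o for p, o in outgoing[11] if p == "hasChild"]
fans_of_1111 = [s for p, s in incoming[1111] if p == "likesProduct"]

# Hypothetical new relationship: just one more triple, no schema change needed.
triples.append((22, "likesProduct", 1111))
outgoing, incoming = build_indexes(triples)
print(children, [s for p, s in incoming[1111] if p == "likesProduct"])  # [33, 44] [11, 22]
```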
16
[Figure: a Customer Order graph linking JSON Customers, Orders, Products, and Vendors documents and XML Documentation (containing the operation and drug tables) with meaningful relationships: "Product that was ordered", "Vendor who sold the product", "Shipping instructions etc.", "Customer invoices, annual reports etc.", "Customer order documents, invoices etc.", "Product manual, vendor order forms, invoices etc.", "Customer liked product", and "Vendor received RMA on product".]
RDF = Standard Meaningful Graphs
17
Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies
TIP: Search for ontologies at Linked Open Vocabularies (LOV)
• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation
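Here is a minimal sketch of what reusing existing predicate names looks like in practice: triples whose predicates are FOAF and Dublin Core terms IRIs, serialized as N-Triples. The `EX` base IRI and the `doc/manual-1111` resource are hypothetical, and the IRI test in `term` is a simplification for the sketch.

```python
# Namespace IRIs of two vocabularies named above (FOAF and Dublin Core terms).
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
# Hypothetical base IRI for our own business entities.
EX = "http://example.org/id/"

triples = [
    (EX + "person/11", FOAF + "name", "Mike Bowers"),              # literal object
    (EX + "person/11", FOAF + "knows", EX + "person/22"),          # IRI object
    (EX + "doc/manual-1111", DCTERMS + "creator", EX + "company/555"),
]

def term(value):
    """Render an object as an IRI if it looks like one, else as a plain string literal.
    Only backslashes and double quotes are escaped in this sketch."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> {term(o)} ." for s, p, o in triples)

print(to_ntriples(triples))
```

Any consumer that already knows FOAF or Dublin Core can interpret these predicates without a custom data dictionary, which is the time saving the slide points to.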
New Data Structures for Variety and Variability
18
New Data Structures for Variety and Variability
19
Relational Database: fixed and dense structures (mostly) | NoSQL Database: sparse and variable structures (mostly)
Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable). | Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.
Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure. | Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.
Each table contains a sparse number of rows and requires each row to have the same fixed structure. | Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.
Each table may have zero to a few fixed relationships to other tables. Constraints are not data. | Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.
Each schema must have a fixed number of predefined tables and constraints before data can be loaded. | Each database may contain documents without first defining structures, document types, relationships, collections, etc.
Why is relational fixed and dense?
20
Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Operation ID | Drug ID | Drug Dose | Drug Dose UOM
13 | 100 | 200 | mg | NULL
13 | 101 | NULL | NULL
13 | 100 | 150 | mg | NULL

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

Fixed field types
Fixed table structure
Fixed table relationships
Each row has same structure
Sparse
Fixed set of tables
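To see what the fixed, dense tables cost when you want one business entity back, the plain-Python sketch below joins the surgeon, operation, and drug rows above into a single nested operation document. Only the first four columns of the drug rows are used, `None` stands in for SQL NULL, and the key name `hospitalIdFk` is hypothetical.

```python
# Rows from the tables above; None stands in for SQL NULL.
surgeons = {1: {"surgeonName": "Dorothy Oz", "surgeonSpecialty": "Cardiothoracic"},
            2: {"surgeonName": "Martin Fields", "surgeonSpecialty": "Neurosurgery"}}
operations = [{"operationId": 13, "surgeonId": 1, "hospitalId": 7}]
operation_drugs = [  # (operationId, drugId, drugDose, drugDoseUOM)
    (13, 100, 200, "mg"),
    (13, 101, None, None),
    (13, 100, 150, "mg"),
]
drugs = {100: "Minocycline", 101: "Minomycin"}

def to_document(op):
    """Join one operation with its surgeon and drug rows into a single nested document."""
    doc = {
        "operationNumber": op["operationId"],
        "hospitalIdFk": op["hospitalId"],          # hypothetical key name
        **surgeons[op["surgeonId"]],               # one-to-one join folded into the document
    }
    administered = []
    for op_id, drug_id, dose, uom in operation_drugs:   # one-to-many join
        if op_id != op["operationId"]:
            continue
        drug = {"drugName": drugs[drug_id]}              # reference-table lookup
        if dose is not None:                             # NULL becomes "property absent"
            drug["drugDoseSize"] = dose
        if uom is not None:
            drug["drugDoseUOM"] = uom
        administered.append(drug)
    doc["administeredDrugs"] = administered
    return doc

print(to_document(operations[0]))
```

The NULL cells simply disappear in the result, which is the sparse-property behavior shown on the next slide.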
How does NoSQL support variety and variability?
NoSQL documents have any variety of rapidly changing structures because structure is data.
21
{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}
Variable Data Types
Variable Document Types
Variable Relationships
Variable and Sparse Collections
Variable Document Structures
Sparse Properties (the Minomycin entry has no dose)
Sparse Denormalized Properties
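Because relationships and collections are ordinary data in this document, they can be read and filtered like any other property. The plain-Python sketch below works on a trimmed copy of the operation document above (some properties omitted) and tolerates the sparse drug entry with `dict.get`; the "recalls ≤ 1, best efficacy first" rule is a hypothetical example query.

```python
# Trimmed copy of the operation document above (some properties omitted).
operation_doc = {
    "_id": 1, "_type": "operation", "collections": ["operation", "transplants"],
    "operation": {"operationNumber": 13, "administeredDrugs": [
        {"drugName": "Minocycline", "drugDoseSize": 200, "drugDoseUOM": "mg"},
        {"drugName": "Minomycin"},                       # sparse: no dose recorded
        {"drugName": "Minocycline", "drugDoseSize": 150, "drugDoseUOM": "mg"},
    ]},
    "relations": {"values": [
        {"subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87},
        {"subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1},
        {"subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3},
        {"subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1},
    ]},
}

ops = operation_doc["operation"]
in_transplants = "transplants" in operation_doc["collections"]      # variable collections
doses = [d.get("drugDoseSize") for d in ops["administeredDrugs"]]   # sparse -> None

# Relationships are data: filter and sort them like any other values.
low_recall_drugs = [
    r["object"]
    for r in sorted(
        (r for r in operation_doc["relations"]["values"]
         if r["predicate"] == "opDrug" and r.get("drugRecalls", 0) <= 1),
        key=lambda r: r["drugEfficacy"],
        reverse=True,
    )
]
print(in_transplants, doses, low_recall_drugs)  # True [200, None, 150] [10000, 30000]
```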
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
  <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
  freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph>
</section>

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
      { "type": "text", "value": "Text exists only in objects" },
      { "type": "text", "value": "Data exists in separate objects" },
      { "type": "date", "value": "2017-04-02Z" },
      { "type": "break", "value": null },
      { "type": "text", "value": "JSON is designed for objects to…" },
      { "type": "text", "value": "be sprinkled with strings" }
    ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures.
Developers work with data with maximal reliance on predictable structures.
XML is ideal for content.
Developers work with tagged content with minimal reliance on variable structure.
Choosing Between XML and JSON
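The difference between "tags added on top of text" and "text poured into objects" shows up as soon as you read the samples programmatically. This sketch uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the XML sample (with `<br>` written as `<br/>` so it is well-formed) and plain list processing on a trimmed copy of the JSON sample. Note that ElementTree returns `xsi:type`'s value as the literal string `"xsd:date"`; it does not resolve the prefix.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML sample above.
xml_text = """<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b></paragraph>
</section>"""

# Trimmed copy of the JSON sample: data lives in separate objects, text only in strings.
json_paragraph = [
    {"type": "text", "value": "Data exists in separate objects"},
    {"type": "date", "value": "2017-04-02Z"},
    {"type": "break", "value": None},
]

root = ET.fromstring(xml_text)
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"  # expanded attribute name

# XML: the text still reads naturally with the tags stripped, and typed data can be found.
plain_text = "".join(root.find("paragraph").itertext())
xml_dates = [(el.text, el.get(XSI_TYPE)) for el in root.iter("date")]

# JSON: data is found by walking objects, not by stripping tags from text.
json_dates = [item["value"] for item in json_paragraph if item["type"] == "date"]

print(plain_text)
print(xml_dates, json_dates)
```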
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Customer Model
The Person ERD is 13 tables because each multivalued property requires a separate table in a relational model.
All of this should be represented as one JSON document because it is one business entity.
Person ERD
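The "one table per multivalued property" rule can be demonstrated mechanically. The plain-Python sketch below is a hypothetical shredder (not how any particular tool works): nested objects become extra columns of the same row, while every array becomes its own child table whose rows point back to their parent. Run on a trimmed subset of the person document, it already produces five tables; the full document would need more.

```python
from collections import defaultdict

def shred(obj, table, tables, parent=None):
    """Add one JSON object to `tables` (table name -> list of rows) as a new row.
    Nested objects become extra columns of the same row (one-to-one); every array
    becomes its own child table whose rows point back to this row (one-to-many)."""
    row = {"id": len(tables[table]) + 1}
    if parent is not None:
        row["parent"] = parent                       # (parent table, parent row id)
    tables[table].append(row)
    _fill(obj, "", row, table, tables)
    return tables

def _fill(obj, prefix, row, table, tables):
    for key, value in obj.items():
        column = prefix + key
        if isinstance(value, dict):                  # one-to-one: flatten into this row
            _fill(value, column + ".", row, table, tables)
        elif isinstance(value, list):                # multivalued: needs a separate table
            for item in value:
                if isinstance(item, dict):
                    shred(item, key, tables, (table, row["id"]))
                else:
                    tables[key].append({"id": len(tables[key]) + 1,
                                        "parent": (table, row["id"]), "value": item})
        else:
            row[column] = value

# Trimmed, hypothetical subset of the person document shown in this deck.
person = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        ],
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"}},
        ],
    },
}

tables = shred(person, "persons", defaultdict(list))
for name, rows in tables.items():
    print(name, rows)
```

Even the small `phonePurpose` array ends up as its own table, which is why the relational version of a realistic person grows so quickly.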
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
        "addressLocality": "Clinton", "addressRegion": "UT", "addressPostalCode": "84015",
        "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers", "driversLicenseNumber": "11111111",
        "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
      "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
      "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
      "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
      "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
      "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
      "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
      "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
      "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
[Figure: a much larger relational ERD in which the Person ERD tables are repeated alongside additional tables labeled Order, Order Items, Fact, and Lookup Lists, plus placeholder table names.]
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
30
[Figure: the Customer Order graph, linking JSON Customers, Orders, Products, and Vendors documents through relationships such as "Product that was ordered" and "Vendor who sold the product".]
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
        "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": ""
    },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States"
    },
    "orderedProducts": [
      { "product": {
        "productIdFk": 1111,
        "productName": "Oreo Thins Sandwich Cookies",
        "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
        "productWeightOz": 11,
        "productOrderStatus": "Shipped",
        "productShippedDate": "2001-01-01+01:01",
        "seller": { "sellerId": 555, "sellerName": "Nabisco" },
        "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
        "productIdFk": 2222,
        "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
        "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New" …
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
        "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
        "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": {
        "productMeasurePurpose": "productPackage", "productWeight": 10.1,
        "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
        "productMeasurePurpose": "shippingPackage", "productWeight": 1,
        "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    "productWebsites": [ …
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales" … } },
      { "phone": { "phonePurpose": "support" … } },
      { "phone": { "phonePurpose": "shipping" … } } ],
    "companyAddresses": [
      { "address": {
        "addressPurpose": [ "shipping" …
        "addressLocality": "East Hanove…
        "addressPostalCode": "07936" … } } ],
    "companyEmails": [ { "email": { "emailPurpose": "sales" … } } ],
    "companyWebsites": [ { "website": { "websitePurpose": "home" … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
        "paymentMethodType": "Che…
        "paymentMethodAccountNumber": 555,
        "paymentMethodVerified": true } } ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
Same Realistic Customer Order Model
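In this model, cross-document links are plain ID properties (`productIdFk`, `sellerId`) and allowed status values live in a lookup-list document. The plain-Python sketch below shows one hypothetical way to check an order against both; the in-memory dictionaries stand in for a database, and the "bad" order is invented.

```python
# Trimmed, hypothetical copies of the documents above, keyed by their IDs.
products = {1111: {"productName": "Oreo Thins Sandwich Cookies"}}
companies = {555: {"companyName": "Nabisco"}}
lookups = {
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

def check_order(order_doc):
    """Return a list of problems: unknown lookup values or dangling ID references."""
    problems = []
    order = order_doc["order"]
    if order.get("orderStatus") not in lookups["orderStatus"]:
        problems.append(f"bad orderStatus {order.get('orderStatus')!r}")
    for entry in order.get("orderedProducts", []):
        item = entry["product"]
        if item["productIdFk"] not in products:
            problems.append(f"unknown product {item['productIdFk']}")
        if item.get("productOrderStatus") not in lookups["productOrderStatus"]:
            problems.append(f"bad productOrderStatus {item.get('productOrderStatus')!r}")
        seller_id = item.get("seller", {}).get("sellerId")
        if seller_id is not None and seller_id not in companies:
            problems.append(f"unknown seller {seller_id}")
    return problems

def order_total(order_doc):
    """Sum quantity x unit price over the embedded order lines."""
    return round(sum(e["product"]["productOrderQty"] * e["product"]["productUnitPrice"]
                     for e in order_doc["order"]["orderedProducts"]), 2)

good = {"orderId": 1, "order": {"orderStatus": "Shipped", "orderedProducts": [
    {"product": {"productIdFk": 1111, "productOrderQty": 3, "productUnitPrice": 3.50,
                 "productOrderStatus": "Shipped", "seller": {"sellerId": 555}}}]}}
bad = {"orderId": 2, "order": {"orderStatus": "Lost", "orderedProducts": [
    {"product": {"productIdFk": 9999, "productOrderQty": 1, "productUnitPrice": 2.50,
                 "productOrderStatus": "Shipped"}}]}}

print(check_order(good), order_total(good))  # [] 10.5
print(check_order(bad))  # ["bad orderStatus 'Lost'", 'unknown product 9999']
```

Because each order line is embedded in the order document, the total needs no join; only the reference checks reach outside it.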
Relational Modeling: Normalize
1. Normalize
• Make each attribute single valued
  - Create one column per attribute
  - Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
32
[Figure: One-to-one, One-to-many, Many-to-many, and Reference Tables relationships.]
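The normalization steps above can be seen end to end with Python's built-in `sqlite3` module. This sketch creates four of the Person ERD tables (names written without spaces), stores one-to-many phones and a many-to-many address link through a junction table, then needs joins to reassemble the person. The `IsPrimary` values are hypothetical.

```python
import sqlite3

# Four of the Person ERD tables, in an in-memory database.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Persons (PersonID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE PersonPhones (
    PhoneID INTEGER PRIMARY KEY,
    PersonID INTEGER REFERENCES Persons(PersonID),   -- one-to-many
    PhonePurpose TEXT, PhoneNumber TEXT);
CREATE TABLE Addresses (
    AddressID INTEGER PRIMARY KEY,
    Street TEXT, Locality TEXT, Region TEXT, PostalCode TEXT, Country TEXT);
CREATE TABLE PersonsAddresses (                      -- many-to-many junction table
    PersonID INTEGER REFERENCES Persons(PersonID),
    AddressID INTEGER REFERENCES Addresses(AddressID),
    AddressPurpose TEXT, IsPrimary INTEGER,
    PRIMARY KEY (PersonID, AddressID, AddressPurpose));

INSERT INTO Persons VALUES (11, 'Mike Bowers');
INSERT INTO PersonPhones VALUES (1, 11, 'mobile', '+1 (111) 111-1111'),
                                (2, 11, 'work',   '+1 (111) 111-1112');
INSERT INTO Addresses VALUES (1, '99 Smith Dr', 'Clinton', 'UT', '84015', 'United States');
-- The JSON addressPurpose array [shipping, mailing] becomes two junction rows.
INSERT INTO PersonsAddresses VALUES (11, 1, 'shipping', 1), (11, 1, 'mailing', 0);
""")

# Reassembling one person now takes joins across the normalized tables.
addresses = con.execute("""
    SELECT a.Street, pa.AddressPurpose
    FROM Persons p
    JOIN PersonsAddresses pa ON pa.PersonID = p.PersonID
    JOIN Addresses a ON a.AddressID = pa.AddressID
    WHERE p.PersonID = 11
    ORDER BY pa.AddressPurpose""").fetchall()
phones = con.execute(
    "SELECT PhonePurpose FROM PersonPhones WHERE PersonID = 11 ORDER BY PhoneID").fetchall()
print(addresses, phones)
```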
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
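TDE templates and the Optic API are MarkLogic features; the following is only a plain-Python analogy of the idea (project flat rows out of documents with a column-to-path template, then join the rows into a denormalized view). It is not TDE template syntax or the Optic API, and the helper names and documents are hypothetical.

```python
def get_path(doc, path):
    """Follow a list of keys into a document; return None if any step is missing."""
    for key in path:
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc

def project(docs, template):
    """Turn each document into one flat row using a {column: key path} template."""
    return [{col: get_path(d, path) for col, path in template.items()} for d in docs]

def join(left, right, left_key, right_key):
    """Hash join of two row lists on equal key values."""
    index = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]

# Trimmed, hypothetical documents; person 22 has no customer section.
person_docs = [
    {"personId": 11, "person": {"personName": "Mike Bowers"},
     "customer": {"customerStatus": "active"}},
    {"personId": 22, "person": {"personName": "Person 22"}},
]
order_docs = [
    {"orderId": 1, "order": {"orderNumber": "111-11-111", "customer": {"customerId": 11}}},
]

persons = project(person_docs, {"personId": ["personId"],
                                "personName": ["person", "personName"],
                                "customerStatus": ["customer", "customerStatus"]})
orders = project(order_docs, {"orderId": ["orderId"],
                              "orderNumber": ["order", "orderNumber"],
                              "customerId": ["order", "customer", "customerId"]})

# A denormalized "view": each order row with its customer's columns attached.
order_view = join(orders, persons, "customerId", "personId")
print(order_view)
```

The documents stay whole and normalized; only the projected rows are flattened and joined, which mirrors the division of labor the slide describes.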
DocGraph Modeling: Denormalize
• TDE (Template Driven Extraction) can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing: the data is not copied into the linked documents
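The idea can be sketched in plain Python. This is a conceptual model only, not MarkLogic's TDE or Optic API: the hypothetical `extract_triples` function plays the role of a TDE template by projecting each person's name into a shared triple index, and the hypothetical `join_names` follows `hasSpouse`/`hasChild` links at query time instead of copying names into every document. The simplified document shapes, the `name` predicate, and the placeholder names for persons 22 and 33 are assumptions.

```python
# Conceptual sketch only: models TDE-style triple extraction and a query-time
# join in plain Python. None of these names are MarkLogic APIs.

# Hypothetical, simplified person documents modeled on the slide example.
docs = {
    11: {"personId": 11, "person": {"personName": "Mike Bowers"},
         "triples": [{"subject": 11, "predicate": "hasSpouse", "object": 22},
                     {"subject": 11, "predicate": "hasChild", "object": 33}]},
    22: {"personId": 22, "person": {"personName": "Spouse Name"}, "triples": []},
    33: {"personId": 33, "person": {"personName": "Child Name"}, "triples": []},
}

def extract_triples(doc):
    """Play the role of a TDE template: project document data into triples."""
    pid = doc["personId"]
    # Stored link triples are copied into the index as-is.
    out = [(t["subject"], t["predicate"], t["object"]) for t in doc["triples"]]
    # A hypothetical "name" triple is derived from the document body.
    out.append((pid, "name", doc["person"]["personName"]))
    return out

# Build one triple index across all documents; no document is modified.
triple_index = [t for d in docs.values() for t in extract_triples(d)]

def join_names(subject, predicate):
    """Follow links from subject, then look up each target's name in the index."""
    targets = [o for s, p, o in triple_index if s == subject and p == predicate]
    return [o2 for t in targets for s2, p2, o2 in triple_index
            if s2 == t and p2 == "name"]

print(join_names(11, "hasSpouse"))  # ['Spouse Name']
print(join_names(11, "hasChild"))   # ['Child Name']
```

The person document never stores its spouse's or children's names; they are resolved through the index when queried.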
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
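A minimal sketch of this orthogonalized layout, using Python's built-in `sqlite3` module. The table and column names are hypothetical, loosely modeled on the person, order, and product examples in this deck: `person` and `product` are business tables, `orders` and `order_line` are transaction tables that join them, and `order_status` is a reference table. One query then assembles the "order" context from the independent tables.

```python
import sqlite3

# Hypothetical schema illustrating orthogonalized relational tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reference table: standardizes allowed order states.
CREATE TABLE order_status (status TEXT PRIMARY KEY);
-- Business tables: independent of any context.
CREATE TABLE person  (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
-- Transaction tables: join business tables together.
CREATE TABLE orders (
    order_id  INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    status    TEXT REFERENCES order_status(status));
CREATE TABLE order_line (
    order_id   INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES product(product_id),
    qty        INTEGER);
""")
conn.executemany("INSERT INTO order_status VALUES (?)",
                 [("New",), ("Invoiced",), ("Shipped",), ("Closed",)])
conn.execute("INSERT INTO person VALUES (11, 'Mike Bowers')")
conn.execute("INSERT INTO product VALUES (1111, 'Oreo Thins Sandwich Cookies')")
conn.execute("INSERT INTO orders VALUES (1, 11, 'Shipped')")
conn.execute("INSERT INTO order_line VALUES (1, 1111, 3)")

# Combining the tables re-creates the "order" context on demand.
rows = conn.execute("""
    SELECT p.name, o.status, pr.name, ol.qty
    FROM orders o
    JOIN person p      ON p.person_id = o.person_id
    JOIN order_line ol ON ol.order_id = o.order_id
    JOIN product pr    ON pr.product_id = ol.product_id
""").fetchall()
print(rows)  # [('Mike Bowers', 'Shipped', 'Oreo Thins Sandwich Cookies', 3)]
```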
[Figure: Business Tables, Reference Tables, Transaction Tables]
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 250 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111 orderStatus [ New Invoiced Shipped Closed
] productOrderStatus [
None Allocated Invoiced Shipped On Order No Stock
]
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
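As a rough illustration in Python, with dicts standing in for JSON documents, an order transaction can carry everything it needs, including copies of the customer name and product names, while a single lookup document holds several lookup lists. Field names follow the deck's example. Treating prices and shipment cost as integer cents is an assumption, since the extracted figures appear to have lost their decimal points, and `order_total` is a hypothetical helper.

```python
# Dicts stand in for JSON documents. Prices are assumed to be integer cents.
order_doc = {
    "orderId": 1,
    "order": {
        "orderNumber": "111-11-111",
        "orderStatus": "Shipped",
        # Copied from the person document so the order is self-contained.
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderShipment": {"shipmentMethod": "2nd Day Air", "shipmentCost": 587},
        "orderedProducts": [
            {"product": {"productIdFk": 1111,
                         "productName": "Oreo Thins Sandwich Cookies",
                         "productOrderQty": 3, "productUnitPrice": 350}},
            {"product": {"productIdFk": 2222,
                         "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
                         "productOrderQty": 1, "productUnitPrice": 250}},
        ],
    },
}

# Several lookup lists combined in one document.
lookup_doc = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}

def order_total(doc):
    """Total in cents, computed from the order document alone (no joins)."""
    o = doc["order"]
    items = sum(p["product"]["productOrderQty"] * p["product"]["productUnitPrice"]
                for p in o["orderedProducts"])
    return items + o["orderShipment"]["shipmentCost"]

# The lookup document validates the transaction's state.
assert order_doc["order"]["orderStatus"] in lookup_doc["orderStatus"]
print(order_total(order_doc))  # 1887
```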
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace a customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational, because it hides the purpose of the model
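A minimal `sqlite3` sketch of that kind of generalization, with hypothetical table names: a general `person` table holds what every person shares, and subtype tables (`customer`, `account_rep`) hold role-specific columns keyed by the same `person_id`. One person can appear in several subtype tables.

```python
import sqlite3

# Hypothetical generalized schema: one base table plus subtype tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
-- Subtype tables share the base table's key.
CREATE TABLE customer (
    person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
    join_date TEXT);
CREATE TABLE account_rep (
    person_id  INTEGER PRIMARY KEY REFERENCES person(person_id),
    company_id INTEGER);
INSERT INTO person VALUES (11, 'Mike Bowers');
INSERT INTO customer VALUES (11, '2001-01-01');
INSERT INTO account_rep VALUES (11, 555);
""")

# Reading a role requires joining the subtype table back to the base table.
customers = conn.execute("""
    SELECT p.name, c.join_date FROM person p
    JOIN customer c ON c.person_id = p.person_id
""").fetchall()
print(customers)  # [('Mike Bowers', '2001-01-01')]
```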
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because a JSON object can nest subclass objects, such as customer, alongside its base object, such as person
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See subclassing below
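In a document, the same generalization can be sketched as optional sections inside one person document. This Python sketch (dicts standing in for JSON) reuses the section names from the example, `customer` and `companyRelationships`; the `roles` helper is hypothetical and simply reports which subclass sections and relationship types are present.

```python
# A person document "subclassed" by adding optional sections.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers", "personStatus": "active"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}

def roles(doc):
    """List the roles a person plays, based on the sections the document holds."""
    found = ["person"]
    if "customer" in doc:
        found.append("customer")
    for rel in doc.get("companyRelationships", []):
        found.extend(rel["company"]["personCompanyRelationship"])
    return found

print(roles(person_doc))
# ['person', 'customer', 'employeeOf', 'accountRepFor',
#  'independentContractorFor', 'deliveryPersonFor']
```

Because each subclass section is named after its role, the purpose of the model stays visible in the data itself.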
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
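Why unique names matter can be sketched with a toy name-keyed equality index. This is a conceptual Python model, not MarkLogic's actual index implementation: the hypothetical `index_by_name` records every leaf value under its property name alone, ignoring its path, which is the behavior the bullet above describes. The generic `status` name in document `b` is a hypothetical counterexample.

```python
from collections import defaultdict

def index_by_name(doc_id, node, index, name=None):
    """Record (property name, value) -> doc ids for every leaf, ignoring its path."""
    if isinstance(node, dict):
        for key, value in node.items():
            index_by_name(doc_id, value, index, key)
    elif isinstance(node, list):
        # Array items are indexed under the array's own property name.
        for item in node:
            index_by_name(doc_id, item, index, name)
    elif name is not None:
        index[(name, node)].add(doc_id)

index = defaultdict(set)
# Unique names: personStatus and customerStatus cannot be confused.
index_by_name("a", {"person": {"personStatus": "active"},
                    "customer": {"customerStatus": "inactive"}}, index)
# Generic name reused at two levels (hypothetical anti-pattern).
index_by_name("b", {"person": {"status": "active"},
                    "customer": {"status": "inactive"}}, index)

print(index[("customerStatus", "active")])  # set()
print(index[("status", "active")])          # {'b'}
```

For document `a`, the index answers "is this an active customer?" precisely. For document `b`, the lookup `status = active` matches, but the index alone cannot say whether the person or the customer is active.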
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs × 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
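The estimate on this slide is simple arithmetic. A short Python helper (hypothetical, using only the per-join figures quoted above) makes the range explicit.

```python
def join_io_range(joins, min_ios_per_join=4, max_ios_per_join=6):
    """Estimated IO range for retrieving data spread across joined tables.

    Defaults follow the slide: 3-to-4 index IOs plus 1-to-2 row IOs per join.
    """
    return joins * min_ios_per_join, joins * max_ios_per_join

low, high = join_io_range(13)
print(low, high)  # 52 78
```

Against the slide's figure of 1 IO to fetch the whole person document from MarkLogic, that is roughly a 52x to 78x difference in IOs for the same data.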
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
"personAddresses": [
  { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
    "addressLocality": "Clinton", "addressRegion": "UT",
    "addressPostalCode": "84015", "addressCountry": "United States" } }
]
Relational equivalent (three tables):
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
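To make the comparison concrete, this Python sketch shreds the single JSON address into rows for the Addresses and Persons Addresses tables. It is illustrative only: the row tuples follow the column lists above, while the generated address ID, joining multiple street lines with "; ", and treating a "primary" tag in `addressPurpose` as the Is Primary flag (mirroring how the deck tags phones and emails) are all assumptions.

```python
person_doc = {
    "personId": 11,
    "personAddresses": [
        {"address": {"addressPurpose": ["shipping", "mailing"],
                     "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015",
                     "addressCountry": "United States"}},
    ],
}

def shred_addresses(doc):
    """Split one document's addresses into Addresses and Persons Addresses rows."""
    addresses, person_addresses = [], []
    # Assumption: address IDs are generated per document, starting at 1.
    for address_id, entry in enumerate(doc["personAddresses"], start=1):
        a = entry["address"]
        # Assumption: multiple street lines are joined into one column.
        addresses.append((address_id, "; ".join(a["addressStreet"]),
                          a["addressLocality"], a["addressRegion"],
                          a["addressPostalCode"], a["addressCountry"]))
        purposes = a["addressPurpose"]
        # Assumption: a "primary" tag in the purposes marks the primary entry.
        is_primary = "primary" in purposes
        for purpose in purposes:
            if purpose != "primary":
                person_addresses.append((doc["personId"], address_id,
                                         purpose, is_primary))
    return addresses, person_addresses

addresses, person_addresses = shred_addresses(person_doc)
print(addresses)
# [(1, '99 Smith Dr', 'Clinton', 'UT', '84015', 'United States')]
print(person_addresses)
# [(11, 1, 'shipping', False), (11, 1, 'mailing', False)]
```

One multi-valued property in the document becomes one address row plus two junction rows; the person-level columns would go into a third table.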
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
    "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
    "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
    "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
    "nameOnBankAccount": "Michael Bowers",
    "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
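Application code handles such an array naturally by dispatching on a type field. This Python sketch uses `paymentMethodType` from the example as the discriminator. The `describe` function and its output format are hypothetical, and the records are trimmed to the fields the sketch needs.

```python
payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11"}},
    {"paymentMethod": {"paymentMethodType": "Checking",
                       "bankRoutingNumber": "11111111",
                       "bankAccountNumber": "11111111"}},
]

def describe(entry):
    """Summarize one payment method; each type carries its own set of fields."""
    pm = entry["paymentMethod"]
    kind = pm["paymentMethodType"]
    if kind == "Checking":
        return f"Checking account ending {pm['bankAccountNumber'][-4:]}"
    if "creditCardNumber" in pm:
        return f"{kind} card ending {pm['creditCardNumber'][-4:]}"
    raise ValueError(f"unknown payment method type: {kind}")

print([describe(e) for e in payment_methods])
# ['Visa card ending 1234', 'Checking account ending 1111']
```

A relational design would typically need either one sparse table with columns for every type or a separate table per type; the JSON array simply holds both shapes side by side.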
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
        "productIdFk": 1111,
        "productName": "Oreo Thins Sandwich Cookies",
        "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
        "productWeightOz": 11,
        "productOrderStatus": "Shipped",
        "productShippedDate": "2001-01-01+01:01",
        "seller": { "sellerId": 555, "sellerName": "Nabisco" },
        "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
        "productIdFk": 2222,
        "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
        "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New" …
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
        "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
        "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu… } } ],
    "productMeasures": [
      { "productMeasure": { "productMeasurePurpose": "productPackage", "productWeight": 10.1,
        "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": { "productMeasurePurpose": "shippingPackage", "productWeight": 1,
        "productWidth": 5, "productHeight": 2, "productLength": 8 } } ] …
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales" … } },
      { "phone": { "phonePurpose": "support" … } },
      { "phone": { "phonePurpose": "shipping" … } } ],
    "companyAddresses": [
      { "address": { "addressPurpose": [ "shipping" …
        "addressLocality": "East Hanove…"
        "addressPostalCode": "07936" … } } ],
    "companyEmails": [ { "email": { "emailPurpose": "sales" … } } ],
    "companyWebsites": [ { "website": { "websitePurpose": "home" … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
        "paymentMethodType": "Che…",
        "paymentMethodAccountNumber": 555…,
        "paymentMethodVerified": true } } ] },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query
• Structure can be queried
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values
{ "personId": 11,
  "person": { "personName": "Some very long name with many UTF-8 international characters…", "personBirthDate": "1982-02-02", "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ] } }
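The "structure is data" claims can be illustrated with plain Python over parsed JSON. This is a sketch under that assumption, not MarkLogic's query APIs; `has_key`, `visible_phones`, and `rename_key` are hypothetical helpers that query for key presence, query a nested relationship, and change structure by rewriting data.

```python
# The document from the slide (name shortened); hidePhoneNumber appears on only one phone.
DOC = {
    "personId": 11,
    "person": {
        "personName": "Some very long name with many UTF-8 international characters",
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def has_key(node, key):
    """Query structure: is `key` used as a property name anywhere in the nested value?"""
    if isinstance(node, dict):
        return key in node or any(has_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(has_key(v, key) for v in node)
    return False


def visible_phones(doc):
    """Query a nested relationship: phones whose own object lacks hidePhoneNumber: true."""
    return [p["phone"]["phoneNumber"]
            for p in doc["person"].get("personPhones", [])
            if not p["phone"].get("hidePhoneNumber", False)]


def rename_key(node, old, new):
    """Change structure by updating data: return a copy with every `old` key renamed."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(v, old, new) for v in node]
    return node


if __name__ == "__main__":
    print(has_key(DOC, "hidePhoneNumber"), visible_phones(DOC))
```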
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
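A quick way to see these types is to parse a sample with Python's standard `json` module. The sample document is hypothetical. Note that JSON's grammar has a single number type; Python's parser returns `int` for integer literals and `float` for literals with a fraction or exponent.

```python
import json

# Hypothetical sample exercising each JSON value type listed above.
SAMPLE = """{
  "a property name with spaces, punctuation & ünïcødé": "UTF-8 text: été",
  "count": 3,
  "price": 3.50,
  "active": true,
  "notes": null,
  "mixed": [1, "two", 3.0, false, null, {"nested": {}}]
}"""


def top_level_types(text):
    """Map each top-level property to the Python type json.loads produced for it."""
    return {key: type(value).__name__ for key, value in json.loads(text).items()}


def array_element_types(text, key):
    """List the Python types of the elements of one array property."""
    return [type(v).__name__ for v in json.loads(text)[key]]


if __name__ == "__main__":
    print(top_level_types(SAMPLE))
    print(array_element_types(SAMPLE, "mixed"))
```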
Document PROs: Meaningful Data Modeling
"A Relational Model of Data for Large Shared Data Banks"
E. F. Codd, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
"Programs should remain unaffected when the internal representation of data is changed … Tree-structured … inadequacies … are discussed … Relations … are discussed and applied to the problems of redundancy and consistency …"
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION: "This paper is concerned with the application of elementary relation theory … to … formatted data … The problems … are those of data independence … and … data inconsistency … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of …"
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS: "… Tables … represent a major advance toward the goal of data independence …"
1.2.1 Ordering Dependence: "… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one."
1.2.2 Indexing Dependence: "… Can application programs … remain invariant as indices come and go? …"
1.2.3 Access Path Dependence: "Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data … These programs fail when a change in structure becomes necessary … The … program … is required to exploit … paths to the data … Programs become dependent on the continued existence of the … paths."
[Diagram: a graph of Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E) connected by meaningful relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of", with topic-to-topic links labeled "problem", "solution", and "purpose".]
[Diagram: Customers, Orders, Products, and Vendors stored as JSON documents and Documentation stored as XML (for example, an operation record for Johns Hopkins, operation 13, heart transplant, surgeon Dorothy Oz, with a table of administered drugs), connected by relationships such as "Customer Order", "Product that was ordered", "Vendor who sold the product", "Shipping instructions etc.", "Customer invoices, annual reports etc.", "Customer order documents", "Product manual", "Customer liked product", and "Vendor received RMA on product".]
Inside and Out
Hospital Name | Johns Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
Documents + RDF Graphs = NextGen Relational
• A JSON document is a business entity
  – A document is like a row in a table, and more
  – A document contains all the related tables a relational model requires to represent one business entity
• Meaningful graph relationships connect any business entity to any other entity (or any content) in any way, in as many ways as you want, at run time, across all table boundaries
[Diagram: JSON Customers, Orders, Products, and Vendors documents and XML Documentation, linked by relationships including "Customer Order", "Product that was ordered", "Vendor who sold the product", "Shipping instructions etc.", "Customer invoices, annual reports etc.", "Customer order documents, invoices etc.", "Product manual, Vendor order forms, invoices etc.", "Customer liked product", and "Vendor received RMA on product".]
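The idea that relationships are data added at run time can be sketched with a tiny in-memory triple store. This is an illustration, not MarkLogic's semantics API: the document IDs (`customer/11`, `order/1`, …) and predicate names (`placedOrder`, `soldBy`, `receivedRmaOn`, …) are hypothetical stand-ins for the relationships this slide names, such as "Customer liked product" and "Vendor who sold the product".

```python
from collections import defaultdict

# Hypothetical document IDs for the four JSON entity types in the slide's diagram.
DOCUMENTS = {
    "customer/11": {"personName": "Mike Bowers"},
    "order/1": {"orderNumber": "111-11-111"},
    "product/1111": {"productName": "Oreo Thins Sandwich Cookies"},
    "vendor/555": {"companyName": "Nabisco"},
}


class Graph:
    """Tiny in-memory triple store: relationships are data and can be added at run time."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._edges[subject].append((predicate, obj))

    def related(self, subject, predicate=None):
        """Objects linked from `subject`, optionally filtered by predicate."""
        return [o for p, o in self._edges[subject] if predicate is None or p == predicate]

    def follow(self, start, predicates):
        """Walk a chain of predicates, e.g. customer -> order -> product -> vendor."""
        frontier = [start]
        for predicate in predicates:
            frontier = [o for node in frontier for o in self.related(node, predicate)]
        return frontier


graph = Graph()
# Hypothetical predicate names for the diagram's relationship labels.
graph.add("customer/11", "placedOrder", "order/1")
graph.add("order/1", "orderedProduct", "product/1111")
graph.add("product/1111", "soldBy", "vendor/555")
graph.add("customer/11", "likesProduct", "product/1111")
graph.add("vendor/555", "receivedRmaOn", "product/1111")

if __name__ == "__main__":
    print(graph.follow("customer/11", ["placedOrder", "orderedProduct", "soldBy"]))
```

Adding a new kind of relationship is one more `add` call; no table or foreign key has to change.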
RDF = Standard Meaningful Graphs
Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies
TIP: Search for ontologies at Linked Open Vocabularies (LOV)
• Examples: Dublin Core, FOAF, TrackBack, MetaVocab, Basic Geo Vocabulary, BIO, RSS 1.0, VCard RDF, Creative Commons metadata, WOT, SIOC, GoodRelations, DOAP, Programmes Ontology, Music Ontology, OpenGUID, Provenance Vocabulary, Pedagogical diagnosis, DILIGENT Argumentation
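As a sketch of reusing published predicates, the snippet below writes a few N-Triples statements using the real FOAF terms `foaf:name` and `foaf:knows` and the DCMI term `dcterms:creator`. The `http://example.org/` entity IRIs are hypothetical placeholders, and the serializer is a hand-rolled minimal one rather than an RDF library.

```python
# Well-known vocabulary namespaces (FOAF and DCMI Metadata Terms).
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
# Hypothetical namespace for this application's own entity IRIs.
EX = "http://example.org/"


def iri(value: str) -> str:
    return "<" + value + ">"


def literal(value: str) -> str:
    """Plain string literal with the N-Triples escapes for backslash, quote, and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """One N-Triples statement: subject, predicate, object, terminated by ' .'."""
    return f"{subject} {predicate} {obj} ."


TRIPLES = [
    ntriple(iri(EX + "person/11"), iri(FOAF + "name"), literal("Mike Bowers")),
    ntriple(iri(EX + "person/11"), iri(FOAF + "knows"), iri(EX + "person/22")),
    ntriple(iri(EX + "document/oreo-manual"), iri(DCTERMS + "creator"), iri(EX + "company/555")),
]

if __name__ == "__main__":
    print("\n".join(TRIPLES))
```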
New Data Structures for Variety and Variability
Relational Database: fixed and dense structures (mostly) | NoSQL Database: sparse and variable structures (mostly)
Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable). | Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.
Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure. | Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.
Each table contains a sparse number of rows and requires each row to have the same fixed structure. | Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.
Each table may have zero to a few fixed relationships to other tables. Constraints are not data. | Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.
Each schema must have a fixed number of predefined tables and constraints before data can be loaded. | Each database may contain documents without first defining structures, document types, relationships, collections, etc.
Why is relational fixed and dense?
Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Operation ID | Drug ID | Drug Dose | Drug Dose UOM | Sparse
13 | 100 | 200 | mg | NULL
13 | 101 | NULL | NULL
13 | 100 | 150 | mg | NULL

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

Callouts: Fixed field types • Fixed table structure • Fixed table relationships • Each row has same structure • Fixed set of tables
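The "fixed and dense" point can be demonstrated with Python's built-in `sqlite3` module. The sketch assumes a simplified four-column version of the operation-drugs table with hypothetical snake_case column names, plus a hypothetical `drug_manufacturer` column added later: a row with an extra field is rejected, and adding the field takes DDL that back-fills NULLs into every existing row.

```python
import sqlite3


def fixed_structure_demo():
    """Show that a relational table's structure is fixed until DDL changes it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE operation_drugs ("
                "operation_id INTEGER, drug_id INTEGER, drug_dose INTEGER, drug_dose_uom TEXT)")
    # Sparse values have to be stored as NULLs in the fixed columns.
    con.executemany("INSERT INTO operation_drugs VALUES (?, ?, ?, ?)",
                    [(13, 100, 200, "mg"), (13, 101, None, None), (13, 100, 150, "mg")])

    # A row with an extra field is rejected: every row must match the table's columns.
    try:
        con.execute("INSERT INTO operation_drugs VALUES (?, ?, ?, ?, ?)",
                    (13, 100, 150, "mg", "Drug USA"))
        extra_field_rejected = False
    except sqlite3.Error:
        extra_field_rejected = True

    # Adding a field takes schema-changing code, and every existing row gets NULL for it.
    con.execute("ALTER TABLE operation_drugs ADD COLUMN drug_manufacturer TEXT")
    null_count = con.execute(
        "SELECT COUNT(*) FROM operation_drugs WHERE drug_manufacturer IS NULL").fetchone()[0]
    con.close()
    return extra_field_rejected, null_count


if __name__ == "__main__":
    print(fixed_structure_demo())
```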
How does NoSQL support variety and variability?
NoSQL documents can have any variety of rapidly changing structures because structure is data.
{ "_id": 1, "_type": "operation", "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins", "operationTypeName": "Heart Transplant", "surgeonName": "Dorothy Oz", "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ] },
  "relations": { "values": [
    { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
    { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
    { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
    { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
    { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
    { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
  ] } }
Callouts: Variable Data Types • Variable Document Types • Variable Relationships • Variable and Sparse Collections • Variable Document Structures • Sparse Properties (the Minomycin entry has no dose properties) • Sparse Denormalized Properties
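Here is a plain-Python sketch of how code might work with such documents: collection membership is an ordinary array, sparse properties are simply absent, and relationships live inside the document. The operation document is trimmed from the slide; the surgeon document (`_id` 10000, matching the `opSurgeon` object) and the helper names are hypothetical.

```python
# Simplified copy of the operation document plus a hypothetical surgeon document
# (its _id 10000 matches the opSurgeon object in the relations list).
DOCS = [
    {"_id": 1, "_type": "operation", "collections": ["operation", "transplants"],
     "operation": {"operationNumber": 13, "administeredDrugs": [
         {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
          "drugDoseSize": 200, "drugDoseUOM": "mg"},
         {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},  # sparse: no dose
         {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
          "drugDoseSize": 150, "drugDoseUOM": "mg"}]},
     "relations": {"values": [
         {"subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87}]}},
    {"_id": 10000, "_type": "surgeon", "collections": ["surgeon"],
     "surgeon": {"surgeonName": "Dorothy Oz", "surgeonSpecialty": "Cardiothoracic"}},
]


def in_collection(docs, name):
    """A document may belong to several collections; membership is just data."""
    return [d["_id"] for d in docs if name in d.get("collections", [])]


def total_dose_mg(doc):
    """Sum milligram doses, skipping entries that omit the sparse dose properties."""
    return sum(d["drugDoseSize"] for d in doc["operation"]["administeredDrugs"]
               if "drugDoseSize" in d and d.get("drugDoseUOM") == "mg")


def related(docs, doc, predicate):
    """Follow relationships stored as data inside the document."""
    by_id = {d["_id"]: d for d in docs}
    return [by_id[r["object"]] for r in doc.get("relations", {}).get("values", [])
            if r["predicate"] == predicate and r["object"] in by_id]


if __name__ == "__main__":
    print(in_collection(DOCS, "transplants"), total_dose_mg(DOCS[0]))
```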
Choosing Between XML and JSON
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph></section>
{ "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [ { "type": "text", "value": "Text exists only in objects" },
                     { "type": "text", "value": "Data exists in separate objects" },
                     { "type": "date", "value": "2017-04-02Z" },
                     { "type": "break", "value": null },
                     { "type": "text", "value": "JSON is designed for objects to…" },
                     { "type": "text", "value": "be sprinkled with strings" } ] } ] }
JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
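The contrast can be seen with Python's standard `xml.etree.ElementTree` and `json` modules, using trimmed versions of the slide's two paragraphs. This is a sketch of the general trade-off, not of any MarkLogic API: in XML the prose reads straight through the tags while the embedded date stays addressable, whereas in JSON the date is a separate typed object located by its `type` field.

```python
import json
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Trimmed versions of the slide's XML and JSON paragraphs.
XML_PARAGRAPH = (
    '<paragraph xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    'XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> '
    'to be freely mixed into the text<br/>'
    'XML is designed for text to be sprinkled with <b>tags</b></paragraph>')

JSON_PARAGRAPH = json.dumps({"paragraph": [
    {"type": "text", "value": "Data exists in separate objects"},
    {"type": "date", "value": "2017-04-02Z"},
    {"type": "break", "value": None},
]})


def xml_text_and_dates(xml_text):
    """Mixed content: the prose reads through the tags, and dates stay addressable."""
    root = ET.fromstring(xml_text)
    text = "".join(root.itertext())
    dates = [(el.text, el.get("{%s}type" % XSI)) for el in root.iter("date")]
    return text, dates


def json_dates(json_text):
    """Object data: a date is a separate typed object found by its type field."""
    return [part["value"] for part in json.loads(json_text)["paragraph"]
            if part["type"] == "date"]


if __name__ == "__main__":
    print(xml_text_and_dates(XML_PARAGRAPH))
    print(json_dates(JSON_PARAGRAPH))
```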
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Addresses: Address ID | Street | Locality | Region | Postal Code | Country
Person Phones: Phone ID | Person ID | Phone Purpose | Phone Number
Person Emails: Email ID | Person ID | Email Purpose | Email Address | Is Primary
Persons Addresses: Person ID | Address ID | Address Purpose | Is Primary
Websites: Website ID | URL
Persons Websites: Person ID | Website ID | Website Purpose | Is Primary
Companies Websites: Company ID | Website ID
Companies Addresses: Company ID | Address ID
Companies: Company ID
Bank Payment Methods: Email ID | Bank Payment ID | Status | Account Type | Routing Number | Account Number | Account Holder | Account Verified | Drivers License Num | Drivers License State
Person Names: Person ID | First Name | Middle Name | Last Name | Maiden Name | Legal Name | Sorted Name | Nickname | Informal Letter Name | Formal Letter Name
Visa Payment Methods: Email ID | Visa ID | Card Number | Expiration Date | Name on Card | Card Address ID | Card Status
Persons: Person ID | Status | Shipping Preference | Name | Online Name | Birth Date | Gender | Ethnicity | Primary Email | Customer Status | Customer Join Date | Customer Reviewer Rank
Persons Companies: Person ID | Company ID | Relationship Type | Relationship Status | Relationship Effectiveness | Relationship Start Date | Relationship End Date
Persons Bank Payment Methods: Bank Payment ID | Person ID | Is Primary
Persons Visa Payment Methods: Visa ID | Person ID | Is Primary
Customer Model
Person ERD
The person ERD has 13 tables because each multivalued property requires a separate table in a relational model.
All of this should be represented as one JSON document because it is one business entity.
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
        "addressLocality": "Clinton", "addressRegion": "UT", "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111", "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
      "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
      "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
      "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
      "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
      "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
      "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
      "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
      "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ]
}
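To see why one document fans out into many tables, here is a minimal Python sketch that shreds part of the person document into relational rows. The table and column names are hypothetical snake_case versions of the ERD column lists, and the `address_id` is generated locally for illustration. Each multivalued property becomes its own child table, a `primary` purpose becomes an Is Primary flag, and the shared address needs both an address row and a link row per purpose.

```python
# Part of the person document, enough to show how multivalued properties fan out.
PERSON = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113"}},
        ],
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"}},
            {"email": {"emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com"}},
        ],
        "personAddresses": [
            {"address": {"addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                         "addressLocality": "Clinton", "addressRegion": "UT",
                         "addressPostalCode": "84015", "addressCountry": "United States"}},
        ],
    },
}


def purpose_rows(purposes):
    """'primary' becomes an Is Primary flag; each other purpose needs its own row."""
    is_primary = "primary" in purposes
    return [(p, is_primary) for p in purposes if p != "primary"]


def shred(doc):
    """Split one document into rows for several (hypothetically named) tables."""
    pid, p = doc["personId"], doc["person"]
    tables = {"persons": [{"person_id": pid, "name": p["personName"]}],
              "person_phones": [], "person_emails": [],
              "addresses": [], "persons_addresses": []}
    for item in p.get("personPhones", []):
        phone = item["phone"]
        for purpose, _ in purpose_rows(phone["phonePurpose"]):
            tables["person_phones"].append(
                {"person_id": pid, "phone_purpose": purpose, "phone_number": phone["phoneNumber"]})
    for item in p.get("personEmails", []):
        email = item["email"]
        for purpose, is_primary in purpose_rows(email["emailPurpose"]):
            tables["person_emails"].append(
                {"person_id": pid, "email_purpose": purpose,
                 "email_address": email["emailAddress"], "is_primary": is_primary})
    for address_id, item in enumerate(p.get("personAddresses", []), start=1):
        addr = item["address"]
        # The address itself is stored once ...
        tables["addresses"].append(
            {"address_id": address_id, "street": "; ".join(addr["addressStreet"]),
             "locality": addr["addressLocality"], "region": addr["addressRegion"],
             "postal_code": addr["addressPostalCode"], "country": addr["addressCountry"]})
        # ... and linked to the person once per purpose (a many-to-many link table).
        for purpose, is_primary in purpose_rows(addr["addressPurpose"]):
            tables["persons_addresses"].append(
                {"person_id": pid, "address_id": address_id,
                 "address_purpose": purpose, "is_primary": is_primary})
    return tables


if __name__ == "__main__":
    for name, rows in shred(PERSON).items():
        print(name, len(rows))
```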
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
[Diagram: Customers, Orders, Products, and Vendors stored as JSON documents, connected by relationships such as "Customer Order", "Product that was ordered", and "Vendor who sold the product".]
Relational Modeling: Normalize
1. Normalize
• Make each attribute single-valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Diagram: One-to-one, One-to-many, Many-to-many, and Reference Tables]
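Here is a minimal Python sketch of the "replace multivalued attributes with one-to-many joins" step. It assumes a simplified person record flattened to one level. The record is split into three parts: a single-valued person row, one row per phone, and one row per phone purpose. The purposes need their own table because phonePurpose is itself multivalued. The column names person_id_fk and phone_seq are hypothetical. The point is only to show how quickly normalization multiplies tables.

```python
from typing import Any


def normalize_person(doc: dict[str, Any]):
    """Split a person record into three relational-style tables.

    - person_row: only the single-valued attributes
    - phone_rows: one row per phone (one-to-many, keyed by person_id_fk)
    - purpose_rows: one row per phone purpose, because phonePurpose is
      itself multivalued and needs its own table
    """
    person_id = doc["personId"]
    # Parent table: drop every list-valued (multivalued) attribute.
    person_row = {k: v for k, v in doc.items() if not isinstance(v, list)}
    phone_rows: list[dict[str, Any]] = []
    purpose_rows: list[dict[str, Any]] = []
    for seq, phone in enumerate(doc.get("personPhones", [])):
        # Child table: one row per phone, linked back by a foreign key.
        phone_rows.append({"person_id_fk": person_id, "phone_seq": seq,
                           "phone_number": phone["phoneNumber"]})
        # Grandchild table: one row per purpose of that phone.
        for purpose in phone["phonePurpose"]:
            purpose_rows.append({"person_id_fk": person_id, "phone_seq": seq,
                                 "phone_purpose": purpose})
    return person_row, phone_rows, purpose_rows


# Sample values taken from the person example in this deck (trimmed).
person = {
    "personId": "11",
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
    ],
}
person_row, phone_rows, purpose_rows = normalize_person(person)
```

In the document model, the same data stays in one record, and personPhones remains an embedded array.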
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it
personId 11
schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]
triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111
triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free
personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]
personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]
personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]
personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]
customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111
companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
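To make "one entity per JSON document with embedded, normalized structures" concrete, below is a trimmed version of the person document as Python data. A helper then pulls the embedded triples out as (subject, predicate, object) tuples. The {"triple": {...}} nesting is our reading of the flattened slide text, not a confirmed layout. The extract_triples function is a hypothetical plain-Python stand-in for the effect of a TDE template, not TDE itself.

```python
from typing import Any

# Trimmed person document. The {"triple": {...}} shape is an assumption
# based on the flattened slide text.
person_doc: dict[str, Any] = {
    "personId": "11",
    "triples": [
        {"triple": {"subject": "11", "predicate": "hasSpouse", "object": "22"}},
        {"triple": {"subject": "11", "predicate": "hasChild", "object": "33"}},
        {"triple": {"subject": "11", "predicate": "hasChild", "object": "44"}},
        {"triple": {"subject": "11", "predicate": "likesProduct", "object": "1111"}},
        {"triple": {"subject": "11", "predicate": "hasEmployer", "object": "666"}},
    ],
    "person": {
        "personName": "Mike Bowers",
        # Embedded, normalized structure: an array of phone objects.
        "personPhones": [
            {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
            {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
        ],
    },
    "customer": {"customerStatus": "active"},
}


def extract_triples(doc: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Collect each embedded triple as a (subject, predicate, object) tuple.

    Hypothetical stand-in for what a TDE template would project into the
    triple index; this is ordinary Python, not TDE.
    """
    return [
        (t["triple"]["subject"], t["triple"]["predicate"], t["triple"]["object"])
        for t in doc.get("triples", [])
    ]


triples = extract_triples(person_doc)
children = [o for s, p, o in triples if p == "hasChild"]
```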
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing: related data is combined at query time instead of being copied into each document
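The "denormalizing without denormalizing" idea can be sketched in plain Python. The person document stores only a triple pointing at company 666. The company name is looked up through that triple at query time instead of being copied into the person document. The documents dict, the triple_index list, and the linked_values function are hypothetical stand-ins, not the Optic API.

```python
from typing import Any

# Hypothetical in-memory stand-ins: documents keyed by id, plus a triple index.
documents: dict[str, dict[str, Any]] = {
    "11": {"personId": "11", "person": {"personName": "Mike Bowers"}},
    "666": {"companyId": "666", "company": {"companyName": "Goliath Dairies"}},
}
triple_index = [("11", "hasEmployer", "666"), ("11", "hasSpouse", "22")]


def linked_values(subject_id: str, predicate: str, path: list[str]) -> list[Any]:
    """Follow triples from subject_id and read a value from each linked document.

    Nothing is copied into the subject's document; the linked data is read
    at query time. Links to ids with no stored document are skipped.
    """
    results = []
    for s, p, o in triple_index:
        if s == subject_id and p == predicate and o in documents:
            value: Any = documents[o]
            for key in path:  # walk down to the requested property
                value = value[key]
            results.append(value)
    return results


employers = linked_values("11", "hasEmployer", ["company", "companyName"])
```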
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Diagram: Business Tables, Transaction Tables, Reference Tables]
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New Invoiced Shipped Closed ]
productOrderStatus [
None Allocated Invoiced Shipped On Order No Stock
]
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
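Here is a small sketch of an orthogonalized transaction document and a combined lookup document, using values from the order example above. The order carries what it needs about the transaction (customer name, shipping locality, ordered products) and refers to other entities by id. The lookup document combines two lookup lists. The invalid_statuses helper is hypothetical. Splitting "None AllocatedInvoiced Shipped On Order No Stock" into six values is our reading of the flattened slide.

```python
from typing import Any

# Lookup document: two lookup lists combined in one doc.
order_lookup: dict[str, Any] = {
    "orderLookupId": "111111111",
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

# Transaction document: everything about the order, with ids for other entities.
order: dict[str, Any] = {
    "orderId": "1",
    "order": {
        "orderNumber": "111-11-111",
        "orderStatus": "Shipped",
        "customer": {"customerId": "11", "customerName": "Mike Bowers"},
        "orderShippingAddress": {"addressLocality": "Clinton", "addressRegion": "UT"},
        "orderedProducts": [
            {"product": {"productIdFk": "1111", "productOrderQty": 3,
                         "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": "2222", "productOrderQty": 1}},
        ],
    },
}


def invalid_statuses(order_doc: dict[str, Any], lookup: dict[str, Any]) -> list[str]:
    """Return any order or line-item status that is missing from the lookup lists."""
    body = order_doc["order"]
    problems = []
    if body["orderStatus"] not in lookup["orderStatus"]:
        problems.append(body["orderStatus"])
    for item in body["orderedProducts"]:
        status = item["product"].get("productOrderStatus")  # optional per line
        if status is not None and status not in lookup["productOrderStatus"]:
            problems.append(status)
    return problems


problems = invalid_statuses(order, order_lookup)
```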
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational, because it hides the purpose of the model
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because a JSON object is well suited to subclassing: role-specific sections sit alongside the general ones in the same document
• This example shows a person subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON and, unlike in relational, it does not hide the purpose of the model
• See subclassing below
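Below is a rough Python sketch of subclassing by composition. The schemas array declares which roles the document carries, and each role is a sibling section of the same document. The helper names are hypothetical. The data follows the person example, trimmed and with schemaVersion omitted.

```python
from typing import Any

person_doc: dict[str, Any] = {
    "personId": "11",
    "schemas": [
        {"schema": {"type": "person"}},
        {"schema": {"type": "customer"}},
        {"schema": {"type": "companyRelationships"}},
    ],
    "person": {"personName": "Mike Bowers"},  # the general "superclass" part
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": "555", "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": "666", "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def declared_types(doc: dict[str, Any]) -> list[str]:
    """Role types the document says it carries, read from its schemas array."""
    return [entry["schema"]["type"] for entry in doc.get("schemas", [])]


def missing_sections(doc: dict[str, Any]) -> list[str]:
    """Declared role types that have no matching top-level section."""
    return [t for t in declared_types(doc) if t not in doc]


def company_roles(doc: dict[str, Any]) -> list[str]:
    """Flatten every person-to-company role into one list."""
    roles: list[str] = []
    for rel in doc.get("companyRelationships", []):
        roles.extend(rel["company"]["personCompanyRelationship"])
    return roles
```

Adding a new role means adding a section and a schemas entry. The general person section is untouched, so the purpose of each part stays visible.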
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Move sparse data out of a table and into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create range indexes, which support inequality queries, on a property name or a hierarchical path
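Why unique property names matter can be shown with a loose model of an index keyed only by property name, where the path is ignored. With a generic name such as name, a person's name and a company's name land under the same key. An equality query on name then cannot tell them apart. personName and companyName stay distinct. name_index is a hypothetical illustration, not how MarkLogic's universal index is implemented.

```python
from collections import defaultdict
from typing import Any, Optional


def name_index(doc: Any) -> dict[str, set]:
    """Map each property name to every scalar value found under it, at any depth.

    The path to a property is ignored, which loosely mirrors a name-keyed
    equality index. Array items are indexed under the array's property name.
    """
    index: dict[str, set] = defaultdict(set)

    def walk(node: Any, name: Optional[str]) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, key)
        elif isinstance(node, list):
            for item in node:
                walk(item, name)
        elif name is not None:
            index[name].add(node)

    walk(doc, None)
    return dict(index)


generic = {"person": {"name": "Mike Bowers"}, "employer": {"name": "Nabisco"}}
specific = {"person": {"personName": "Mike Bowers"}, "employer": {"companyName": "Nabisco"}}

generic_index = name_index(generic)    # "name" holds both values: ambiguous
specific_index = name_index(specific)  # each value has its own property name
```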
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
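The IO estimate above is plain arithmetic. This sketch recomputes it from the per-join figures on the slide: 3-to-4 index IOs plus 1-to-2 row IOs per join. The figures are the slide's estimates, not measurements.

```python
# Per-join IO figures as stated on the slide (low, high).
INDEX_IOS_PER_JOIN = (3, 4)
ROW_IOS_PER_JOIN = (1, 2)
DOCUMENT_IOS = 1  # the slide's figure for fetching one MarkLogic document


def relational_io_range(joins: int) -> tuple[int, int]:
    """Total IOs for a multi-table read, as a (low, high) range."""
    low = joins * (INDEX_IOS_PER_JOIN[0] + ROW_IOS_PER_JOIN[0])
    high = joins * (INDEX_IOS_PER_JOIN[1] + ROW_IOS_PER_JOIN[1])
    return low, high


low, high = relational_io_range(13)
print(f"relational: {low}-{high} IOs, document: {DOCUMENT_IOS} IO")
# prints "relational: 52-78 IOs, document: 1 IO"
```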
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
        "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ] },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 }
}

{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": 11,
      "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
        "productIdFk": 1111,
        "productName": "Oreo Thins Sandwich Cookies",
        "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
        "productWeightOz": 11,
        "productOrderStatus": "Shipped",
        "productShippedDate": "2001-01-01+01:01",
        "seller": { "sellerId": 555, "sellerName": "Nabisco" },
        "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
        "productIdFk": 2222,
        "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
        "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", … } } ] }
}

{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" ],
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
        "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2,
        "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…" } } ],
    "productMeasures": [
      { "productMeasure": {
        "productMeasurePurpose": "productPackage", "productWeight": 10.1,
        "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
        "productMeasurePurpose": "shippingPackage", "productWeight": 1,
        "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    …
  }
}

{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": {
        "addressPurpose": [ "shipping", … ],
        "addressLocality": "East Hanove…",
        "addressPostalCode": "07936", … } } ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
        "paymentMethodType": "Che…",
        "paymentMethodAccountNumber": 555,
        "paymentMethodVerified": true } } ] },
  "supplier": { "supplierJoinDate": "2005-05" },
  "seller": { "sellerJoinDate": "2005-05" }
}

{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [
    "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
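The lookup-list document above holds the allowed values for orderStatus and productOrderStatus. As a minimal sketch, assuming the documents are already loaded as Python dicts (this is plain Python, not a MarkLogic API), an application could validate an order against that lookup document like this. The helper name invalid_statuses and the second line item's status "Backordered" are hypothetical test data, not from the slide.

```python
# Lookup-list document from the slide: allowed values for status fields.
ORDER_LOOKUP = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}

# Trimmed order document; "Backordered" is a hypothetical invalid value.
ORDER = {
    "orderId": 1,
    "order": {
        "orderNumber": "111-11-111",
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},
        ],
    },
}


def invalid_statuses(order_doc, lookup):
    """Return (field, value) pairs whose value is not in the lookup list."""
    problems = []
    order = order_doc["order"]
    if order["orderStatus"] not in lookup["orderStatus"]:
        problems.append(("orderStatus", order["orderStatus"]))
    for item in order.get("orderedProducts", []):
        status = item["product"].get("productOrderStatus")
        if status is not None and status not in lookup["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems


print(invalid_statuses(ORDER, ORDER_LOOKUP))
```

Keeping the allowed values in their own document means the list can change by updating data rather than by altering a schema.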
Document PROs: Multi-Valued and Multi-Typed Properties
• JSON documents can contain multi-valued and multi-typed properties.
• JSON databases can query multi-valued and multi-typed properties, using MarkLogic's new Optic API and Template Driven Extraction.
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued.
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.
"personAddresses": [
  { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
    "addressLocality": "Clinton", "addressRegion": "UT",
    "addressPostalCode": "84015", "addressCountry": "United States" } }
]
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
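In the relational version, finding everyone with a shipping address means joining Persons to Persons Addresses and filtering on Address Purpose. In the document version, the multi-valued addressPurpose array is checked in place. This is a minimal in-memory sketch in plain Python: person 22 and its Ogden address are hypothetical, the helper name is made up, and a real MarkLogic deployment would use its indexes rather than a scan.

```python
# Two person documents; person 22 is hypothetical sample data.
PERSONS = [
    {"personId": 11, "person": {"personAddresses": [
        {"address": {"addressPurpose": ["shipping", "mailing"],
                     "addressLocality": "Clinton"}}]}},
    {"personId": 22, "person": {"personAddresses": [
        {"address": {"addressPurpose": ["mailing"],
                     "addressLocality": "Ogden"}}]}},
]


def persons_with_address_purpose(docs, purpose):
    """Return personIds having at least one address tagged with `purpose`."""
    return [
        doc["personId"]
        for doc in docs
        if any(purpose in entry["address"].get("addressPurpose", [])
               for entry in doc["person"].get("personAddresses", []))
    ]


print(persons_with_address_purpose(PERSONS, "shipping"))
print(persons_with_address_purpose(PERSONS, "mailing"))
```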
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed.
• Putting different types of records in the same array is natural in programming but very difficult in relational databases.
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
    "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
    "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
    "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
    "nameOnBankAccount": "Michael Bowers",
    "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
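Because each entry in personPaymentMethods carries its own paymentMethodType, code can dispatch on that field and read only the properties that type defines. A plain Python sketch; the describe_payment_method helper and its output wording are hypothetical.

```python
# Sample data from the multi-type personPaymentMethods array.
PAYMENT_METHODS = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111",
                       "bankAccountNumber": "11111111"}},
]


def describe_payment_method(method):
    """Summarize one payment method, reading only the fields its type defines."""
    kind = method.get("paymentMethodType")
    if kind == "Visa":
        return f"Visa ending in {method['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking account ending in {method['bankAccountNumber'][-4:]}"
    return f"Unsupported payment method type: {kind}"


for entry in PAYMENT_METHODS:
    print(describe_payment_method(entry["paymentMethod"]))
```

In a relational model the same data needs separate Visa and Bank payment tables plus a link table for each; here one array holds both shapes.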
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query.
• Structure can be queried:
  • for presence of keys
  • for nested relationships
  • for any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
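Since keys and nesting are themselves data, a program can enumerate the structure of a document and query for the presence of a key. A minimal sketch in plain Python; the helpers key_paths and hidden_phone_numbers are hypothetical, not MarkLogic functions.

```python
PERSON = {
    "personId": 11,
    "person": {
        "personName": "Some very long name with many UTF-8 international characters…",
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def key_paths(value, prefix=""):
    """Yield the path of every key in a JSON value; array positions appear as [i]."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from key_paths(child, path)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from key_paths(child, f"{prefix}[{index}]")


def hidden_phone_numbers(doc):
    """Query for presence of a key: phones that carry hidePhoneNumber: true."""
    return [
        entry["phone"]["phoneNumber"]
        for entry in doc["person"].get("personPhones", [])
        if entry["phone"].get("hidePhoneNumber") is True
    ]


print(hidden_phone_numbers(PERSON))
for path in key_paths(PERSON):
    print(path)
```

Only the second phone has the optional hidePhoneNumber key, so the structure itself (its presence) is what the query tests.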
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters.
– Strings have unlimited size and can contain any internationalized UTF-8 characters.
– Numbers contain integers or decimals.
– Booleans are true or false.
– Arrays can have values of any type and can mix types.
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
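The JSON types listed above map directly onto the built-in types of most languages. A small Python illustration using only the standard json module; the attribute names and values are made up for the example.

```python
import json

# Hypothetical document exercising each JSON data type.
text = """{
  "attribute name with spaces & ünïcode": true,
  "personName": "Zoë Ångström",
  "personHeightInFeet": 5.75,
  "customerReviewerRank": 11111,
  "personMaidenName": null,
  "mixedArray": [1, "two", 3.0, false, null, {"nested": "object"}]
}"""

doc = json.loads(text)
for key, value in doc.items():
    # e.g. 'personHeightInFeet': float, 'personMaidenName': NoneType
    print(f"{key!r}: {type(value).__name__}")
print([type(v).__name__ for v in doc["mixedArray"]])
```

JSON itself has a single number type; Python's parser chooses int or float from the literal, which is why 3.0 becomes a float and 1 an int.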
Document PROs: Meaningful Data Modeling
"A Relational Model of Data for Large Shared Data Banks"
E. F. Codd, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
"Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…"
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION: This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of…
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS: …Tables… represent a major advance toward the goal of data independence…
1.2.1 Ordering Dependence: …Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence: …Can application programs… remain invariant as indices come and go?…
1.2.3 Access Path Dependence: Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure: a knowledge graph linking Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E) through relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of"; topics are connected to one another as problem, solution, and purpose.]
[Figure (Inside and Out): JSON Customers, Orders, Products, and Vendors documents, an XML operation document (hospital, operation, surgeon, and administered drugs), and Documentation, connected by relationships such as "Product that was ordered", "Vendor who sold the product", "Customer liked product", "Vendor received RMA on product", "Product manual", "Customer order documents", "Shipping instructions, etc.", and "Customer invoices, annual reports, etc."]
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph?
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling: Normalize
- DocGraph Modeling: Normalize
- DocGraph Modeling: Denormalize
- Relational Modeling: Orthogonalize
- Relational Modeling: Generalize
- DocGraph Modeling: Generalize
- Relational Modeling: Tune
- DocGraph Modeling: Tune
- Document PROs: Simple, Easy Data Types
Hospital Name | Johns Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
RDF = Standard Meaningful Graphs
Use Existing Ontologies for Predicate Names
• Save time and make your data easier to understand by leveraging existing relationship ontologies.
TIP: Search for ontologies at Linked Open Vocabularies (LOV).
• Dublin Core
• FOAF
• TrackBack
• MetaVocab
• Basic Geo Vocabulary
• BIO
• RSS 1.0
• VCard RDF
• Creative Commons metadata
• WOT
• SIOC
• GoodRelations
• DOAP
• Programmes Ontology
• Music Ontology
• OpenGUID
• Provenance Vocabulary
• Pedagogical diagnosis
• DILIGENT Argumentation
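Reusing an existing vocabulary simply means using its published IRIs as predicates. A minimal sketch that writes N-Triples using the FOAF (http://xmlns.com/foaf/0.1/) and Dublin Core terms (http://purl.org/dc/terms/) namespaces. The http://example.org/ subjects are hypothetical placeholders for your own entity IRIs, and the helper functions are illustrative, not part of any RDF library.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # hypothetical namespace for our own entities


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping the characters N-Triples reserves."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


TRIPLES = [
    (iri(EX + "person/11"), iri(FOAF + "name"), literal("Mike Bowers")),
    (iri(EX + "person/11"), iri(FOAF + "knows"), iri(EX + "person/22")),
    (iri(EX + "doc/modeling-guru"), iri(DCTERMS + "creator"), iri(EX + "person/11")),
]
print(to_ntriples(TRIPLES))
```

Anyone who already knows FOAF or Dublin Core can read foaf:knows or dcterms:creator without a data dictionary, which is the time saving the tip refers to.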
New Data Structures for Variety and Variability
Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)

Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.

Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.

Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.

Relational: Each table may have zero to a few fixed relationships to other tables. Constraints are not data.
NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.

Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.
Why is relational fixed and dense?
Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Operation ID | Drug ID | Drug Dose | Drug Dose UOM
13 | 100 | 200 | mg
13 | 101 | NULL | NULL (sparse)
13 | 100 | 150 | mg

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

Callouts: fixed field types; fixed table structure; fixed table relationships; each row has the same structure; fixed set of tables.
How does NoSQL support variety and variability?
NoSQL documents can have any variety of rapidly changing structures because structure is data.
{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}
Callouts: variable data types; variable document types; variable relationships; variable and sparse collections; variable document structures; sparse properties (the Minomycin entry has no dose); sparse denormalized properties.
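The two slides above describe the same operation: as fixed relational rows padded with NULLs, and as one document where a missing value is simply an absent property. A minimal Python sketch of that conversion, assuming the drug rows have already been joined to drug names and manufacturers; the column names and helper functions are hypothetical.

```python
# Hypothetical relational rows (drug rows pre-joined to names and manufacturers).
OPERATION_ROW = {"operationNumber": 13, "hospitalName": "Johns Hopkins",
                 "surgeonName": "Dorothy Oz"}
DRUG_ROWS = [
    {"operationNumber": 13, "drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
     "drugDoseSize": 200, "drugDoseUOM": "mg"},
    {"operationNumber": 13, "drugName": "Minomycin", "drugManufacturer": "Canada4Less",
     "drugDoseSize": None, "drugDoseUOM": None},
    {"operationNumber": 13, "drugName": "Minocycline", "drugManufacturer": "Drug USA",
     "drugDoseSize": 150, "drugDoseUOM": "mg"},
]


def without_nulls(row, drop=()):
    """Copy a row, omitting NULL (None) values and any columns listed in `drop`."""
    return {k: v for k, v in row.items() if v is not None and k not in drop}


def build_operation_doc(op_row, drug_rows):
    """Nest the child rows inside the parent; NULL columns become absent properties."""
    number = op_row["operationNumber"]
    operation = without_nulls(op_row)
    operation["administeredDrugs"] = [
        without_nulls(row, drop=("operationNumber",))  # foreign key is now implicit
        for row in drug_rows
        if row["operationNumber"] == number
    ]
    return {"_type": "operation", "operation": operation}


print(build_operation_doc(OPERATION_ROW, DRUG_ROWS))
```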
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph></section>

{ "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
      { "type": "text", "value": "Text exists only in objects" },
      { "type": "text", "value": "Data exists in separate objects" },
      { "type": "date", "value": "2017-04-02Z" },
      { "type": "break", "value": null },
      { "type": "text", "value": "JSON is designed for objects to…" },
      { "type": "text", "value": "be sprinkled with strings" } ] } ] }

JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
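The difference between mixed content and object data shows up as soon as the text is read back. A small sketch using Python's standard xml.etree.ElementTree and json modules, with fragments adapted from the XML and JSON examples in this slide.

```python
import json
import xml.etree.ElementTree as ET

# XML: data (a date) mixed directly into the text.
xml_text = (
    '<paragraph xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    'XML allows data such as this date '
    '<date xsi:type="xsd:date">2017-04-02Z</date>'
    ' to be freely mixed into the text</paragraph>'
)
para = ET.fromstring(xml_text)
xml_plain = "".join(para.itertext())  # text, child text, and tails in document order
xml_dates = [d.text for d in para.iter("date")]

# JSON: the same kind of content must be split into typed objects.
json_text = """{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects"},
  {"type": "date", "value": "2017-04-02Z"},
  {"type": "break", "value": null},
  {"type": "text", "value": "be sprinkled with strings"}
]}"""
items = json.loads(json_text)["paragraph"]
json_plain = " ".join(i["value"] for i in items if i["value"] is not None)
json_dates = [i["value"] for i in items if i["type"] == "date"]

print(xml_plain)
print(json_plain)
```

XML's itertext() returns the prose in order with its original spacing, while the JSON version has to rebuild the text from separate objects (here by joining non-null values with spaces). Pulling out the typed date, on the other hand, is equally direct in both.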
Choosing Between XML and JSON
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Customer Model
The person ERD has 13 tables because each multivalued property requires a separate table in a relational model.
All of this should be represented as one JSON document because it is one business entity.
Person ERD
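To see why one document replaces many tables, here is a sketch that assembles a person document from rows of three of the tables listed for the customer model (Persons, Person Phones, and Person Emails). It is plain Python over in-memory lists; the row keys are hypothetical camelCase renderings of the column names, and the Is Primary flag is folded into emailPurpose the way the JSON example does.

```python
# Hypothetical rows from three of the 13 person tables.
PERSONS = [{"personId": 11, "name": "Mike Bowers", "shippingPreference": "free"}]
PERSON_PHONES = [
    {"phoneId": 1, "personId": 11, "phonePurpose": "mobile", "phoneNumber": "+1 (111) 111-1111"},
    {"phoneId": 2, "personId": 11, "phonePurpose": "work", "phoneNumber": "+1 (111) 111-1112"},
]
PERSON_EMAILS = [
    {"emailId": 1, "personId": 11, "emailPurpose": "work",
     "emailAddress": "mike@somework.com", "isPrimary": True},
]


def assemble_person(person_id):
    """Gather one person's rows from each table into a single nested document."""
    row = next(p for p in PERSONS if p["personId"] == person_id)
    return {
        "personId": person_id,
        "person": {
            "personName": row["name"],
            "personShippingPreference": row["shippingPreference"],
            "personPhones": [
                {"phone": {"phonePurpose": [r["phonePurpose"]],
                           "phoneNumber": r["phoneNumber"]}}
                for r in PERSON_PHONES if r["personId"] == person_id
            ],
            "personEmails": [
                {"email": {"emailPurpose": [r["emailPurpose"]] + (["primary"] if r["isPrimary"] else []),
                           "emailAddress": r["emailAddress"]}}
                for r in PERSON_EMAILS if r["personId"] == person_id
            ],
        },
    }


print(assemble_person(11))
```

Each child table contributes one nested array, and the surrogate keys (phoneId, emailId) and repeated personId columns disappear because nesting already expresses ownership.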
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
      "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
      "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
      "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
      "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
      "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
      "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
      "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
      "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ]
}
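The triples array embeds graph relationships inside the person document, so the document can be read both as a record and as a set of edges. In MarkLogic, embedded triples are indexed and queried with SPARQL; the following is only an in-memory Python sketch of the same idea, with a hypothetical objects_of helper that filters by predicate and, optionally, by the tripleStatus metadata stored beside each triple.

```python
# Triples section of the person document (metadata shown only where the slide has it).
PERSON_DOC = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666},
         "tripleStatus": "active", "tripleEffectivityEndDate": None},
    ],
}


def objects_of(doc, predicate, active_only=False):
    """Return the objects of every embedded triple with the given predicate."""
    result = []
    for entry in doc.get("triples", []):
        t = entry["triple"]
        if t["predicate"] != predicate:
            continue
        # Metadata lives beside the triple, not inside it.
        if active_only and entry.get("tripleStatus") != "active":
            continue
        result.append(t["object"])
    return result


print(objects_of(PERSON_DOC, "hasChild"))
print(objects_of(PERSON_DOC, "hasEmployer", active_only=True))
```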
More Realistic Customer Order Model
[Figure: the customer tables expanded into a much larger relational diagram of many additional interlinked tables, shown with placeholder names such as "Order", "Order Items", "Fact", "Lookup Lists", and "Long table Name".]
Same Realistic Customer Order Model
• A JSON document is a business entity.
• Meaningful graph relationships connect business entities.
[Figure: a Customer Order graph linking JSON Customers, Orders, Products, and Vendors documents, with relationships such as "Product that was ordered" and "Vendor who sold the product".]
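Because each business entity is its own document, a relationship is followed by looking up the referenced document by its key. A minimal Python sketch, assuming documents are held in dicts keyed by id; the edge names orderedBy, productThatWasOrdered, and vendorWhoSoldTheProduct are hypothetical labels modeled on the figure.

```python
CUSTOMERS = {11: {"personId": 11, "person": {"personName": "Mike Bowers"}}}
PRODUCTS = {
    1111: {"productId": 1111,
           "product": {"productName": "Oreo Thins Sandwich Cookies",
                       "productVendor": {"vendorId": 555, "companyName": "Nabisco"}}},
}
ORDER = {
    "orderId": 1,
    "order": {
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [{"product": {"productIdFk": 1111, "productOrderQty": 3}}],
    },
}


def related_ids(order_doc, products):
    """List (relationship, target id) edges reachable from one order document."""
    body = order_doc["order"]
    edges = [("orderedBy", body["customer"]["customerId"])]
    for item in body["orderedProducts"]:
        product_id = item["product"]["productIdFk"]
        edges.append(("productThatWasOrdered", product_id))
        vendor = products[product_id]["product"]["productVendor"]
        edges.append(("vendorWhoSoldTheProduct", vendor["vendorId"]))
    return edges


for relationship, target in related_ids(ORDER, PRODUCTS):
    print(relationship, target)
# Follow the first edge back to the customer document itself.
print(CUSTOMERS[ORDER["order"]["customer"]["customerId"]]["person"]["personName"])
```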
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
        "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": 11,
      "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
        "productIdFk": 1111,
        "productName": "Oreo Thins Sandwich Cookies",
        "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
        "productWeightOz": 11,
        "productOrderStatus": "Shipped",
        "productShippedDate": "2001-01-01+01:01",
        "seller": { "sellerId": 555, "sellerName": "Nabisco" },
        "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
        "productIdFk": 2222,
        "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
        "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New",
        …
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
        "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
        "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": {
        "productMeasurePurpose": "productPackage", "productWeight": 10.1,
        "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
        "productMeasurePurpose": "shippingPackage", "productWeight": 1,
        "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    …
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": { "addressPurpose": [ "shipping", …
        "addressLocality": "East Hanove…",
        "addressPostalCode": "07936", … } } ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
        "paymentMethodType": "Che…",
        "paymentMethodAccountNumber": 555…
        "paymentMethodVerified": true } } ] },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
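A minimal sketch of the idea on this slide, in plain Python rather than any MarkLogic API: each business entity is one document, and relationships between entities are explicit (subject, predicate, object) triples. The document contents are trimmed from the examples above; the URIs and the predicate names `placedOrder`, `orderedProduct`, and `soldBy` are hypothetical stand-ins for the "Customer Order", "Product that was ordered", and "Vendor who sold the product" arrows.

```python
# One dict per business entity, keyed by a hypothetical document URI.
docs = {
    "/person/11.json": {"personId": 11, "person": {"personName": "Mike Bowers"}},
    "/order/1.json": {"orderId": 1, "order": {"orderNumber": "111-11-111"}},
    "/product/1111.json": {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    "/company/555.json": {"companyId": 555, "company": {"companyName": "Nabisco"}},
}

# Hypothetical predicates that name each relationship explicitly.
triples = [
    ("/person/11.json", "placedOrder", "/order/1.json"),
    ("/order/1.json", "orderedProduct", "/product/1111.json"),
    ("/product/1111.json", "soldBy", "/company/555.json"),
]

def follow(start, *predicates):
    """Walk a chain of predicates from one document and return the URIs reached."""
    current = [start]
    for predicate in predicates:
        current = [o for (s, p, o) in triples if s in current and p == predicate]
    return current

# Which vendors sold the products that Mike ordered?
vendors = follow("/person/11.json", "placedOrder", "orderedProduct", "soldBy")
names = [docs[uri]["company"]["companyName"] for uri in vendors]
print(names)  # ['Nabisco']
```

Because each relationship carries its own predicate, a new kind of link is just a new triple; no table or schema change is needed.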
Relational Modeling: Normalize
1. Normalize
• Make each attribute single-valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Figure: relational tables connected by one-to-one, one-to-many, and many-to-many relationships, plus reference tables.]
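To make the first normalization rule concrete, here is a small sketch using Python's built-in `sqlite3` module: one person's multivalued phone list is flattened into a separate phone table with a one-to-many foreign key, and a join reassembles it. The table and column names are illustrative assumptions, loosely following the customer order model.

```python
import sqlite3

# A person with a multivalued attribute (several phone numbers).
person = {"personId": 11, "name": "Mike Bowers",
          "phones": [("mobile", "+1 (111) 111-1111"), ("work", "+1 (111) 111-1112")]}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    -- One row per phone number: the multivalued attribute becomes a one-to-many table.
    CREATE TABLE phone (
        phone_id      INTEGER PRIMARY KEY,
        person_id     INTEGER REFERENCES person(person_id),
        phone_purpose TEXT,
        phone_number  TEXT
    );
""")
conn.execute("INSERT INTO person VALUES (?, ?)", (person["personId"], person["name"]))
conn.executemany(
    "INSERT INTO phone (person_id, phone_purpose, phone_number) VALUES (?, ?, ?)",
    [(person["personId"], purpose, number) for purpose, number in person["phones"]],
)

# Reading the person back together with the phones now requires a join.
rows = conn.execute("""
    SELECT p.name, ph.phone_purpose, ph.phone_number
    FROM person p JOIN phone ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
""").fetchall()
print(rows)
```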
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } } ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": {
    "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
      "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
      "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
      "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
      "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
      "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
      "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
      "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
      "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ]
}
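By contrast with the relational phone table, the document model keeps the phone list embedded in the person document as an array of small, normalized structures. A minimal Python sketch, where the dict mirrors a trimmed part of the person document above and `phones_for` is a hypothetical helper, not a MarkLogic API:

```python
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113"}},
        ],
    },
}

def phones_for(doc, purpose):
    """Return phone numbers whose purpose list contains `purpose`; no join is needed."""
    return [
        entry["phone"]["phoneNumber"]
        for entry in doc["person"]["personPhones"]
        if purpose in entry["phone"]["phonePurpose"]
    ]

print(phones_for(person_doc, "primary"))  # ['+1 (111) 111-1111']
```

Each embedded `phone` structure has one shape and no repeated attributes, which is what "normalized" means inside a document.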
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
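The idea can be imitated in plain Python without assuming anything about real TDE template syntax or the Optic API: an extraction step (standing in for a TDE template) turns each document into rows, and a join over those rows answers a question spanning two documents without copying data between them. Document 22 and its name are hypothetical; the example above only gives the ID.

```python
docs = [
    {"personId": 11, "person": {"personName": "Mike Bowers"},
     "triples": [{"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}}]},
    # Hypothetical spouse document; only the ID 22 appears in the example.
    {"personId": 22, "person": {"personName": "Hypothetical Spouse"}, "triples": []},
]

def extract(doc):
    """Stand-in for a TDE template: emit triple rows and one name row per document."""
    triples = [(t["triple"]["subject"], t["triple"]["predicate"], t["triple"]["object"])
               for t in doc.get("triples", [])]
    name_row = (doc["personId"], doc["person"]["personName"])
    return triples, name_row

triple_index, names = [], {}
for d in docs:
    rows, (pid, name) = extract(d)
    triple_index.extend(rows)
    names[pid] = name

# Join triple rows to name rows: the spouse's name reads as if it were part of document 11.
spouses = [(names[s], names[o]) for (s, p, o) in triple_index if p == "hasSpouse"]
print(spouses)  # [('Mike Bowers', 'Hypothetical Spouse')]
```

Neither document stores the other's name; the "denormalized" view exists only in the extracted rows.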
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Figure: relational model grouped into business tables, transaction tables, and reference tables.]
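A small `sqlite3` sketch of the three kinds of tables: `person` and `product` are business tables that know nothing about each other, `order_item` is a transaction table that joins them, and `order_status` is a reference table that standardizes allowed states. The schema is an illustrative assumption, not taken from the talk; the status values come from the order lookup list in the customer order model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the reference table
conn.executescript("""
    -- Business tables: independent of any context.
    CREATE TABLE person  (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    -- Reference table: standardizes order states.
    CREATE TABLE order_status (status TEXT PRIMARY KEY);
    -- Transaction table: joins business tables into one context.
    CREATE TABLE order_item (
        person_id  INTEGER REFERENCES person(person_id),
        product_id INTEGER REFERENCES product(product_id),
        qty        INTEGER,
        status     TEXT REFERENCES order_status(status)
    );
""")
conn.execute("INSERT INTO person VALUES (11, 'Mike Bowers')")
conn.execute("INSERT INTO product VALUES (1111, 'Oreo Thins Sandwich Cookies')")
conn.executemany("INSERT INTO order_status VALUES (?)",
                 [(s,) for s in ("New", "Invoiced", "Shipped", "Closed")])
conn.execute("INSERT INTO order_item VALUES (11, 1111, 3, 'Shipped')")

rows = conn.execute("""
    SELECT pe.name, pr.name, oi.qty, oi.status
    FROM order_item oi
    JOIN person pe  ON pe.person_id  = oi.person_id
    JOIN product pr ON pr.product_id = oi.product_id
""").fetchall()
print(rows)  # [('Mike Bowers', 'Oreo Thins Sandwich Cookies', 3, 'Shipped')]

# A status missing from the reference table is rejected by the foreign key.
rejected = False
try:
    conn.execute("INSERT INTO order_item VALUES (11, 1111, 1, 'Lost')")
except sqlite3.IntegrityError:
    rejected = True
print("unknown status rejected:", rejected)
```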
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one document
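The last bullet can be sketched in Python: the lookup document from the customer order model holds two lookup lists at once, and a hypothetical `validate_order` helper checks an order document's statuses against it. The second item's status, "Backordered", is deliberately invalid so the check has something to report.

```python
lookups = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

order = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},
        ],
    },
}

def validate_order(order_doc, lookup_doc):
    """Hypothetical helper: list (field, value) pairs missing from the lookup lists."""
    problems = []
    status = order_doc["order"]["orderStatus"]
    if status not in lookup_doc["orderStatus"]:
        problems.append(("orderStatus", status))
    for item in order_doc["order"]["orderedProducts"]:
        item_status = item["product"]["productOrderStatus"]
        if item_status not in lookup_doc["productOrderStatus"]:
            problems.append(("productOrderStatus", item_status))
    return problems

print(validate_order(order, lookups))  # [('productOrderStatus', 'Backordered')]
```

One read fetches every list the order needs, where a relational design would typically use one reference table per list.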
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace a customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational, because it hides the purpose of the model
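A minimal `sqlite3` sketch of the example in the second bullet: a general `person` table plus subclass tables that reuse its primary key. The table names are hypothetical, and `account_rep` stands in for the other subclasses listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- General table: attributes every person has.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    -- Subclass tables reuse person_id as both primary key and foreign key.
    CREATE TABLE customer (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        customer_status TEXT, customer_join_date TEXT);
    CREATE TABLE account_rep (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        company_id INTEGER);
""")
conn.execute("INSERT INTO person VALUES (11, 'Mike Bowers')")
conn.execute("INSERT INTO customer VALUES (11, 'active', '2001-01-01')")
conn.execute("INSERT INTO account_rep VALUES (11, 555)")

# One person can belong to several subclasses; finding them takes one join per subclass.
roles = conn.execute("""
    SELECT p.name,
           c.person_id IS NOT NULL AS is_customer,
           a.person_id IS NOT NULL AS is_account_rep
    FROM person p
    LEFT JOIN customer c    ON c.person_id = p.person_id
    LEFT JOIN account_rep a ON a.person_id = p.person_id
""").fetchall()
print(roles)  # [('Mike Bowers', 1, 1)]
```

The generic `person` table alone no longer says who is a customer; that meaning moves into the subclass tables and joins, which is why overgeneralizing hides the model's purpose.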
DocGraph Modeling: Generalize
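• Generalizing is easy in JSON because a JSON object lends itself to subclassing: general properties and subclass-specific properties can sit side by side in one document
• This example shows a person being subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON, and unlike relational, it does not hide the purpose of the model
• See the person, customer, and companyRelationships sections of the person document above for the subclassing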
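In JSON, the same generalization lives in one document: the general `person` section sits next to optional subclass sections. A minimal Python sketch, assuming (as in the person document) that a `customer` section or entries in `companyRelationships` are what mark a subclass; the `roles` helper is hypothetical.

```python
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}},
    ],
}

def roles(doc):
    """Hypothetical helper: list the subclasses a person document carries."""
    found = []
    if "customer" in doc:
        found.append("customer")
    for rel in doc.get("companyRelationships", []):
        company = rel["company"]
        for kind in company["personCompanyRelationship"]:
            found.append(f"{kind} {company['companyName']}")
    return found

print(roles(person_doc))
```

The subclass names (`customer`, `accountRepFor`, and so on) stay visible in the data itself, which is why generalizing here does not hide the model's purpose.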
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Move sparse data out of a table and into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
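A small `sqlite3` sketch of the denormalization bullet: copying the customer's name into a hypothetical `orders` table lets a common query skip the join, at the cost of keeping the copy in sync with the `customer` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    -- customer_name is a denormalized copy of customer.name.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         customer_name TEXT, order_number TEXT);
    INSERT INTO customer VALUES (11, 'Mike Bowers');
    INSERT INTO orders VALUES (1, 11, 'Mike Bowers', '111-11-111');
""")

# Normalized read: needs a join.
joined = conn.execute("""
    SELECT o.order_number, c.name FROM orders o
    JOIN customer c ON c.customer_id = o.customer_id
""").fetchall()

# Denormalized read: the same answer from one table.
single = conn.execute("SELECT order_number, customer_name FROM orders").fetchall()

print(joined == single)  # True
```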
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically indexes each property by its name for equality queries, regardless of where it appears in the hierarchy
• You manually create range indexes for inequality queries, based on property name or hierarchical path
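The reason for unique property names can be illustrated with a toy inverted index in Python that keys values by property name and ignores where the property sits in the hierarchy. This is only an imitation of the behavior described above, not how MarkLogic's indexes are implemented; the document URIs are hypothetical. Because `companyName` appears both under a product's vendor and in the company document, a name-keyed lookup matches both, while a path-keyed lookup (like a path index) separates them.

```python
from collections import defaultdict

docs = {
    "product-1111": {"product": {"productVendor": {"vendorId": 555, "companyName": "Nabisco"}}},
    "company-555": {"company": {"companyName": "Nabisco"}},
}

def walk(node, path=()):
    """Yield (path, property name, value) for every scalar property at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, path + (key,))
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, path)
    else:
        yield path, path[-1], node

name_index = defaultdict(set)  # keyed only by property name
path_index = defaultdict(set)  # keyed by the full hierarchical path
for uri, doc in docs.items():
    for path, name, value in walk(doc):
        name_index[(name, value)].add(uri)
        path_index[("/".join(path), value)].add(uri)

print(sorted(name_index[("companyName", "Nabisco")]))          # both documents match
print(sorted(path_index[("company/companyName", "Nabisco")]))  # only the company document
```

Giving each property a distinct name (for example, a vendor-specific name instead of a shared `companyName`) makes the cheap name-based lookup as precise as the path-based one.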
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• The five JSON documents of the customer order model (person, order, product, company, and lookup lists) hold the same data as more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
Document PROs: Performance
- One JSON document requires one IO.
- Relational requires dozens of IOs for the same data.
  - For example, the person document is 45K and requires 13 tables to represent it in a relational database.
  - You have to join 13 tables to retrieve all of a person's data.
  - Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row.
  - 4-to-6 IOs x 13 joins = 52-to-78 IOs.
- MarkLogic indexes are in RAM, so getting a doc is 1 IO.
- MongoDB indexes are B-tree indexes and may require 4 IOs.
- To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables.
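The IO estimate above is plain arithmetic. This minimal Python sketch only reproduces it, taking the slide's figures (3-to-4 index IOs plus 1-to-2 row IOs per join, and one join per table for the 13 person tables) as given inputs; the function names are hypothetical.

```python
def per_join_ios(index_ios: tuple[int, int], row_ios: tuple[int, int]) -> tuple[int, int]:
    """Combine the (low, high) index IO range with the (low, high) row IO range."""
    return index_ios[0] + row_ios[0], index_ios[1] + row_ios[1]


def relational_read_ios(joins: int, ios_per_join: tuple[int, int]) -> tuple[int, int]:
    """Scale the per-join (low, high) IO range by the number of joins."""
    low, high = ios_per_join
    return joins * low, joins * high


# Figures taken from the slide: 3-to-4 index IOs and 1-to-2 row IOs per join, 13 joins.
per_join = per_join_ios((3, 4), (1, 2))
print(per_join)                           # (4, 6)
print(relational_read_ios(13, per_join))  # (52, 78)
```

By the same accounting, the slide's claim for MarkLogic is that the whole person document comes back in a single IO because its index lookups are served from RAM.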
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
Document PROs: Multi-Valued and Multi-Typed Properties
- JSON documents can contain multi-valued and multi-typed properties.
- JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Templated Data Extraction.
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued.
- This allows many-to-one and many-to-many relationships to be captured in one JSON document.

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]

Relational equivalent:
- Persons (Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank)
- Persons Addresses (Person ID, Address ID, Address Purpose, Is Primary)
- Addresses (Address ID, Street, Locality, Region, Postal Code, Country)
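To see what "captured in one JSON document" saves, here is a minimal Python sketch that shreds the address array above into rows shaped like the Addresses and Persons Addresses tables. The function name, the generated address IDs, and joining street lines with a comma are assumptions for illustration; the Is Primary column is left out.

```python
from itertools import count

# The sample person document, trimmed to the fields used here.
person_doc = {
    "personId": 11,
    "person": {
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }}
        ]
    },
}


def shred_addresses(doc: dict) -> tuple[list[tuple], list[tuple]]:
    """Split a document's multi-valued addresses into Addresses and Persons Addresses rows."""
    next_address_id = count(1)  # hypothetical surrogate-key generator
    addresses: list[tuple] = []
    persons_addresses: list[tuple] = []
    for item in doc["person"]["personAddresses"]:
        a = item["address"]
        address_id = next(next_address_id)
        # The Street column is single-valued, so the street array is joined into one string.
        addresses.append((address_id, ", ".join(a["addressStreet"]), a["addressLocality"],
                          a["addressRegion"], a["addressPostalCode"], a["addressCountry"]))
        # The multi-valued purpose array becomes one link row per purpose.
        for purpose in a["addressPurpose"]:
            persons_addresses.append((doc["personId"], address_id, purpose))
    return addresses, persons_addresses


addresses, persons_addresses = shred_addresses(person_doc)
print(addresses)          # [(1, '99 Smith Dr', 'Clinton', 'UT', '84015', 'United States')]
print(persons_addresses)  # [(11, 1, 'shipping'), (11, 1, 'mailing')]
```

One nested array in the document turns into rows in two extra tables, and reading it back means joining them again.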
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed.
- Storing different types of records in the same array is natural in programming but very difficult in relational databases.

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
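In application code, a mixed-type array like this is typically handled by dispatching on a type property. The Python sketch below does that with `paymentMethodType`; the `describe` function and its output format are hypothetical, not part of the source model.

```python
# The two payment methods from the array above: a Visa card and a checking account.
payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS",
                       "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers",
                       "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]


def describe(item: dict) -> str:
    """Dispatch on paymentMethodType; each type carries its own set of properties."""
    pm = item["paymentMethod"]
    kind = pm["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending {pm['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking {pm['bankAccountNumber'][-4:]} (routing {pm['bankRoutingNumber']})"
    return f"Unsupported payment method type: {kind}"


for item in payment_methods:
    print(describe(item))
```

A relational design would instead need a separate table per payment type (as in the Visa and Bank Payment Methods tables of the person ERD) plus a link table for each.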
Document PROs: Everything Is Data
Everything is data in JSON:
- Structure
- Keys
- Values
- Arrays

Because structure is data:
- Structure can be changed by updating data; just write a query.
- Structure can be queried:
  - for presence of keys,
  - for nested relationships,
  - for any combination of nesting, keys, and values.

{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
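A minimal Python sketch of both ideas, querying structure and changing it as data, over the document above. `paths_with_key` and `rename_key` are hypothetical helpers over plain Python dicts, not MarkLogic APIs, and the new key name `phoneNumberHidden` is invented for the example.

```python
from collections.abc import Iterator

doc = {
    "personId": 11,
    "person": {
        "personName": "Some very long name with many UTF-8 international characters…",
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def paths_with_key(node, key: str, path: tuple = ()) -> Iterator[tuple]:
    """Yield the path of every object that contains `key`, at any nesting depth."""
    if isinstance(node, dict):
        if key in node:
            yield path
        for k, v in node.items():
            yield from paths_with_key(v, key, path + (k,))
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from paths_with_key(v, key, path + (i,))


def rename_key(node, old: str, new: str) -> None:
    """Rename `old` to `new` in every object at any depth, in place."""
    if isinstance(node, dict):
        for k in list(node):
            rename_key(node[k], old, new)
            if k == old:
                node[new] = node.pop(old)
    elif isinstance(node, list):
        for v in node:
            rename_key(v, old, new)


# Query structure: which objects carry the optional hidePhoneNumber key?
before = list(paths_with_key(doc, "hidePhoneNumber"))
print(before)  # [('person', 'personPhones', 1, 'phone')]

# Change structure by updating data.
rename_key(doc, "hidePhoneNumber", "phoneNumberHidden")
print(list(paths_with_key(doc, "phoneNumberHidden")))
```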
Document PROs: Simple, Easy Data Types
- Attribute names have unlimited length and can contain any characters.
- Strings have unlimited size and can contain any internationalized UTF-8 characters.
- Numbers contain integers or decimals.
- Booleans are true or false.
- Arrays can have values of any type and can mix types.
- Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
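For a quick check of how these types surface in a general-purpose language, the Python sketch below parses a small JSON document with the standard `json` module. The sample document is invented for illustration; note that Python parses decimals into binary floats unless `parse_float` is supplied.

```python
import json
from decimal import Decimal

# Invented sample covering each JSON type named on the slide.
text = """{
  "an attribute name with spaces, punctuation & ünïcödé": "a UTF-8 string: ¡Hola, señor!",
  "count": 13,
  "heightInFeet": 5.75,
  "verified": true,
  "mixed": [1, "two", null, {"three": 3}],
  "nested": {"sparse": {}}
}"""

doc = json.loads(text)
# The standard json module maps each JSON type onto a built-in Python type.
for key, value in doc.items():
    print(f"{key!r}: {type(value).__name__}")

# Decimals become binary floats by default; parse_float=Decimal keeps the exact digits.
exact = json.loads(text, parse_float=Decimal)
print(repr(exact["heightInFeet"]))  # Decimal('5.75')
```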
Document PROs: Meaningful Data Modeling

"A Relational Model of Data for Large Shared Data Banks"
E. F. Codd, IBM Research Laboratory, San Jose, California
Communications of the ACM (Information Retrieval section), Volume 13, Number 6, June 1970

"Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…"
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory… to… formatted data. …The problems… are those of data independence… and… data inconsistency. …The relational view… appears to be superior in several respects to the graph or network model… …Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables… represent a major advance toward the goal of data independence…
1.2.1 Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence
…Can application programs… remain invariant as indices come and go?…
1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data. …Programs become dependent on the continued existence of the… paths.
[Figure: a graph of Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E), connected by relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of", with topics linked to other topics as problem, solution, and purpose.]
[Figure: JSON Customers, Orders, Products, and Vendors documents, an XML hospital operation record, and Documentation content, connected by relationships such as "Product that was ordered", "Vendor who sold the product", "Shipping instructions etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", and "Vendor received RMA on product".]
Inside and Out
| Hospital Name | Johns Hopkins |
|---|---|
| Operation Number | 13 |
| Operation Type | Heart Transplant |
| Surgeon Name | Dorothy Oz |

| Drug Name | Drug Manufacturer | Dose Size | Dose UOM |
|---|---|---|---|
| Minicillan | Drugs R Us | 200 | mg |
| Maxicillan | Canada4Less Drugs | 400 | mg |
| Minicillan | Drug USA | 150 | mg |
New Data Structures for Variety and Variability
| Relational database: fixed and dense structures (mostly) | NoSQL database: sparse and variable structures (mostly) |
|---|---|
| Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable). | Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type. |
| Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure. | Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document. |
| Each table contains a sparse number of rows and requires each row to have the same fixed structure. | Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections. |
| Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data. | Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data. |
| Each schema must have a fixed number of predefined tables and constraints before data can be loaded. | Each database may contain documents without first defining structures, document types, relationships, collections, etc. |
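The first two rows of the comparison can be illustrated in a few lines of Python. This is a minimal sketch, assuming a hypothetical fixed column list built from the drug property names used elsewhere in this deck: a relational-style row must fill every column (missing values become NULL) and rejects unknown fields, while a document simply keeps the properties it has.

```python
from typing import Any, Optional

# Hypothetical fixed row structure, mirroring the drug properties used in these examples.
COLUMNS = ("drugName", "drugManufacturer", "drugDoseSize", "drugDoseUOM")


def to_row(record: dict[str, Any]) -> tuple[Optional[Any], ...]:
    """Relational style: every row has every column, and missing values become NULL (None)."""
    extra = sorted(set(record) - set(COLUMNS))
    if extra:
        # A new property means changing the table's schema before the data can be stored.
        raise ValueError(f"schema change required for: {extra}")
    return tuple(record.get(column) for column in COLUMNS)


def to_document(record: dict[str, Any]) -> dict[str, Any]:
    """Document style: store only the properties that are present."""
    return dict(record)


sparse = {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"}
print(to_row(sparse))       # ('Minomycin', 'Canada4Less', None, None)
print(to_document(sparse))  # {'drugName': 'Minomycin', 'drugManufacturer': 'Canada4Less'}
```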
Why is relational fixed and dense?
Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

| Surgeon ID | Surgeon Name | Surgeon Specialty |
|---|---|---|
| 1 | Dorothy Oz | Cardiothoracic |
| 2 | Martin Fields | Neurosurgery |

| Operation ID | Surgeon ID | Hospital ID |
|---|---|---|
| 13 | 1 | 7 |

| Operation ID | Drug ID | Drug Dose | Drug Dose UOM | Sparse |
|---|---|---|---|---|
| 13 | 100 | 200 | mg | NULL |
| 13 | 101 | NULL | NULL | |
| 13 | 100 | 150 | mg | NULL |

| Drug ID | Drug Name |
|---|---|
| 100 | Minocycline |
| 101 | Minomycin |

- Fixed field types
- Fixed table structure
- Fixed table relationships
- Each row has same structure
- Fixed set of tables
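The dense, fixed layout above can be tried out with Python's built-in `sqlite3` module. This sketch loads the drug rows from the slide (leaving out the fifth column) and shows that reading an operation's drugs takes a join and returns NULLs for the missing dose; the table names `drug` and `operation_drug` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table names; the columns and rows follow the slide's drug tables.
conn.executescript("""
CREATE TABLE drug (
    drug_id   INTEGER PRIMARY KEY,
    drug_name TEXT NOT NULL
);
CREATE TABLE operation_drug (
    operation_id  INTEGER NOT NULL,
    drug_id       INTEGER NOT NULL REFERENCES drug (drug_id),
    drug_dose     INTEGER,          -- nullable: the sparse part of a dense row
    drug_dose_uom TEXT
);
INSERT INTO drug VALUES (100, 'Minocycline'), (101, 'Minomycin');
INSERT INTO operation_drug VALUES
    (13, 100, 200, 'mg'),
    (13, 101, NULL, NULL),
    (13, 100, 150, 'mg');
""")

# Reassembling the drugs for one operation requires a join across tables.
rows = conn.execute("""
    SELECT d.drug_name, od.drug_dose, od.drug_dose_uom
    FROM operation_drug AS od
    JOIN drug AS d ON d.drug_id = od.drug_id
    WHERE od.operation_id = ?
    ORDER BY od.rowid
""", (13,)).fetchall()
for row in rows:
    print(row)
```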
How does NoSQL support variety and variability?
NoSQL documents have any variety of rapidly changing structures because structure is data.

{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

The Minomycin entry omits its dose properties (a sparse property).

- Variable data types
- Variable document types
- Variable relationships
- Variable and sparse collections
- Variable document structures
- Sparse properties
- Sparse denormalized properties
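Because the relationships travel inside the document as data, ordinary code can filter on their properties. This Python sketch assumes the `relations.values` layout reconstructed above; the helper `related` and the recall threshold are illustrative, and the hospital address is trimmed.

```python
# The "relations" values from the operation document above (hospital address trimmed).
operation = {
    "_id": 1,
    "relations": {"values": [
        {"subject": 1, "predicate": "opHospital", "object": 10},
        {"subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187},
        {"subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87},
        {"subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1},
        {"subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3},
        {"subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1},
    ]},
}


def related(doc: dict, predicate: str) -> list[dict]:
    """Return the relationship records that use the given predicate."""
    return [r for r in doc["relations"]["values"] if r["predicate"] == predicate]


# Relationship properties are ordinary data, so they can be filtered directly.
recalled = [r["object"] for r in related(operation, "opDrug") if r["drugRecalls"] > 1]
print(recalled)  # [20000]
```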
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
  <heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
  <paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
  <paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
  freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph>
</section>

{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" }
    ] }
  ]
}

JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
Choosing Between XML and JSON
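To make the contrast concrete, this Python sketch reads a simplified version of the XML paragraph with the standard `xml.etree.ElementTree` module, where text and data are interleaved, and an equivalent JSON-style structure, where each piece is a typed object. The namespaces, comment, and CDATA from the sample are left out, and the JSON version drops the bold markup for brevity.

```python
import xml.etree.ElementTree as ET

# XML: data (a date) is mixed directly into the text.
xml_text = ("<paragraph>XML allows data such as this date <date>2017-04-02Z</date> to be "
            "freely mixed into the text<br/>XML is designed for text to be sprinkled "
            "with <b>tags</b></paragraph>")
para = ET.fromstring(xml_text)
print("".join(para.itertext()))             # all text with the tags removed
print([d.text for d in para.iter("date")])  # ['2017-04-02Z']

# JSON: the same content as a list of typed objects; text exists only inside objects.
json_para = {"paragraph": [
    {"type": "text", "value": "XML allows data such as this date "},
    {"type": "date", "value": "2017-04-02Z"},
    {"type": "text", "value": " to be freely mixed into the text"},
    {"type": "break", "value": None},
    {"type": "text", "value": "XML is designed for text to be sprinkled with tags"},
]}
print([p["value"] for p in json_para["paragraph"] if p["type"] == "date"])  # ['2017-04-02Z']
```

Extracting the text is natural in XML (`itertext` walks text and tails in document order), while extracting typed data is natural in JSON (filter objects by `type`).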
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
- Addresses (Address ID, Street, Locality, Region, Postal Code, Country)
- Person Phones (Phone ID, Person ID, Phone Purpose, Phone Number)
- Person Emails (Email ID, Person ID, Email Purpose, Email Address, Is Primary)
- Persons Addresses (Person ID, Address ID, Address Purpose, Is Primary)
- Websites (Website ID, URL)
- Persons Websites (Person ID, Website ID, Website Purpose, Is Primary)
- Companies Websites (Company ID, Website ID)
- Companies Addresses (Company ID, Address ID)
- Companies (Company ID)
- Bank Payment Methods (Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State)
- Person Names (Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name)
- Visa Payment Methods (Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status)
- Persons (Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank)
- Persons Companies (Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date)
- Persons Bank Payment Methods (Bank Payment ID, Person ID, Is Primary)
- Persons Visa Payment Methods (Visa ID, Person ID, Is Primary)
Customer Model
The person ERD is 13 tables because each multivalued property requires a separate table in a relational database.
All of this should be represented as one JSON document because it is one business entity.
Person ERD
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": {
        "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
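The `triples` array is what connects this entity to others. As a rough illustration (not MarkLogic's triple index), this Python sketch flattens the embedded triples into subject-predicate-object tuples and looks up related IDs; the helper names are hypothetical and the triple metadata properties are omitted.

```python
# Embedded triples from the person document above, in its {"triple": {...}} layout.
person_doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}},
    ],
}


def extract_triples(doc: dict) -> list[tuple[int, str, int]]:
    """Flatten the embedded triples into (subject, predicate, object) tuples."""
    return [(t["triple"]["subject"], t["triple"]["predicate"], t["triple"]["object"])
            for t in doc.get("triples", [])]


def objects_of(triples: list[tuple[int, str, int]], subject: int, predicate: str) -> list[int]:
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


triples = extract_triples(person_doc)
print(objects_of(triples, 11, "hasChild"))  # [33, 44]
```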
[Figure: an expanded relational ERD in which the person tables repeat several times alongside tables labeled Order, Order Items, Fact, and Lookup Lists, plus placeholder table names.]
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity.
• Meaningful graph relationships connect business entities.
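Here is a minimal Python sketch of these two ideas. It assumes plain dicts stand in for the JSON documents (keyed by the IDs used in the example model) and (subject, predicate, object) tuples stand in for graph triples. The predicate names `placedOrder`, `orderedProduct`, and `soldBy` and the helper `follow` are hypothetical. They mirror the relationships in the diagram and are not a MarkLogic API.

```python
# Each business entity is one document, keyed by its ID (stand-ins for JSON docs).
documents = {
    11: {"person": {"personName": "Mike Bowers"}},
    1: {"order": {"orderNumber": "111-11-111"}},
    1111: {"product": {"productName": "Oreo Thins Sandwich Cookies"}},
    555: {"company": {"companyName": "Nabisco"}},
}

# Meaningful relationships between entities (hypothetical predicate names).
triples = [
    (11, "placedOrder", 1),        # customer order
    (1, "orderedProduct", 1111),   # product that was ordered
    (1111, "soldBy", 555),         # vendor who sold the product
]

def follow(start_id, path):
    """Walk a chain of predicates from one entity and return the documents reached."""
    current = [start_id]
    for predicate in path:
        current = [o for s, p, o in triples for c in current if s == c and p == predicate]
    return [documents[i] for i in current]

vendors = follow(11, ["placedOrder", "orderedProduct", "soldBy"])
print(vendors[0]["company"]["companyName"])  # Nabisco
```

Each relationship carries its own meaning in the predicate, so a new kind of link is just a new triple. No new join table is needed.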
[Figure: JSON documents for Customers, Orders, Products, and Vendors, connected by graph relationships labeled "Customer order", "Product that was ordered", and "Vendor who sold the product"]
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": 11,
      "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+0101",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", …
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks", …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    …
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", …
      { "phone": { "phonePurpose": "support", …
      { "phone": { "phonePurpose": "shipping", …
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", …
          "addressLocality": "East Hanove…
          "addressPostalCode": "07936", …
    "companyEmails": [
      { "email": { "emailPurpose": "sales", …
    "companyWebsites": [
      { "website": { "websitePurpose": "home", …
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…
          "paymentMethodAccountNumber": "555…
          "paymentMethodVerified": true, …
  "supplier": { "supplierJoinDate": "2005-05…
  "seller": { "sellerJoinDate": "2005-05…
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
Relational Modeling: Normalize
1. Normalize
• Make each attribute single-valued:
  – Create one column per attribute.
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins.
• Group attributes into tables, ensuring each table has one coherent context.
• Assign one primary key to each table.
• Eliminate duplicate attributes across tables.
[Figure: one-to-one, one-to-many, and many-to-many joins, and reference tables]
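To make the normalize step concrete, here is a small sketch using Python's built-in `sqlite3` module. It assumes an in-memory database and simplified `persons` and `person_phones` tables, loosely modeled on the diagram's Persons table and its Phone ID / Person ID / Phone Purpose / Phone Number table. A person's multivalued phones become rows in a one-to-many child table, and reading them back requires a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (
        person_id INTEGER PRIMARY KEY,       -- one primary key per table
        name      TEXT NOT NULL
    );
    CREATE TABLE person_phones (
        phone_id      INTEGER PRIMARY KEY,
        person_id     INTEGER NOT NULL REFERENCES persons(person_id),  -- one-to-many
        phone_purpose TEXT NOT NULL,
        phone_number  TEXT NOT NULL
    );
""")

# The multivalued attribute "phones" is flattened into one row per value.
person = {"id": 11, "name": "Mike Bowers",
          "phones": [("mobile", "+1 (111) 111-1111"), ("work", "+1 (111) 111-1112")]}
conn.execute("INSERT INTO persons VALUES (?, ?)", (person["id"], person["name"]))
conn.executemany(
    "INSERT INTO person_phones (person_id, phone_purpose, phone_number) VALUES (?, ?, ?)",
    [(person["id"], purpose, number) for purpose, number in person["phones"]],
)

# Reassembling the person with their phones now requires a join.
rows = conn.execute("""
    SELECT p.name, ph.phone_purpose, ph.phone_number
    FROM persons p JOIN person_phones ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
""").fetchall()
for row in rows:
    print(row)
```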
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document.
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures.
• Each embedded structure should be normalized.
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it.
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } ],
  "person": {
    "personStatus": "active", "personName": "Mike Bowers", "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01", "personGender": "male", "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT", "addressPostalCode": "84015",
          "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers", "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } } ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late." } } ]
}
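In the document model, the same phones are embedded in the person document instead of being flattened into a child table. This sketch assumes the document is parsed with Python's standard `json` module and trimmed to the fields it uses. `primary_phone` is a hypothetical helper. Each embedded phone is normalized on its own (one purpose array, one number), and finding the primary phone needs no join.

```python
import json

# Trimmed copy of the person document from the Normalize example.
doc = json.loads("""
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ]
  }
}
""")

def primary_phone(person_doc):
    """Return the number of the embedded phone tagged 'primary', or None."""
    for entry in person_doc["person"]["personPhones"]:
        phone = entry["phone"]
        if "primary" in phone["phonePurpose"]:
            return phone["phoneNumber"]
    return None

print(primary_phone(doc))  # +1 (111) 111-1111
```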
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info.
• When triples link documents, the Optic API can query triple data as if it were part of any linked document.
• This is denormalizing without denormalizing.
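"Denormalizing without denormalizing" can be simulated in plain Python. This sketch assumes a toy "triple index" built by a hypothetical `extract_triples` function and joined by a hypothetical `order_view` function. It only imitates what the slide attributes to TDE and the Optic API and does not use their real APIs. Values are extracted from each person document into an index. An order that stores only a customer ID can then be read together with the customer's name and email, without copying them into the order.

```python
# Source documents: the order stores only the customer's ID, not their contact data.
people = {11: {"personId": 11, "person": {
    "personName": "Mike Bowers",
    "personEmails": [{"email": {"emailAddress": "mike@somework.com"}}]}}}
orders = {1: {"orderId": 1, "order": {
    "orderNumber": "111-11-111",
    "customer": {"customerId": 11}}}}

def extract_triples(person_doc):
    """Hypothetical template: pull name and email out of a person document as triples."""
    pid = person_doc["personId"]
    body = person_doc["person"]
    yield (pid, "personName", body["personName"])
    for entry in body["personEmails"]:
        yield (pid, "emailAddress", entry["email"]["emailAddress"])

# Build the index from the source documents; rerun extraction when a document changes.
triple_index = [t for doc in people.values() for t in extract_triples(doc)]

def order_view(order_doc):
    """Join an order to indexed person data as if it were part of the order."""
    cid = order_doc["order"]["customer"]["customerId"]
    facts = {p: o for s, p, o in triple_index if s == cid}
    return {"orderNumber": order_doc["order"]["orderNumber"], **facts}

print(order_view(orders[1]))
```

The order document itself never changes. Only the index holds the extracted copies, which is the sense in which the data is denormalized for queries but not in storage.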
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts.
• Create transaction tables to join business tables together.
• Create reference tables to standardize entity states and attribute characteristics.
• This maximizes data reuse by allowing tables to be combined with other tables to create any context.
[Figure: business tables, transaction tables, and reference tables]
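Here is a small `sqlite3` sketch of the three kinds of tables. The table and column names are hypothetical simplifications loosely based on the example model, not the diagram's exact schema. Foreign keys are switched on so the reference table actually constrains `orders.status`. The customer order context only exists once the tables are joined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make REFERENCES constraints enforced
conn.executescript("""
    -- Business tables: each stands independent of any context.
    CREATE TABLE persons  (person_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Reference table: standardizes the allowed order states.
    CREATE TABLE order_statuses (status TEXT PRIMARY KEY);

    -- Transaction tables: join business tables together for one context.
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES persons(person_id),
        status    TEXT    NOT NULL REFERENCES order_statuses(status));
    CREATE TABLE order_items (
        order_id   INTEGER NOT NULL REFERENCES orders(order_id),
        product_id INTEGER NOT NULL REFERENCES products(product_id),
        qty        INTEGER NOT NULL);

    INSERT INTO persons  VALUES (11, 'Mike Bowers');
    INSERT INTO products VALUES (1111, 'Oreo Thins Sandwich Cookies'),
                                (2222, 'Rice Dream Rice Drink - Vanilla 64 Fl Oz');
    INSERT INTO order_statuses VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');
    INSERT INTO orders      VALUES (1, 11, 'Shipped');
    INSERT INTO order_items VALUES (1, 1111, 3), (1, 2222, 1);
""")

# The customer order context only appears when the tables are joined.
rows = conn.execute("""
    SELECT pe.name, o.status, pr.name, oi.qty
    FROM orders o
    JOIN persons pe     ON pe.person_id  = o.person_id
    JOIN order_items oi ON oi.order_id   = o.order_id
    JOIN products pr    ON pr.product_id = oi.product_id
    ORDER BY pr.product_id
""").fetchall()
for row in rows:
    print(row)
```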
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity.
• A business entity should exactly match how users think of it.
• A transaction should contain everything about the transaction.
• Multiple lookup tables may be combined in one document.
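This sketch shows a combined lookup document in use. It assumes the lookup values from the order lookup example and a trimmed, self-contained order document. The `invalid_statuses` helper and the deliberately invalid status `Backordered` on product 2222 are hypothetical. They are added only to show one lookup document validating several fields of a transaction.

```python
import json

# One lookup document combining several lookup lists (values from the order lookup example).
order_lookups = json.loads("""
{
  "orderLookupId": 111111111,
  "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
  "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"]
}
""")

# Trimmed order transaction; "Backordered" is a made-up invalid value.
order = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},
        ],
    },
}

def invalid_statuses(order_doc, lookups):
    """List (field, value) pairs whose value is not in the matching lookup list."""
    problems = []
    body = order_doc["order"]
    if body["orderStatus"] not in lookups["orderStatus"]:
        problems.append(("orderStatus", body["orderStatus"]))
    for entry in body["orderedProducts"]:
        status = entry["product"]["productOrderStatus"]
        if status not in lookups["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems

print(invalid_statuses(order, order_lookups))  # [('productOrderStatus', 'Backordered')]
```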
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change.
• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational models, because it hides the purpose of the model.
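Here is a sketch of relational generalization with `sqlite3`, using hypothetical table names. A general `persons` table is paired with one-to-one subclass tables that reuse the person's key. `Jane Doe` (person 22) is a made-up placeholder. Every additional role costs another outer join, and no single table shows which roles a person plays.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- General table: any person, whatever role they play.
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Subclass tables: share the person's key (one-to-one) and add role-specific columns.
    CREATE TABLE customers (
        person_id     INTEGER PRIMARY KEY REFERENCES persons(person_id),
        join_date     TEXT,
        reviewer_rank INTEGER);
    CREATE TABLE account_reps (
        person_id  INTEGER PRIMARY KEY REFERENCES persons(person_id),
        company_id INTEGER);

    INSERT INTO persons      VALUES (11, 'Mike Bowers'), (22, 'Jane Doe');
    INSERT INTO customers    VALUES (11, '2001-01-01', 11111);
    INSERT INTO account_reps VALUES (11, 555);
""")

# Which roles does each person play? Each subclass adds an outer join.
rows = conn.execute("""
    SELECT p.name,
           c.person_id IS NOT NULL AS is_customer,
           a.person_id IS NOT NULL AS is_account_rep
    FROM persons p
    LEFT JOIN customers c    ON c.person_id = p.person_id
    LEFT JOIN account_reps a ON a.person_id = p.person_id
    ORDER BY p.person_id
""").fetchall()
print(rows)  # [('Mike Bowers', 1, 1), ('Jane Doe', 0, 0)]
```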
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because a JSON document is an object, and objects are designed for subclassing.
• The person document (personId 11) shows a person subclassed as an employee, a customer, and a customer rep.
• Generalization works very well in JSON and, unlike in relational models, it does not hide the purpose of the model.
• The subclassing is in that document's customer and companyRelationships properties.
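In the document model, the same generalization is just extra properties on the person document. This sketch assumes a trimmed copy of the personId 11 document. `roles` is a hypothetical helper that derives a person's roles from the subclass properties present.

```python
# Trimmed person document (personId 11): the general "person" part plus subclass properties.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}

def roles(doc):
    """Derive a person's roles from which subclass properties the document carries."""
    found = []
    if "customer" in doc:
        found.append("customer")
    for entry in doc.get("companyRelationships", []):
        company = entry["company"]
        for rel in company["personCompanyRelationship"]:
            found.append(f"{rel} {company['companyName']}")
    return found

for role in roles(person_doc):
    print(role)
```

A plain person with no subclass properties simply yields an empty list. No outer joins are needed, and the roles stay visible in the document itself.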
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application.
• Optimize sparse data out of a table and put it into one-to-one related tables.
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance.
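Here is a `sqlite3` sketch of the denormalizing trade-off, using hypothetical `persons` and `orders` tables. Copying the customer name into `orders` lets an order listing skip the join. The cost is that every copy must be updated when the source value changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT);
    -- Tuned: customer_name is copied from persons so order listings skip the join.
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        person_id     INTEGER REFERENCES persons(person_id),
        customer_name TEXT);
    INSERT INTO persons VALUES (11, 'Mike Bowers');
    INSERT INTO orders  VALUES (1, 11, 'Mike Bowers');
""")

normalized = conn.execute(
    "SELECT o.order_id, p.name FROM orders o JOIN persons p ON p.person_id = o.person_id"
).fetchall()
denormalized = conn.execute("SELECT order_id, customer_name FROM orders").fetchall()
print(normalized == denormalized)  # True

# The cost: both the source and every copy must be updated together.
with conn:
    conn.execute("UPDATE persons SET name = 'Michael Bowers' WHERE person_id = 11")
    conn.execute("UPDATE orders SET customer_name = 'Michael Bowers' WHERE person_id = 11")
print(conn.execute("SELECT customer_name FROM orders").fetchone()[0])  # Michael Bowers
```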
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic.
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy.
• You manually create inequality (range) indexes based on property name or hierarchical path.
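Here is a toy model of why unique property names matter. It assumes, as the slide states, that equality lookups are keyed by property name no matter where the property is nested. `build_name_index` is a hypothetical simplification, not MarkLogic's indexer. With a generic `status` property, a lookup for active records returns persons, companies, and products alike, while `personStatus` returns only persons.

```python
from collections import defaultdict

def build_name_index(docs):
    """Map (property name, scalar value) -> doc IDs, ignoring where the property is nested."""
    index = defaultdict(set)

    def walk(doc_id, key, node):
        if isinstance(node, dict):
            for k, v in node.items():
                walk(doc_id, k, v)
        elif isinstance(node, list):
            for item in node:
                walk(doc_id, key, item)  # array items are indexed under the array's name
        elif key is not None:
            index[(key, node)].add(doc_id)

    for doc_id, doc in docs.items():
        walk(doc_id, None, doc)
    return index

# Generic property names: every entity type has a "status".
generic = {
    "person/11": {"person": {"status": "active"}},
    "company/555": {"company": {"status": "active"}},
    "product/1111": {"product": {"status": "active"}},
}
# Unique property names, as in the example documents.
unique = {
    "person/11": {"person": {"personStatus": "active"}},
    "company/555": {"company": {"companyStatus": "active"}},
    "product/1111": {"product": {"productStatus": "active"}},
}

print(sorted(build_name_index(generic)[("status", "active")]))
print(sorted(build_name_index(unique)[("personStatus", "active")]))
```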
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• The five JSON documents of the customer order example (person, order, product, company, and order lookups) equal more than 30 tables in a relational model.
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.
Document PROs: Performance
One JSON document requires one I/O
• Relational requires dozens of I/Os for the same data.
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database.
• You have to join 13 tables to retrieve all of a person's data.
• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row.
• 4-to-6 I/Os x 13 joins = 52-to-78 I/Os.
• MarkLogic indexes are in RAM, so getting a doc is 1 I/O.
• MongoDB indexes are B-tree indexes and may require 4 I/Os.
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables.
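The slide's I/O arithmetic can be checked with a few lines of Python. The per-join figures are the slide's estimates rather than measurements, and relational_read_ios is a hypothetical helper.

```python
# Per-join I/O estimates taken from the slide: 3-to-4 I/Os to traverse
# indexes plus 1-to-2 I/Os to read the row.
INDEX_IOS = (3, 4)
ROW_IOS = (1, 2)
DOCUMENT_READ_IOS = 1  # the slide's figure for MarkLogic with indexes in RAM


def relational_read_ios(joins: int) -> tuple:
    """Return the (low, high) I/O estimate for reassembling an entity with `joins` joins."""
    per_join_low = INDEX_IOS[0] + ROW_IOS[0]    # 4
    per_join_high = INDEX_IOS[1] + ROW_IOS[1]   # 6
    return joins * per_join_low, joins * per_join_high


low, high = relational_read_ios(13)
print(f"relational: {low}-{high} I/Os, document: {DOCUMENT_READ_IOS} I/O")
# relational: 52-78 I/Os, document: 1 I/O
```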
Document PROs: Multi-Valued and Multi-Typed Properties
• JSON documents can contain multi-valued and multi-typed properties.
• JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction.
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.
personAddresses [
  address
    addressPurpose [ shipping mailing ]
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton
    addressRegion UT
    addressPostalCode 84015
    addressCountry United States
]
Relational equivalent:
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
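To see what the relational tables cost, here is a hypothetical Python sketch that shreds the personAddresses array into Addresses rows and Persons Addresses junction rows. It assumes the JSON shape shown above with quoting restored, generates Address IDs by position, writes one junction row per address purpose, and omits the Is Primary column.

```python
# The personAddresses array from the slide, with JSON quoting restored (assumed shape).
person_doc = {
    "personId": 11,
    "person": {
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }},
        ],
    },
}


def shred_addresses(doc: dict) -> tuple:
    """Split the multi-valued address data into Addresses and Persons Addresses rows."""
    addresses = []         # (Address ID, Street, Locality, Region, Postal Code, Country)
    person_addresses = []  # (Person ID, Address ID, Address Purpose)
    for address_id, entry in enumerate(doc["person"]["personAddresses"], start=1):
        a = entry["address"]
        addresses.append((address_id, ", ".join(a["addressStreet"]),
                          a["addressLocality"], a["addressRegion"],
                          a["addressPostalCode"], a["addressCountry"]))
        # One junction row per purpose: the multi-valued array becomes extra rows.
        for purpose in a["addressPurpose"]:
            person_addresses.append((doc["personId"], address_id, purpose))
    return addresses, person_addresses


addresses, person_addresses = shred_addresses(person_doc)
print(addresses)
print(person_addresses)  # [(11, 1, 'shipping'), (11, 1, 'mailing')]
```

The document keeps both purposes in one array; the relational form needs a second table just to hold them.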
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Storing different types of records in the same array is natural in programming but very difficult in relational databases.
personPaymentMethods [
  paymentMethod paymentMethodType Visa paymentMethodStatus Verified
    creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
    nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
  paymentMethod paymentMethodType Checking paymentMethodStatus Verified
    bankRoutingNumber 11111111 bankAccountNumber 11111111
    nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT
]
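A short Python sketch of consuming a multi-typed array: the code branches on paymentMethodType, so each element can carry its own set of properties. The describe function is hypothetical and its output format is only illustrative.

```python
# The slide's two payment methods, with JSON quoting restored.
person_payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing",
    }},
    {"paymentMethod": {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111",
        "driversLicenseState": "UT",
    }},
]


def describe(pm: dict) -> str:
    """Dispatch on the type discriminator; each type has its own properties."""
    kind = pm["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending in {pm['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking account of {pm['nameOnBankAccount']}"
    return f"Unrecognized payment method type: {kind}"


summaries = [describe(entry["paymentMethod"]) for entry in person_payment_methods]
print(summaries)  # ['Visa ending in 1234', 'Checking account of Michael Bowers']
```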
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query.
• Structure can be queried:
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values
personId 11
person
  personName Some very long name with many UTF-8 international characters…
  personBirthDate 1982-02-02
  personHeightInFeet 5.75
  personPhones [
    phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
    phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
  ]
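Because structure is data, a generic traversal can query for the presence of a key at any depth, and a restructuring is just a data transformation. The Python sketch below uses the phone data above with JSON quoting restored; find_paths_with_key, rename_key, and the new key name phoneNumberHidden are hypothetical, and the sketch does not use a MarkLogic query API.

```python
doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def find_paths_with_key(node, key, path=()):
    """Yield the path of every object that contains `key`, at any depth."""
    if isinstance(node, dict):
        if key in node:
            yield path
        for k, v in node.items():
            yield from find_paths_with_key(v, key, path + (k,))
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from find_paths_with_key(v, key, path + (i,))


def rename_key(node, old, new):
    """Return a copy with every `old` key renamed to `new`; the input is not modified."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(v, old, new) for v in node]
    return node


print(list(find_paths_with_key(doc, "hidePhoneNumber")))
# [('person', 'personPhones', 1, 'phone')]
renamed = rename_key(doc, "hidePhoneNumber", "phoneNumberHidden")
print(list(find_paths_with_key(renamed, "phoneNumberHidden")))
```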
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters.
– Strings have unlimited size and can contain any internationalized UTF-8 characters.
– Numbers contain integers or decimals.
– Booleans are true or false.
– Arrays can have values of any type and can mix types.
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
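As a quick illustration of these types, Python's standard json module parses a sample document (hypothetical keys and values) into native values. JSON itself has a single number type; the int/float split shown here is how Python's parser represents integers and decimals.

```python
import json

# Hypothetical sample: a UTF-8 key, an integer, a decimal, a boolean,
# and an array that mixes types and nests a sparsely populated object.
text = """{
  "nom de propriété très long": "Strings may hold any UTF-8 text: ünïcødé",
  "productOrderQty": 3,
  "productUnitPrice": 3.50,
  "paymentMethodVerified": true,
  "mixed": [1, "two", false, null, {"nested": {"onlyThisKey": 1}}]
}"""

doc = json.loads(text)
for key, value in doc.items():
    print(f"{key!r}: {type(value).__name__}")
print([type(v).__name__ for v in doc["mixed"]])
# ['int', 'str', 'bool', 'NoneType', 'dict']
```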
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency. …The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables…represent a major advance toward the goal of data independence…
1.2.1. Ordering Dependence. …Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence. …Can application programs…remain invariant as indices come and go? …
1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.
[Figure: a document graph of typed entities (T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event) connected by meaningful relationships such as "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "solution", and "purpose".]
Inside and Out
[Figure: JSON documents (Customers, Orders, Products, Vendors) and XML documentation (for example, an operation record listing hospital name, operation number, operation type, surgeon name, and administered drugs) linked by relationships such as "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product". Documentation includes shipping instructions, customer invoices, annual reports, customer order documents, and product manuals.]
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling: Normalize
- DocGraph Modeling: Normalize
- DocGraph Modeling: Denormalize
- Relational Modeling: Orthogonalize
- Relational Modeling: Generalize
- DocGraph Modeling: Generalize
- Relational Modeling: Tune
- DocGraph Modeling: Tune
- Document PROs: Simple, Easy Data Types
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | Johns Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
New Data Structures for Variety and Variability
Relational Database: fixed and dense structures (mostly)
NoSQL Database: sparse and variable structures (mostly)
Relational: Each field in a row is a fixed type, with the optional ability to have sparse values (i.e., be nullable).
NoSQL: Each property in a document may have a fixed type, be one of several predefined types, or be any type. Null is a data type.
Relational: Each row has a fixed, dense set of fields. It takes code to modify a table's schema to change any part of its fixed structure.
NoSQL: Each document may be zero or more document types. Each document may contain zero or more fixed properties and zero or more optional properties. Each property may be a subdocument with the same characteristics as a document.
Relational: Each table contains a sparse number of rows and requires each row to have the same fixed structure.
NoSQL: Each collection contains a sparse number of documents and has flexible requirements for document types. It may contain one fixed type of document, multiple types from a fixed list, multiple types from an optional list, any type, or any combination of the above. The same document may exist in multiple collections.
Relational: Each table may have zero-to-a-few fixed relationships to other tables. Constraints are not data.
NoSQL: Each document may have zero or more fixed relationships and zero or more optional relationships. Relationships are data.
Relational: Each schema must have a fixed number of predefined tables and constraints before data can be loaded.
NoSQL: Each database may contain documents without first defining structures, document types, relationships, collections, etc.
Why is relational fixed and dense?
Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.

Operation ID | Drug ID | Drug Dose | Drug Dose UOM (sparse)
13 100 200 mg NULL
13 101 NULL NULL
13 100 150 mg NULL

Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

Fixed field types
Fixed table structure
Fixed table relationships
Each row has same structure
Fixed set of tables
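A minimal Python sketch of the contrast, assuming a simplified four-column reading of the drug rows above: the dense rows must carry NULLs, while the equivalent JSON objects simply omit absent properties and can copy in (denormalize) the drug name. row_to_sparse_object is a hypothetical helper.

```python
# Dense relational rows: every row carries every column, so missing values are NULL (None).
COLUMNS = ("operationId", "drugId", "drugDoseSize", "drugDoseUOM")
rows = [
    (13, 100, 200, "mg"),
    (13, 101, None, None),
    (13, 100, 150, "mg"),
]
DRUG_NAMES = {100: "Minocycline", 101: "Minomycin"}  # the Drug ID -> Drug Name table


def row_to_sparse_object(row: tuple) -> dict:
    """Keep only the columns that have values and denormalize the drug name."""
    obj = {
        col: val
        for col, val in zip(COLUMNS, row)
        if val is not None and col != "operationId"
    }
    obj["drugName"] = DRUG_NAMES[obj["drugId"]]
    return obj


operation = {
    "operationNumber": 13,
    "administeredDrugs": [row_to_sparse_object(r) for r in rows if r[0] == 13],
}
print(operation["administeredDrugs"][1])  # {'drugId': 101, 'drugName': 'Minomycin'}
```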
How does NoSQL support variety and variability?
NoSQL documents can have any variety of rapidly changing structures because structure is data.
_id 1
_type operation
collections [ operation transplants ]
operation
  hospitalName Johns Hopkins
  operationTypeName Heart Transplant
  surgeonName Dorothy Oz
  operationNumber 13
  administeredDrugs [
    drugName Minocycline drugManufacturer Drugs R Us drugDoseSize 200 drugDoseUOM mg
    drugName Minomycin drugManufacturer Canada4Less (sparse property)
    drugName Minocycline drugManufacturer Drug USA drugDoseSize 150 drugDoseUOM mg
  ]
relations
  values [
    subject 1 predicate opHospital object 10 hospitalAddress 1057 Mayberry…
    subject 1 predicate opType object 100 insuranceCode 21187
    subject 1 predicate opSurgeon object 10000 surgeonSuccessRate 0.87
    subject 1 predicate opDrug object 10000 drugEfficacy 0.8 drugRecalls 1
    subject 1 predicate opDrug object 20000 drugEfficacy 0.5 drugRecalls 3
    subject 1 predicate opDrug object 30000 drugEfficacy 0.7 drugRecalls 1
  ]
Variable Data Types
Variable Document Types
Variable Relationships
Variable and Sparse Collections
Variable Document Structures
Sparse Properties
Sparse Denormalized Properties
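Because the relations are stored as data inside the document, they can be filtered like any other property. This Python sketch restores JSON quoting for the relations shown above (omitting the hospital relation, whose address is elided) and assumes relations sits at the top level of the document; the related helper is hypothetical and is not a MarkLogic API.

```python
operation_doc = {
    "_id": 1,
    "_type": "operation",
    "collections": ["operation", "transplants"],
    "relations": {
        "values": [
            {"subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187},
            {"subject": 1, "predicate": "opSurgeon", "object": 10000,
             "surgeonSuccessRate": 0.87},
            {"subject": 1, "predicate": "opDrug", "object": 10000,
             "drugEfficacy": 0.8, "drugRecalls": 1},
            {"subject": 1, "predicate": "opDrug", "object": 20000,
             "drugEfficacy": 0.5, "drugRecalls": 3},
            {"subject": 1, "predicate": "opDrug", "object": 30000,
             "drugEfficacy": 0.7, "drugRecalls": 1},
        ]
    },
}


def related(doc: dict, predicate: str, where=lambda rel: True) -> list:
    """Return the objects of this document's relations with `predicate` that pass `where`."""
    return [
        rel["object"]
        for rel in doc["relations"]["values"]
        if rel["subject"] == doc["_id"] and rel["predicate"] == predicate and where(rel)
    ]


print(related(operation_doc, "opDrug"))                                      # [10000, 20000, 30000]
print(related(operation_doc, "opDrug", lambda rel: rel["drugRecalls"] > 1))  # [20000]
```

Each relation can carry its own properties (drugEfficacy, drugRecalls), which is what the slide means by relationships being data.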
Choosing Between XML and JSON
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph></section>
{ "heading": "JSON is best for nested object DATA", "paragraphs": [
  { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
  { "paragraph": [ { "type": "text", "value": "Text exists only in objects" }, { "type": "text", "value": "Data exists in separate objects" }, { "type": "date", "value": "2017-04-02Z" }, { "type": "break", "value": null }, { "type": "text", "value": "JSON is designed for objects to…" }, { "type": "text", "value": "be sprinkled with strings" } ] }
] }
JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
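The difference in how each format carries the same date can be sketched with Python's standard library: xml.etree.ElementTree reads the date out of mixed content (and recovers the running text with itertext), while json reads it from a separate typed object. The snippets are trimmed from the slide's examples.

```python
import json
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# XML: the date is marked up inside running text (mixed content).
xml_text = (
    f'<paragraph xmlns:xsi="{XSI}" xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    'XML allows data such as this date '
    '<date xsi:type="xsd:date">2017-04-02Z</date> to be freely mixed into the text'
    "</paragraph>"
)
paragraph_xml = ET.fromstring(xml_text)
date_el = paragraph_xml.find("date")
xml_date = date_el.text
xml_date_type = date_el.get(f"{{{XSI}}}type")  # namespaced attribute lookup
xml_plain_text = "".join(paragraph_xml.itertext())

# JSON: the same date is a separate typed object next to the text objects.
json_text = """{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects"},
  {"type": "date", "value": "2017-04-02Z"}
]}"""
parts = json.loads(json_text)["paragraph"]
json_dates = [part["value"] for part in parts if part["type"] == "date"]

print(xml_date, xml_date_type)  # 2017-04-02Z xsd:date
print(xml_plain_text)
print(json_dates)               # ['2017-04-02Z']
```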
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Customer Model
The person ERD is 13 tables because each multivalued property requires a separate table in a relational model.
All of this should be represented as one JSON document because it is one business entity.
Person ERD
One Person JSON Doc = 13 Tables
personId 11
schemas [ schema type person schemaVersion 1.0 schema type customer schemaVersion 1.0 schema type companyRelationships schemaVersion 1.0 ]
triples [
triple subject 11 predicate hasSpouse object 22
triple subject 11 predicate hasChild object 33
triple subject 11 predicate hasChild object 44
triple subject 11 predicate likesProduct object 1111
triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ]
person personStatus active personName Mike Bowers personOnlineName MTB1
personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers
personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free
personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]
personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]
personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]
personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]
customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111
companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
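The slide's point that every multi-valued property needs its own relational table can be explored by listing the arrays in a document. This Python sketch uses a heavily trimmed copy of the person document with JSON quoting restored; the nesting shown and the multivalued_paths helper are assumptions for illustration, and the paths it prints are candidates for separate tables or junction tables, not the slide's exact 13.

```python
# A trimmed copy of the person document; schemas, triples, person, and customer
# are assumed to be top-level siblings.
person_doc = {
    "personId": 11,
    "schemas": [{"schema": {"type": "person"}}],
    "triples": [{"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}}],
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        ],
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"]}},
        ],
    },
    "customer": {"customerStatus": "active"},
}


def multivalued_paths(node, path="", found=None) -> list:
    """Return the sorted, de-duplicated paths of every array (multi-valued property)."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            multivalued_paths(value, f"{path}/{key}", found)
    elif isinstance(node, list):
        found.add(path)
        for item in node:
            multivalued_paths(item, path, found)  # array items share the array's path
    return sorted(found)


for p in multivalued_paths(person_doc):
    print(p)
```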
More Realistic Customer Order Model
[Figure: relational diagram of the customer order model, repeating the person tables and adding many more, including Order, Order Items, Fact and Lookup Lists tables]
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
[Figure: Customers, Orders, Products and Vendors as JSON documents, connected by relationships labeled "Customer Order", "Product that was ordered" and "Vendor who sold the product"]
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mike@somework.com ]
personWebsites [
website websitePurpose [ work ] websiteUrl http://www.mike.com ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111

orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T09:00:00Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 5.87 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 3.50 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2.50 productCondition New

productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
productAvailabilityDate 2001-01-01
productListPrice 4.50
productDiscountPrice 3.50
productStandardCost 2.50
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 2.50 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8

companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05

orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
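The slide's idea, entities as documents joined by named relationships, can be sketched in a few lines of Python. This is a minimal in-memory model under stated assumptions, not MarkLogic's API: the URIs (`person/11`, `order/1`, ...) and the predicate names `placedOrder`, `orderedProduct` and `soldBy` are hypothetical labels for the diagram's arrows.

```python
# Minimal sketch: business entities as documents, linked by named relationships.
# URIs and predicate names below are hypothetical, chosen to mirror the diagram.

documents = {
    "person/11": {"person": {"personName": "Mike Bowers"}},
    "order/1": {"order": {"orderNumber": "111-11-111"}},
    "product/1111": {"product": {"productName": "Oreo Thins Sandwich Cookies"}},
    "company/555": {"company": {"companyName": "Nabisco"}},
}

# Each relationship is a (subject, predicate, object) triple with explicit meaning.
triples = [
    ("person/11", "placedOrder", "order/1"),
    ("order/1", "orderedProduct", "product/1111"),
    ("product/1111", "soldBy", "company/555"),
]


def follow(subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def vendors_for_customer(customer):
    """Walk customer -> orders -> products -> vendors, without duplicates."""
    vendors = []
    for order in follow(customer, "placedOrder"):
        for product in follow(order, "orderedProduct"):
            for vendor in follow(product, "soldBy"):
                if vendor not in vendors:
                    vendors.append(vendor)
    return vendors


if __name__ == "__main__":
    for uri in vendors_for_customer("person/11"):
        print(documents[uri]["company"]["companyName"])  # prints: Nabisco
```

Because each predicate names its relationship, a new kind of link can be added as one more triple without changing the shape of any document.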
Relational Modeling: 1. Normalize
• Make each attribute single-valued
  - Create one column per attribute
  - Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Figure: relational diagram showing one-to-one, one-to-many and many-to-many relationships and reference tables]
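A minimal Python sketch of this first step, assuming a record shaped like the person example: the multivalued `personPhones` array becomes one-to-many rows, and the multivalued `phonePurpose` becomes a further child table, so every column holds a single value. The surrogate `phoneId` key is an assumption added for the sketch.

```python
# Minimal sketch of "make each attribute single-valued": shred a nested
# person record into flat rows. phoneId is a hypothetical surrogate key.

person = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
    ],
}


def normalize(doc):
    """Return (persons, phones, phone_purposes) tables as lists of flat dicts."""
    persons = [{"personId": doc["personId"], "personName": doc["personName"]}]
    phones, phone_purposes = [], []
    for phone_id, phone in enumerate(doc["personPhones"], start=1):
        # One-to-many: each phone row points back to its person.
        phones.append({
            "phoneId": phone_id,
            "personId": doc["personId"],
            "phoneNumber": phone["phoneNumber"],
        })
        # A multivalued attribute becomes one row per value.
        for purpose in phone["phonePurpose"]:
            phone_purposes.append({"phoneId": phone_id, "phonePurpose": purpose})
    return persons, phones, phone_purposes


persons, phones, phone_purposes = normalize(person)
print(len(persons), len(phones), len(phone_purposes))  # prints: 1 2 3
```

One small nested record already needs three tables; repeating this for addresses, emails, websites, names and payment methods is how the relational person model grows to the table count shown earlier.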
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
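Going the other way, the one-to-many rows of a relational model can be nested back into a single document whose embedded structures each stay normalized. This minimal Python sketch assumes the `{ "phone": { ... } }` wrapper shape used in the person document; the row field names are hypothetical.

```python
# Minimal sketch: nest one-to-many relational rows into one JSON-style document.
# Row field names are hypothetical; the output mirrors the person document's shape.

person_row = {"personId": 11, "personName": "Mike Bowers"}
phone_rows = [
    {"phoneId": 1, "personId": 11, "phoneNumber": "+1 (111) 111-1111"},
    {"phoneId": 2, "personId": 11, "phoneNumber": "+1 (111) 111-1112"},
]
purpose_rows = [
    {"phoneId": 1, "phonePurpose": "mobile"},
    {"phoneId": 1, "phonePurpose": "primary"},
    {"phoneId": 2, "phonePurpose": "work"},
]


def embed(person, phones, purposes):
    """Build one person document; each embedded phone stays a self-contained unit."""
    by_phone = {}
    for row in purposes:
        by_phone.setdefault(row["phoneId"], []).append(row["phonePurpose"])
    person_phones = [
        {"phone": {"phonePurpose": by_phone.get(r["phoneId"], []),
                   "phoneNumber": r["phoneNumber"]}}
        for r in phones
        if r["personId"] == person["personId"]
    ]
    # The phone's place inside its person document replaces the foreign key,
    # so the surrogate phoneId is not carried into the document.
    return {
        "personId": person["personId"],
        "person": {"personName": person["personName"], "personPhones": person_phones},
    }


doc = embed(person_row, phone_rows, purpose_rows)
primary = [p["phone"]["phoneNumber"] for p in doc["person"]["personPhones"]
           if "primary" in p["phone"]["phonePurpose"]]
print(primary)  # prints: ['+1 (111) 111-1111']
```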
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is "denormalizing without denormalizing": queries see linked data as if it were copied into each document, while the source data is maintained in only one document
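The idea can be modeled in miniature. The Python sketch below is not TDE or the Optic API: the hypothetical `extract_triples` function stands in for a template that projects selected fields into a triple index, and `order_view` stands in for a query that reads the customer's name through that index instead of storing a copy in the order. In the sketch the index is rebuilt by hand; the slide's claim is that TDE keeps it populated automatically.

```python
# Toy model of "denormalizing without denormalizing". extract_triples plays the
# role a TDE template would play; order_view plays the role of an Optic query.
# Neither is MarkLogic's API; all names here are hypothetical.

documents = {
    "person/11": {"personId": 11, "person": {"personName": "Mike Bowers"}},
    "order/1": {"orderId": 1, "order": {"orderNumber": "111-11-111",
                                         "customer": {"customerId": 11}}},
}


def extract_triples(doc):
    """Project selected fields of one document into (subject, predicate, object) triples."""
    if "person" in doc:
        return [(f"person/{doc['personId']}", "personName", doc["person"]["personName"])]
    if "order" in doc:
        customer = f"person/{doc['order']['customer']['customerId']}"
        return [(f"order/{doc['orderId']}", "placedBy", customer)]
    return []


def build_index(docs):
    return [t for doc in docs.values() for t in extract_triples(doc)]


def lookup(index, subject, predicate):
    return next((o for s, p, o in index if s == subject and p == predicate), None)


def order_view(index, order_uri):
    """Read an order as if the customer's name were stored inside it."""
    customer = lookup(index, order_uri, "placedBy")
    return {
        "orderNumber": documents[order_uri]["order"]["orderNumber"],
        "customerName": lookup(index, customer, "personName"),
    }


index = build_index(documents)
print(order_view(index, "order/1"))

# Renaming the person touches one document; re-extracting updates every view.
documents["person/11"]["person"]["personName"] = "Michael Bowers"
index = build_index(documents)
print(order_view(index, "order/1")["customerName"])  # prints: Michael Bowers
```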
Relational Modeling: 2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Figure: relational diagram grouped into business tables, transaction tables and reference tables]
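A minimal sketch of the three table roles using Python's built-in `sqlite3` module. The table and column names are hypothetical. Business tables stand alone, transaction tables join them, and a reference table constrains the allowed order states. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE order_statuses (status TEXT PRIMARY KEY);                -- reference table
CREATE TABLE persons  (person_id INTEGER PRIMARY KEY, name TEXT);     -- business table
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);    -- business table
CREATE TABLE orders (                                                 -- transaction table
    order_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES persons(person_id),
    status TEXT NOT NULL REFERENCES order_statuses(status));
CREATE TABLE order_items (                                            -- transaction table
    order_id INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    qty INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id));
""")
conn.executemany("INSERT INTO order_statuses VALUES (?)",
                 [("New",), ("Invoiced",), ("Shipped",), ("Closed",)])
conn.execute("INSERT INTO persons VALUES (11, 'Mike Bowers')")
conn.execute("INSERT INTO products VALUES (1111, 'Oreo Thins Sandwich Cookies')")
conn.execute("INSERT INTO orders VALUES (1, 11, 'Shipped')")
conn.execute("INSERT INTO order_items VALUES (1, 1111, 3)")

# Recombining independent tables creates the "order" context.
rows = conn.execute("""
SELECT p.name, pr.name, oi.qty, o.status
FROM orders o
JOIN persons p      ON p.person_id = o.person_id
JOIN order_items oi ON oi.order_id = o.order_id
JOIN products pr    ON pr.product_id = oi.product_id
""").fetchall()


def rejects_unknown_status():
    """True if the reference table blocks a status it does not list."""
    try:
        conn.execute("INSERT INTO orders VALUES (2, 11, 'Lost')")
    except sqlite3.IntegrityError:
        return True
    return False


print(rows)
```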
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one document
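The orthogonalized layout can be sketched in Python under simple assumptions: an order document that carries its own line items, and a single lookup document that holds several status lists (the lists are taken from the lookup document in the customer order model; prices and field shapes are illustrative). Checking statuses and totaling the order each need only those documents, with no joins.

```python
from decimal import Decimal

# One lookup document combining several lookup lists.
lookups = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

# One transaction document containing everything about the transaction.
# Values are illustrative sample data for the sketch.
order = {
    "orderId": 1,
    "order": {
        "orderNumber": "111-11-111",
        "orderStatus": "Shipped",
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderQty": 3,
                         "productUnitPrice": Decimal("3.50"), "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": 2222, "productOrderQty": 1,
                         "productUnitPrice": Decimal("2.50")}},
        ],
    },
}


def invalid_statuses(order_doc, lookup_doc):
    """Check status fields against the lists kept together in one lookup document."""
    body = order_doc["order"]
    problems = []
    if body["orderStatus"] not in lookup_doc["orderStatus"]:
        problems.append(body["orderStatus"])
    for item in body["orderedProducts"]:
        status = item["product"].get("productOrderStatus")
        if status is not None and status not in lookup_doc["productOrderStatus"]:
            problems.append(status)
    return problems


def order_total(order_doc):
    """The order carries its own quantities and prices, so no join is needed."""
    return sum(
        (item["product"]["productOrderQty"] * item["product"]["productUnitPrice"]
         for item in order_doc["order"]["orderedProducts"]),
        Decimal("0"),
    )


print(invalid_statuses(order, lookups))  # prints: []
print(order_total(order))                # prints: 13.00
```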
Relational Modeling: 3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace a customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational models, because it hides the purpose of the model
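A minimal sketch of relational generalization with Python's `sqlite3`: a general `persons` table plus one subclass table per role. The table names are hypothetical. A person's roles can only be discovered by probing every subclass table, which is one way a generalized relational model hides its purpose.

```python
import sqlite3

# Hypothetical schema: one general table plus one table per subclass (role).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customers (
    person_id INTEGER PRIMARY KEY REFERENCES persons(person_id),
    customer_status TEXT, join_date TEXT);
CREATE TABLE account_reps (
    person_id INTEGER NOT NULL REFERENCES persons(person_id),
    company_id INTEGER NOT NULL,
    PRIMARY KEY (person_id, company_id));
CREATE TABLE delivery_agents (
    person_id INTEGER NOT NULL REFERENCES persons(person_id),
    company_id INTEGER NOT NULL,
    delivery_schedule TEXT,
    PRIMARY KEY (person_id, company_id));
INSERT INTO persons VALUES (11, 'Mike Bowers');
INSERT INTO customers VALUES (11, 'active', '2001-01-01');
INSERT INTO account_reps VALUES (11, 555);
INSERT INTO delivery_agents VALUES (11, 666, '2 pm Mon-Fri');
""")

# Table names come from this fixed tuple, never from user input.
ROLE_TABLES = (("customer", "customers"),
               ("account rep", "account_reps"),
               ("delivery agent", "delivery_agents"))


def roles(person_id):
    """List a person's roles by probing each subclass table."""
    found = []
    for role, table in ROLE_TABLES:
        row = conn.execute(f"SELECT 1 FROM {table} WHERE person_id = ?",
                           (person_id,)).fetchone()
        if row is not None:
            found.append(role)
    return found


print(roles(11))  # prints: ['customer', 'account rep', 'delivery agent']
```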
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because one JSON object can carry both the general entity and any number of subclass sections
• This example shows a person subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON and, unlike in relational models, it does not hide the purpose of the model
• The person document's schemas array declares each section it carries: person, customer, and companyRelationships
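In the document model the same generalization needs no extra tables. The Python sketch below assumes the wrapper shapes of the person document (`schemas[].schema.type`, `companyRelationships[].company`); the second document, person 22, is a hypothetical record added only to contrast a plain person with a customer.

```python
# Minimal sketch: one person document carries its general and subclass sections.
# Person 22 is hypothetical; person 11 follows the shape of the example document.

docs = [
    {
        "personId": 11,
        "schemas": [{"schema": {"type": t, "schemaVersion": 1.0}}
                    for t in ("person", "customer", "companyRelationships")],
        "person": {"personName": "Mike Bowers"},
        "customer": {"customerStatus": "active"},
        "companyRelationships": [
            {"company": {"companyIdFk": 555,
                         "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
            {"company": {"companyIdFk": 666,
                         "personCompanyRelationship": ["independentContractorFor",
                                                       "deliveryPersonFor"]}},
        ],
    },
    {
        "personId": 22,
        "schemas": [{"schema": {"type": "person", "schemaVersion": 1.0}}],
        "person": {"personName": "Person 22"},
    },
]


def has_type(doc, type_name):
    """A document is of a type if it declares the schema and carries the section."""
    declared = {entry["schema"]["type"] for entry in doc.get("schemas", [])}
    return type_name in declared and type_name in doc


def ids_of_type(documents, type_name):
    return [d["personId"] for d in documents if has_type(d, type_name)]


def ids_related_to(documents, company_id, relationship):
    """Find persons holding a given role at a given company."""
    return [
        d["personId"]
        for d in documents
        for item in d.get("companyRelationships", [])
        if item["company"]["companyIdFk"] == company_id
        and relationship in item["company"]["personCompanyRelationship"]
    ]


print(ids_of_type(docs, "person"))                 # prints: [11, 22]
print(ids_of_type(docs, "customer"))               # prints: [11]
print(ids_related_to(docs, 555, "accountRepFor"))  # prints: [11]
```

The role names stay visible in the data itself, which is the sense in which generalizing a document model does not hide its purpose.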
Relational Modeling: 4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data by moving it out of a table into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables, reducing the need for joins, which slow performance
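A minimal sketch of the denormalization step with Python's `sqlite3`, using hypothetical table names: the customer's name is copied into `orders` so a common read skips the join. The last lines show the price of the copy: it goes stale unless every update writes both places.

```python
import sqlite3

# Hypothetical tables: a normalized design, then one tuning step.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES persons(person_id),
    order_number TEXT NOT NULL);
INSERT INTO persons VALUES (11, 'Mike Bowers');
INSERT INTO orders VALUES (1, 11, '111-11-111');
-- Tuning: copy the commonly joined column into the orders table.
ALTER TABLE orders ADD COLUMN customer_name TEXT;
UPDATE orders SET customer_name =
    (SELECT name FROM persons WHERE persons.person_id = orders.person_id);
""")

# The normalized query needs a join ...
joined = conn.execute(
    "SELECT o.order_number, p.name FROM orders o "
    "JOIN persons p ON p.person_id = o.person_id").fetchall()
# ... the tuned query answers from one table.
tuned = conn.execute("SELECT order_number, customer_name FROM orders").fetchall()

# The cost: updating only the source row leaves the copy stale.
conn.execute("UPDATE persons SET name = 'Michael Bowers' WHERE person_id = 11")
stale_copy = conn.execute("SELECT customer_name FROM orders WHERE order_id = 1").fetchone()[0]
source = conn.execute("SELECT name FROM persons WHERE person_id = 11").fetchone()[0]
print(tuned, stale_copy, source)
```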
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it sits in the hierarchy
• You create inequality (range) indexes manually, based on property name or hierarchical path
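Why unique names matter can be shown with a toy index. The Python sketch below is a simplified model, not MarkLogic's implementation: it keys equality entries by property name and value only, ignoring the path, as the bullet describes. With a generic name such as `status`, a document whose person section is inactive but whose customer section is active still matches `status = active`; with unique names, each question has its own key. The document URIs are hypothetical.

```python
from collections import defaultdict

# Toy model of a name-keyed equality index: (property name, value) -> URIs.
# The path is ignored, so equal names anywhere in a document share one key.


def build_index(docs):
    index = defaultdict(set)

    def walk(uri, name, value):
        if isinstance(value, dict):
            for child_name, child_value in value.items():
                walk(uri, child_name, child_value)
        elif isinstance(value, list):
            for item in value:
                walk(uri, name, item)  # array items are indexed under the array's name
        elif name is not None:
            index[(name, value)].add(uri)

    for uri, doc in docs.items():
        walk(uri, None, doc)
    return index


generic = build_index({
    "generic.json": {"person": {"status": "inactive"}, "customer": {"status": "active"}},
})
unique = build_index({
    "unique.json": {"person": {"personStatus": "inactive"},
                    "customer": {"customerStatus": "active"}},
})

# Ambiguous: matches although the person section says "inactive".
print(generic.get(("status", "active"), set()))         # prints: {'generic.json'}
# Precise: each property name answers exactly one question.
print(unique.get(("personStatus", "active"), set()))    # prints: set()
print(unique.get(("customerStatus", "active"), set()))  # prints: {'unique.json'}
```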
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111 orderStatus [ New Invoiced Shipped Closed
] productOrderStatus [
None Allocated Invoiced Shipped On Order No Stock
]
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
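Because each business entity is its own document, related entities point at each other by ID rather than through join tables. The Python sketch below illustrates this under stated assumptions. The documents sit in a plain dictionary keyed by hypothetical URIs such as `/order/1.json`, and their contents are trimmed down from the slide's order and product documents. The code follows each `productIdFk` in an order to the matching product document. This is an in-memory illustration only, not MarkLogic's API.

```python
from typing import Any

# Hypothetical in-memory store: URI -> JSON document (parsed into dicts).
# Field names follow the slide; the URIs and trimmed contents are assumptions.
store: dict[str, dict[str, Any]] = {
    "/product/1111.json": {"productId": 1111,
                           "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    "/product/2222.json": {"productId": 2222,
                           "product": {"productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz"}},
    "/order/1.json": {
        "orderId": 1,
        "order": {
            "orderNumber": "111-11-111",
            "orderedProducts": [
                {"product": {"productIdFk": 1111, "productOrderQty": 3}},
                {"product": {"productIdFk": 2222, "productOrderQty": 1}},
            ],
        },
    },
}


def resolve_ordered_products(order_uri: str) -> list[dict[str, Any]]:
    """Return the product document referenced by each line item of an order."""
    order = store[order_uri]["order"]
    resolved = []
    for line in order["orderedProducts"]:
        fk = line["product"]["productIdFk"]
        # One lookup per referenced business entity; no join tables involved.
        resolved.append(store[f"/product/{fk}.json"])
    return resolved


for doc in resolve_ordered_products("/order/1.json"):
    print(doc["product"]["productName"])
```

The order keeps its own copy of `productName` (see the order document above), so most reads never need this lookup. The reference is there for when the full product entity is required.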
Document PROs: Performance
One JSON document requires one I/O
• Relational requires dozens of I/Os for the same data
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row
• 4-to-6 I/Os × 13 joins = 52-to-78 I/Os
• MarkLogic indexes are in RAM, so getting a document is 1 I/O
• MongoDB indexes are B-tree indexes and may require 4 I/Os
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables
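The I/O arithmetic above can be checked with a few lines of Python. This is a sketch that restates only the slide's own assumptions. Each of the 13 tables costs 3-to-4 index I/Os plus 1-to-2 row I/Os, and the document read is a single I/O because its index lookup is assumed to be RAM-resident.

```python
# Inputs taken from the slide's arithmetic.
INDEX_IOS = (3, 4)  # index page reads per table lookup (min, max)
ROW_IOS = (1, 2)    # row page reads per table lookup (min, max)
PERSON_TABLES = 13  # tables that hold one person in the relational model


def relational_io_range(tables: int) -> tuple[int, int]:
    """Return (best, worst) I/O counts to assemble one entity spread over `tables` tables."""
    per_lookup_min = INDEX_IOS[0] + ROW_IOS[0]  # 4
    per_lookup_max = INDEX_IOS[1] + ROW_IOS[1]  # 6
    return per_lookup_min * tables, per_lookup_max * tables


def document_io() -> int:
    """One read for the whole document, assuming its index lookup costs no disk I/O."""
    return 1


low, high = relational_io_range(PERSON_TABLES)
print(f"relational: {low}-to-{high} I/Os; document: {document_io()} I/O")
# prints "relational: 52-to-78 I/Os; document: 1 I/O"
```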
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)
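To show what querying such properties means, here is a plain-Python sketch. It is not the Optic API or a TDE template, neither of which is shown here. It matches a document when any one value in an array satisfies the condition, and it collects the distinct record types held in a mixed-type array. The trimmed `person_doc` uses the slide's field names; the helper functions are hypothetical.

```python
from typing import Any, Iterator

# Trimmed person document using the slide's field names.
person_doc: dict[str, Any] = {
    "personId": 11,
    "person": {
        "personAddresses": [
            {"address": {"addressPurpose": ["shipping", "mailing"],
                         "addressStreet": ["99 Smith Dr"],
                         "addressLocality": "Clinton", "addressRegion": "UT"}},
        ],
        "personPaymentMethods": [
            {"paymentMethod": {"paymentMethodType": "Visa"}},
            {"paymentMethod": {"paymentMethodType": "Checking"}},
        ],
    },
}


def addresses(doc: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield each address object nested under person.personAddresses."""
    for item in doc.get("person", {}).get("personAddresses", []):
        yield item["address"]


def has_address(doc: dict[str, Any], purpose: str, region: str) -> bool:
    """True if one address lists `purpose` among its purposes and is in `region`."""
    return any(purpose in a.get("addressPurpose", []) and a.get("addressRegion") == region
               for a in addresses(doc))


def payment_types(doc: dict[str, Any]) -> set[str]:
    """Collect the distinct record types stored in the mixed personPaymentMethods array."""
    return {pm["paymentMethod"]["paymentMethodType"]
            for pm in doc.get("person", {}).get("personPaymentMethods", [])}
```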
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States"
  } }
]

Relational equivalent:
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
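To see why the one embedded array becomes several tables, here is a sketch that shreds a person's `personAddresses` into rows for the Addresses and Persons Addresses tables. The row layouts follow the slide. The sequential Address IDs are an assumption for illustration, and the Is Primary column is left out.

```python
from typing import Any

person: dict[str, Any] = {
    "personAddresses": [
        {"address": {"addressPurpose": ["shipping", "mailing"],
                     "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States"}},
    ],
}


def shred_addresses(person_id: int, person: dict[str, Any]) -> tuple[list[tuple], list[tuple]]:
    """Split embedded addresses into rows for the Addresses and Persons Addresses tables.

    Address IDs are numbered sequentially here only for illustration, and the
    Is Primary column is omitted.
    """
    address_rows: list[tuple] = []
    link_rows: list[tuple] = []
    for address_id, item in enumerate(person["personAddresses"], start=1):
        a = item["address"]
        address_rows.append((address_id, "\n".join(a["addressStreet"]), a["addressLocality"],
                             a["addressRegion"], a["addressPostalCode"], a["addressCountry"]))
        # The multi-valued purpose array becomes one link row per purpose.
        for purpose in a["addressPurpose"]:
            link_rows.append((person_id, address_id, purpose))
    return address_rows, link_rows


addresses_table, persons_addresses_table = shred_addresses(11, person)
```

One address with two purposes already produces one Addresses row and two Persons Addresses rows. The document keeps both purposes inside a single object.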
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural to do in programming but very difficult to do in relational databases

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing"
  } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT"
  } }
]
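In application code, a mixed-type array like this is usually handled by branching on a type discriminator. Here that discriminator is `paymentMethodType`. The sketch below is an illustration, and `describe` is a hypothetical helper. Each branch reads only the fields that exist on its kind of record.

```python
from typing import Any

person_payment_methods: list[dict[str, Any]] = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111"}},
]


def describe(pm: dict[str, Any]) -> str:
    """Summarize one payment method, branching on its paymentMethodType."""
    kind = pm["paymentMethodType"]
    status = pm.get("paymentMethodStatus", "unknown")
    if kind == "Visa":
        # Card fields exist only on card records.
        return f"Visa ending {pm['creditCardNumber'][-4:]} ({status})"
    if kind == "Checking":
        # Bank fields exist only on bank records.
        return f"Checking ending {pm['bankAccountNumber'][-4:]} ({status})"
    return f"{kind} ({status})"


for entry in person_payment_methods:
    print(describe(entry["paymentMethod"]))
# Visa ending 1234 (Verified)
# Checking ending 1111 (Verified)
```

A relational design would typically need one table per record type plus a link table for each, which matches the Visa and Bank payment tables in the Person ERD.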
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query
• Structure can be queried:
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values

{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
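Because keys are data, both querying structure and changing it are ordinary data operations. The sketch below shows two generic tree walks over a parsed JSON value: one tests for the presence of a key at any depth, and one renames a key everywhere it occurs. The helper names and the new key name `phoneNumberHidden` are assumptions for illustration, not a database API.

```python
from typing import Any


def has_key(node: Any, key: str) -> bool:
    """True if `key` appears at any depth of a parsed JSON value."""
    if isinstance(node, dict):
        return key in node or any(has_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(has_key(v, key) for v in node)
    return False


def rename_key(node: Any, old: str, new: str) -> Any:
    """Return a copy of `node` with every `old` key renamed to `new`."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(v, old, new) for v in node]
    return node


doc = {
    "personId": 11,
    "person": {
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

updated = rename_key(doc, "hidePhoneNumber", "phoneNumberHidden")
```

Only the second phone carries `hidePhoneNumber`, so a presence query naturally selects that sparse property. No schema change is needed to add, rename, or remove it.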
Document PROs: Simple, Easy Data Types
- Attribute names have unlimited length and can contain any characters
- Strings have unlimited size and can contain any internationalized UTF-8 characters
- Numbers contain integers or decimals
- Booleans are true or false
- Arrays can have values of any type and can mix types
- Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
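As a quick illustration of these types, Python's standard `json` module parses the sample record below into native values. The record itself is made up for this example. Python keeps integers and decimals apart, even though JSON has only one number type.

```python
import json

# A made-up record exercising each JSON type listed on the slide.
raw = """
{
  "a key with spaces and UTF-8: café": "any internationalized text ✓",
  "count": 3,
  "price": 4.50,
  "active": true,
  "mixed": [1, "two", false, null, {"nested": {}}]
}
"""

doc = json.loads(raw)
for key, value in doc.items():
    print(f"{key!r}: {type(value).__name__}")
# 'a key with spaces and UTF-8: café': str
# 'count': int
# 'price': float
# 'active': bool
# 'mixed': list
```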
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. Codd
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

"Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…"
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1 INTRODUCTION: This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS: …Tables… represent a major advance toward the goal of data independence…
1.2.1 Ordering Dependence: …Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence: …Can application programs… remain invariant as indices come and go?…
1.2.3 Access Path Dependence: Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure: graph of Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E), connected by relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of", with topics linked to one another as problem, solution, and purpose.]
[Figure: document graph linking JSON Customers, Orders, Products, and Vendors documents, a Customer Order, and Documentation, with relationships such as "product that was ordered", "vendor who sold the product", "shipping instructions, etc.", "customer invoices, annual reports, etc.", "customer order documents", "product manual", "customer liked product", and "vendor received RMA on product".]
Inside and Out
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | Johns Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
Why is relational fixed and dense?

Surgeon ID | Surgeon Name | Surgeon Specialty
1 | Dorothy Oz | Cardiothoracic
2 | Martin Fields | Neurosurgery

Operation ID | Surgeon ID | Hospital ID
13 | 1 | 7

Table relationships, row structures, and field types are fixed and dense. Table types and rows are flexible and sparse, so we add tables when we want flexible data.
Operation ID Drug ID Drug Dose Drug Dose UOM Sparse13 100 200 mg NULL13 101 NULL NULL13 100 150 mg NULL
Drug ID | Drug Name
100 | Minocycline
101 | Minomycin

• Fixed field types
• Fixed table structure
• Fixed table relationships
• Each row has the same structure
• Fixed set of tables
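To see the cost of this fixed, dense layout, here is a sketch that rebuilds one nested operation object by joining the slide's rows in Python. The Operation/Drug rows assume that Maxicillan's dose columns are NULL (`None`), matching the sparse document on the next slide. The function name is hypothetical. NULL columns are dropped rather than copied, so the result is sparse the way a document is.

```python
from typing import Any

# Rows read off the slide's tables.
surgeons = {1: ("Dorothy Oz", "Cardiothoracic"), 2: ("Martin Fields", "Neurosurgery")}
operations = {13: {"surgeon_id": 1, "hospital_id": 7}}
drugs = {100: "Minocycline", 101: "Minomycin"}
# (operation_id, drug_id, dose, dose_uom); the NULL dose on row 2 is an assumption.
operation_drugs = [
    (13, 100, 200, "mg"),
    (13, 101, None, None),
    (13, 100, 150, "mg"),
]


def assemble_operation(op_id: int) -> dict[str, Any]:
    """Join the fixed, dense tables back into one nested, sparse object."""
    op = operations[op_id]
    surgeon_name, _specialty = surgeons[op["surgeon_id"]]
    administered = []
    for row_op, drug_id, dose, uom in operation_drugs:
        if row_op != op_id:
            continue
        drug: dict[str, Any] = {"drugName": drugs[drug_id]}
        # Omit NULL columns instead of storing them, so the object stays sparse.
        if dose is not None:
            drug["drugDoseSize"] = dose
        if uom is not None:
            drug["drugDoseUOM"] = uom
        administered.append(drug)
    return {"operationNumber": op_id, "surgeonName": surgeon_name,
            "hospitalId": op["hospital_id"], "administeredDrugs": administered}


operation = assemble_operation(13)
```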
How does NoSQL support variety and variability?
NoSQL documents have any variety of rapidly changing structures because structure is data

{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}

• Variable Data Types
• Variable Document Types
• Variable Relationships
• Variable and Sparse Collections
• Variable Document Structures
• Sparse Properties (the Minomycin entry has no dose)
• Sparse Denormalized Properties
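Sparse properties mean that code reading the document must expect some keys to be absent rather than NULL. The sketch below shows this under simple assumptions. It sums recorded doses per drug and unit across the `administeredDrugs` array above, and it reports entries with no recorded dose separately instead of treating them as zero.

```python
from collections import defaultdict
from typing import Any

administered_drugs: list[dict[str, Any]] = [
    {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg"},
    {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},  # sparse: no dose recorded
    {"drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg"},
]


def total_doses(drugs: list[dict[str, Any]]) -> tuple[dict[tuple[str, str], int], list[str]]:
    """Sum recorded doses per (drug, unit); list drugs whose entries carry no dose."""
    totals: defaultdict[tuple[str, str], int] = defaultdict(int)
    undosed: list[str] = []
    for d in drugs:
        if "drugDoseSize" in d:
            totals[(d["drugName"], d["drugDoseUOM"])] += d["drugDoseSize"]
        else:
            undosed.append(d["drugName"])
    return dict(totals), undosed


print(total_doses(administered_drugs))
# ({('Minocycline', 'mg'): 350}, ['Minomycin'])
```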
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph></section>
{
  "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" }
    ] }
  ]
}
JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls

XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures
Developers work with data with maximal reliance on predictable structures

XML is ideal for content
Developers work with tagged content with minimal reliance on variable structure
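The two paragraph samples above show the same idea in both formats: XML puts tags on top of text, while JSON pours the text into objects. The Python sketch below converts one mixed-content XML paragraph into the JSON-style list of typed segments. It uses the standard `xml.etree.ElementTree` module on a simplified paragraph with the namespace attributes removed. Flattening inline tags such as `<b>` to plain text is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

XML = (
    '<paragraph>XML allows data such as this date <date>2017-04-02Z</date> to be '
    'freely mixed into the text<br/>XML is designed for text to be sprinkled with '
    '<b>tags</b> </paragraph>'
)


def to_segments(xml_text: str) -> list[dict]:
    """Convert one mixed-content element into a list of typed JSON-style segments."""
    root = ET.fromstring(xml_text)
    segments: list[dict] = []
    if root.text:
        segments.append({"type": "text", "value": root.text})
    for child in root:
        if child.tag == "br":
            segments.append({"type": "break", "value": None})
        elif child.tag == "date":
            segments.append({"type": "date", "value": child.text})
        else:
            # Inline markup such as <b> is flattened to its text (a simplification).
            segments.append({"type": "text", "value": "".join(child.itertext())})
        # Text that follows a child element lives in its .tail in ElementTree.
        if child.tail:
            segments.append({"type": "text", "value": child.tail})
    return segments


segments = to_segments(XML)
```

The text that follows each child element lives in the child's `tail`. That is the mixed-content bookkeeping XML handles natively and JSON must spell out as separate objects.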
Choosing Between XML and JSON
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Customer Model
The person ERD is 13 tables because each multivalued property requires a separate table in a relational model.
All of this should be represented as one JSON document because it is one business entity.
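The rule of thumb above can be applied mechanically. The sketch below lists the path of every array in a parsed JSON document, and each such path is a candidate for a separate table, or at least a child table, in a normalized relational design. It is a rough heuristic and does not reproduce the ERD's exact count of 13. The function name and the trimmed sample document are assumptions.

```python
from typing import Any


def multivalued_paths(node: Any, path: str = "") -> list[str]:
    """List the distinct paths of arrays in a JSON value, in document order.

    Each path is a multivalued property that a normalized relational model
    would typically move into its own table.
    """
    found: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            found += multivalued_paths(value, f"{path}/{key}")
    elif isinstance(node, list):
        found.append(path)
        for item in node:
            found += multivalued_paths(item, path)
    # Array items share a path, so drop repeats while keeping order.
    return list(dict.fromkeys(found))


sample = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        ],
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"}},
        ],
    },
}

for p in multivalued_paths(sample):
    print(p)
```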
Person ERD
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
                  "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
                  "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                           "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                           "nameOnBankAccount": "Michael Bowers",
                           "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                   "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                   "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
[Figure: the relational tables regrouped by role, labeled Lookup Lists, Order, Order Items, and Fact]
Order Items
Bank Payment IDPerson IDIs Primary
Long table Name
Person IDWebsite IDWebsite PurposeIs Primary
of deati
Website IDURL
For all
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity.
• Meaningful graph relationships connect business entities.
[Figure: the Customer Order model as JSON documents (Customers, Orders, Products, Vendors), connected by graph relationships labeled "Product that was ordered" and "Vendor who sold the product".]
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
        "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ] },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": 11,
      "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
        "productIdFk": 1111,
        "productName": "Oreo Thins Sandwich Cookies",
        "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
        "productWeightOz": 11,
        "productOrderStatus": "Shipped",
        "productShippedDate": "2001-01-01+01:01",
        "seller": { "sellerId": 555, "sellerName": "Nabisco" },
        "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
        "productIdFk": 2222,
        "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
        "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", …
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks", …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
        "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
        "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": {
        "productMeasurePurpose": "productPackage", "productWeight": 10.1,
        "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
        "productMeasurePurpose": "shippingPackage", "productWeight": 1…
        "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    …
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", …
      { "phone": { "phonePurpose": "support", …
      { "phone": { "phonePurpose": "shipping", …
    "companyAddresses": [
      { "address": {
        "addressPurpose": [ "shipping", …
        "addressLocality": "East Hanove…
        "addressPostalCode": "07936", …
    "companyEmails": [
      { "email": { "emailPurpose": "sales", …
    "companyWebsites": [
      { "website": { "websitePurpose": "home", …
    "companyPaymentMethods": [
      { "paymentMethod": {
        "paymentMethodType": "Che…
        "paymentMethodAccountNumber": "555…
        "paymentMethodVerified": true, …
  "supplier": { "supplierJoinDate": "2005-05…
  "seller": { "sellerJoinDate": "2005-05…
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
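These documents point to one another through id-valued properties such as customerId, productIdFk, vendorId, and companyIdFk. Following those properties is how the relationships "Product that was ordered" and "Vendor who sold the product" are traversed. The sketch below is a minimal Python illustration under stated assumptions:
- the documents sit in a hypothetical in-memory dict keyed by (kind, id) rather than in MarkLogic;
- each document is trimmed to the fields the walk needs;
- the edge labels placedBy, ordered, and soldBy are made up for the example.

```python
# Hypothetical in-memory store: (entity kind, id) -> JSON-like document.
# Each document is trimmed to the fields needed for the walk.
docs = {
    ("person", 11): {"personId": 11, "person": {"personName": "Mike Bowers"}},
    ("order", 1): {
        "orderId": 1,
        "order": {
            "customer": {"customerId": 11, "customerName": "Mike Bowers"},
            "orderedProducts": [
                {"product": {"productIdFk": 1111, "productName": "Oreo Thins Sandwich Cookies"}},
            ],
        },
    },
    ("product", 1111): {
        "productId": 1111,
        "product": {
            "productName": "Oreo Thins Sandwich Cookies",
            "productVendor": {"vendorId": 555, "companyName": "Nabisco"},
        },
    },
    ("company", 555): {"companyId": 555, "company": {"companyName": "Nabisco"}},
}


def order_graph(order_id):
    """Follow id links from an order to its customer, its products, and their vendors."""
    order = docs[("order", order_id)]["order"]
    customer = docs[("person", order["customer"]["customerId"])]
    edges = [("order", order_id, "placedBy", customer["person"]["personName"])]
    for item in order["orderedProducts"]:
        product = docs[("product", item["product"]["productIdFk"])]["product"]
        # "Product that was ordered"
        edges.append(("order", order_id, "ordered", product["productName"]))
        vendor = docs[("company", product["productVendor"]["vendorId"])]["company"]
        # "Vendor who sold the product"
        edges.append(("product", product["productName"], "soldBy", vendor["companyName"]))
    return edges


for edge in order_graph(1):
    print(edge)
```

The walk yields three edges: from the order to its customer, from the order to the Oreo product, and from the product to its vendor, Nabisco. A database would resolve these links through indexes rather than dict lookups.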
Relational Modeling: Normalize
1. Normalize
• Make each attribute single-valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Figure: relationship types labeled One-to-one, One-to-many, Many-to-many, and Reference Tables]
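Python's built-in sqlite3 module makes the first normalization step concrete. This minimal sketch uses hypothetical table and column names modeled on the Persons and Person Phones tables. The multivalued phone attribute becomes a one-to-many child table, so reading a person's phones back requires a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Business table: one row per person, one column per single-valued attribute.
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, person_name TEXT);
    -- The multivalued phone attribute moves into a one-to-many child table.
    CREATE TABLE person_phones (
        phone_id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES persons(person_id),
        phone_purpose TEXT,
        phone_number TEXT
    );
    INSERT INTO persons VALUES (11, 'Mike Bowers');
    INSERT INTO person_phones VALUES (1, 11, 'mobile', '+1 (111) 111-1111');
    INSERT INTO person_phones VALUES (2, 11, 'work', '+1 (111) 111-1112');
    """
)

# Reassembling the person with their phones requires a join.
rows = conn.execute(
    """
    SELECT p.person_name, ph.phone_purpose, ph.phone_number
    FROM persons p JOIN person_phones ph ON ph.person_id = p.person_id
    WHERE p.person_id = ?
    ORDER BY ph.phone_id
    """,
    (11,),
).fetchall()
for row in rows:
    print(row)
```

Because phone_purpose is a single-valued column, a phone with two purposes (such as mobile and primary in the JSON) needs either another table or an extra flag column. The Is Primary columns in the relational model are flag columns of that kind.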
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document.
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures.
• Each embedded structure should be normalized.
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it.
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666, "tripleStatus": "active", "tripleCreatedOn": "2016-10-05", "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } } ],
  "person": {
    "personStatus": "active", "personName": "Mike Bowers", "personOnlineName": "MTB1",
    "personNameVariations": { "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas", "personMaidenName": null, "personLegalName": "Michael Thomas Bowers", "personSortedName": "Bowers, Mike", "personNickname": "Mikey", "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01", "personGender": "male", "personEthnicity": "caucasian", "personShippingPreference": "free",
    "personPhones": [ { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } }, { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } }, { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [ { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ], "addressLocality": "Clinton", "addressRegion": "UT", "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [ { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } }, { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [ { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } }, { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified", "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11", "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified", "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111", "nameOnBankAccount": "Michael Bowers", "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco", "personCompanyRelationship": [ "employeeOf", "accountRepFor" ], "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective", "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null, "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies", "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ], "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective", "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null, "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ]
}
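Going the other direction, normalized rows can be folded into one document per business entity. Each multivalued attribute is embedded as an array of small normalized sub-objects. This minimal Python sketch assumes hypothetical row tuples shaped like the Persons, Person Phones, and Person Emails tables of the relational model, not real query results. It builds the personPhones and personEmails part of the person document above.

```python
import json

# Hypothetical rows from normalized tables: (person_id, purposes, value).
person_rows = [(11, "Mike Bowers")]
phone_rows = [
    (11, ["mobile", "primary"], "+1 (111) 111-1111"),
    (11, ["work"], "+1 (111) 111-1112"),
]
email_rows = [(11, ["work", "primary"], "mike@somework.com")]


def build_person_doc(person_id):
    """Embed each multivalued attribute as an array of named sub-objects."""
    name = next(n for pid, n in person_rows if pid == person_id)
    return {
        "personId": person_id,
        "person": {
            "personName": name,
            "personPhones": [
                {"phone": {"phonePurpose": purposes, "phoneNumber": number}}
                for pid, purposes, number in phone_rows
                if pid == person_id
            ],
            "personEmails": [
                {"email": {"emailPurpose": purposes, "emailAddress": address}}
                for pid, purposes, address in email_rows
                if pid == person_id
            ],
        },
    }


doc = build_person_doc(11)
print(json.dumps(doc, indent=2))
```

The purposes list stays an array inside each sub-object. No extra table or flag column is needed to say a phone is both mobile and primary.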
DocGraph Modeling: Denormalize
• TDE (Template Driven Extraction) can automatically populate the triple index with any data from any document, such as a person's name and contact info.
• When triples link documents, the Optic API can query triple data as if it were part of any linked document.
• This is denormalizing without denormalizing: data is not copied between documents, yet queries can treat it as if it were.
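TDE and the Optic API are MarkLogic features. The Python below does not use or imitate their actual APIs. It is a hypothetical, in-memory analogy of the idea in the bullets above:
1. Project selected values out of each document into a shared list of triples.
2. Join through that list at query time.

A question about the person can then be answered with data that lives only in the product document.

```python
# Hypothetical documents, trimmed to the fields used here.
person = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "triples": [
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
    ],
}
product = {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}}


def extract_triples(doc):
    """Analogy of an extraction template: emit (subject, predicate, object) tuples."""
    triples = []
    if "personId" in doc:
        triples.append((doc["personId"], "personName", doc["person"]["personName"]))
    if "productId" in doc:
        triples.append((doc["productId"], "productName", doc["product"]["productName"]))
    for t in doc.get("triples", []):
        tr = t["triple"]
        triples.append((tr["subject"], tr["predicate"], tr["object"]))
    return triples


# Shared triple list built from both documents.
triple_index = extract_triples(person) + extract_triples(product)


def objects(subject, predicate):
    """Return every object stored for a subject and predicate."""
    return [o for s, p, o in triple_index if s == subject and p == predicate]


# Query-time join: person -> likesProduct -> productName.
liked_names = [name for pid in objects(11, "likesProduct") for name in objects(pid, "productName")]
print(liked_names)
```

Nothing from the product document is copied into the person document. The join runs over the extracted triples, which is the sense in which this denormalizes without denormalizing.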
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Figure: table kinds labeled Business Tables, Transaction Tables, and Reference Tables]
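This minimal sqlite3 sketch shows the three kinds of tables, using hypothetical names and a simplified order that holds a single product:
- persons and products are business tables;
- orders is a transaction table that joins them;
- order_statuses is a reference table holding the same values as the orderStatus lookup list in the sample documents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript(
    """
    -- Reference table: standardizes the allowed order states.
    CREATE TABLE order_statuses (status TEXT PRIMARY KEY);
    INSERT INTO order_statuses VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');

    -- Business tables: each stands on its own, independent of any context.
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, person_name TEXT);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    INSERT INTO persons VALUES (11, 'Mike Bowers');
    INSERT INTO products VALUES (1111, 'Oreo Thins Sandwich Cookies');

    -- Transaction table: joins the business tables into one context (an order).
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES persons(person_id),
        product_id INTEGER REFERENCES products(product_id),
        status TEXT REFERENCES order_statuses(status)
    );
    INSERT INTO orders VALUES (1, 11, 1111, 'Shipped');
    """
)

row = conn.execute(
    """
    SELECT pe.person_name, pr.product_name, o.status
    FROM orders o
    JOIN persons pe ON pe.person_id = o.person_id
    JOIN products pr ON pr.product_id = o.product_id
    WHERE o.order_id = 1
    """
).fetchone()
print(row)

# A status that is not in the reference table is rejected.
rejected = False
try:
    conn.execute("INSERT INTO orders VALUES (2, 11, 1111, 'Lost')")
except sqlite3.IntegrityError:
    rejected = True
print("rejected:", rejected)
```

The rejected insert shows the reference table doing its job: an order cannot take a state that has not been standardized.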
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity.
• A business entity should exactly match how users think of it.
• A transaction should contain everything about the transaction.
• Multiple lookup tables may be combined in one document.
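The sample's lookup document (orderLookupId 111111111) combines two lists, orderStatus and productOrderStatus, in one document. This minimal Python sketch validates an order against both lists after a single document read. The helper invalid_statuses and the invalid value Backordered are hypothetical, made up for the example.

```python
# The combined lookup document from the sample model.
order_lookups = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}


def invalid_statuses(order, lookups):
    """Return (property, value) pairs whose value is missing from the matching lookup list."""
    problems = []
    if order["orderStatus"] not in lookups["orderStatus"]:
        problems.append(("orderStatus", order["orderStatus"]))
    for item in order["orderedProducts"]:
        status = item["product"].get("productOrderStatus")
        if status is not None and status not in lookups["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems


order = {
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
        {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},
    ],
}
print(invalid_statuses(order, order_lookups))
```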
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational, because doing so hides the purpose of the model
DocGraph Modeling: Generalize
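• Generalizing is easy in JSON because a JSON object lends itself to subclassing: subclass sections can sit alongside a general section in the same document.
• This example shows a person being subclassed as an employee, a customer, and a customer rep.
• Generalization works very well in JSON and, unlike in relational, it does not hide the purpose of the model.
• See the subclassing in the person document: the customer and companyRelationships sections extend the generic person section.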
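In the person document, a generic person section sits alongside customer and companyRelationships sections, and the schemas array declares which sections are present. This minimal Python sketch reads the subclass roles back out of a trimmed copy of that document. The helper roles_of is hypothetical; the role names come from the document itself.

```python
# Trimmed copy of the person document: a generic "person" section plus subclass sections.
person_doc = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person"}},
        {"schema": {"type": "customer"}},
        {"schema": {"type": "companyRelationships"}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}},
    ],
}


def roles_of(doc):
    """Return the subclass roles a person document declares, in document order."""
    declared = {s["schema"]["type"] for s in doc.get("schemas", [])}
    roles = []
    if "customer" in declared and "customer" in doc:
        roles.append("customer")
    if "companyRelationships" in declared:
        for rel in doc.get("companyRelationships", []):
            company = rel["company"]
            for kind in company["personCompanyRelationship"]:
                roles.append(f"{kind}:{company['companyIdFk']}")
    return roles


print(roles_of(person_doc))
```

A plain person with no subclass sections yields an empty list. The same document shape therefore serves every role, and the sections still show which roles a person actually has.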
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize by moving sparse data out of a table and into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic.
• In MarkLogic, each property should have a unique name, because MarkLogic automatically indexes each property for equality lookups by its name, regardless of where it sits in the hierarchy.
• You manually create inequality (range) indexes based on property name or hierarchical path.
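A toy index keyed only by property name and value, ignoring where the property sits in the hierarchy, illustrates why unique names matter. This Python sketch is a hypothetical illustration of that behavior, not MarkLogic's index implementation, and the document URIs are made up. The sample documents reuse companyName in several contexts, so a lookup for Nabisco returns more than the company document.

```python
from collections import defaultdict


def walk(node, name=None):
    """Yield (property name, scalar value) pairs at any depth; the path is ignored."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, name)  # array items are indexed under the array's name
    elif name is not None:
        yield name, node


# Hypothetical URIs; each document trimmed to where "companyName" appears.
docs = {
    "/person/11.json": {"personId": 11, "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco"}}]},
    "/product/1111.json": {"productId": 1111, "product": {
        "productVendor": {"vendorId": 555, "companyName": "Nabisco"}}},
    "/company/555.json": {"companyId": 555, "company": {"companyName": "Nabisco"}},
    "/order/1.json": {"orderId": 1, "order": {"orderShipment": {
        "shipper": {"companyIdFk": 777, "companyName": "FedEx"}}}},
}

# Name-keyed equality index: (property name, value) -> set of document URIs.
index = defaultdict(set)
for uri, doc in docs.items():
    for name, value in walk(doc):
        index[(name, value)].add(uri)

print(sorted(index[("companyName", "Nabisco")]))
```

A query intended to find Nabisco's company document also returns the person and product documents that mention Nabisco. A context-specific name, such as a hypothetical vendorCompanyName inside productVendor, would let a name-only lookup target just that context.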
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents are equivalent to more than 30 tables in a relational model.
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.
Document PROs: Performance
One JSON document requires one IO.
• Relational requires dozens of IOs for the same data.
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database.
• You have to join 13 tables to retrieve all of a person's data.
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row.
• 4-to-6 IOs × 13 joins = 52-to-78 IOs.
• MarkLogic indexes are in RAM, so getting a doc is 1 IO.
• MongoDB indexes are B-tree indexes and may require 4 IOs.
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables.
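The slide's IO arithmetic can be written out as a small calculation. The inputs are the author's estimates (13 joins, 3-to-4 index IOs plus 1-to-2 row IOs per join), not measurements, and the single-IO document read assumes the index lookup is served from memory as the slide states.

```python
# Back-of-the-envelope IO estimate using the slide's own figures.
JOINS = 13               # tables joined to rebuild one person
IOS_PER_JOIN = (4, 6)    # (3-4 index IOs + 1-2 row IOs) per join
DOCUMENT_IOS = 1         # one document read, index assumed to be in memory


def relational_ios(joins: int, per_join: tuple[int, int]) -> tuple[int, int]:
    """Return the (low, high) IO range for reassembling one entity via joins."""
    low, high = per_join
    return joins * low, joins * high


low, high = relational_ios(JOINS, IOS_PER_JOIN)
print(f"relational: {low}-{high} IOs, document: {DOCUMENT_IOS} IO")
# relational: 52-78 IOs, document: 1 IO
```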
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties, for example with MarkLogic's new Optic API and Template Driven Extraction (TDE).
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued.
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.
"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States"
  } }
]
The same data in a relational model takes three tables:
Addresses (Address ID, Street, Locality, Region, Postal Code, Country)
Persons Addresses (Person ID, Address ID, Address Purpose, Is Primary)
Persons (Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank)
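A minimal Python sketch of the difference, assuming the document is held as a plain dictionary: the multi-valued `addressPurpose` array is queried directly, and the same data flattened into rows shows why a relational model needs the Persons Addresses junction table. The helper names and the positional address ID are hypothetical.

```python
# Simplified person document from the slide: one address with two purposes.
person = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }},
        ],
    },
}


def addresses_for(person_doc: dict, purpose: str) -> list[dict]:
    """Return the addresses whose multi-valued addressPurpose includes `purpose`."""
    return [
        entry["address"]
        for entry in person_doc["person"]["personAddresses"]
        if purpose in entry["address"]["addressPurpose"]
    ]


def to_junction_rows(person_doc: dict) -> list[tuple[int, int, str]]:
    """Flatten to (Person ID, Address ID, Address Purpose) rows, the shape a
    relational Persons Addresses table needs. The address ID is just the
    array position here (hypothetical), since the document needs no key."""
    rows = []
    for address_id, entry in enumerate(person_doc["person"]["personAddresses"], start=1):
        for purpose in entry["address"]["addressPurpose"]:
            rows.append((person_doc["personId"], address_id, purpose))
    return rows


print([a["addressLocality"] for a in addresses_for(person, "mailing")])  # ['Clinton']
print(to_junction_rows(person))  # [(11, 1, 'shipping'), (11, 1, 'mailing')]
```

One document entry becomes two junction rows plus an Addresses row and a Persons row in the relational shape.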
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed.
• Different types of records in the same array are natural in programming but very difficult to represent in relational databases.
"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing"
  } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT"
  } }
]
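Code that consumes a multi-typed array usually dispatches on a type discriminator. This sketch assumes the discriminator is `paymentMethodType`; the relational alternative in the Person ERD is one table per type (Bank Payment Methods, Visa Payment Methods). The `describe` helper is hypothetical.

```python
# The slide's two payment methods: different fields in the same array.
payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS",
                       "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers",
                       "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]


def describe(method: dict) -> str:
    """Summarize a payment method; which fields exist depends on its type."""
    m = method["paymentMethod"]
    kind = m["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending {m['creditCardNumber'][-4:]}, expires {m['creditCardExpirationDate']}"
    if kind == "Checking":
        return f"Checking account {m['bankAccountNumber'][-4:]} at routing {m['bankRoutingNumber']}"
    return f"Unrecognized payment method type: {kind}"


for method in payment_methods:
    print(describe(method))
# Visa ending 1234, expires 2011-11-11
# Checking account 1111 at routing 11111111
```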
Document PROs: Everything is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query.
• Structure can be queried:
  • for presence of keys
  • for nested relationships
  • for any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
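Because keys and nesting are data, a generic walker can query structure without a schema. This Python sketch finds every object that contains a given key, then changes structure by adding a key to one phone only. `paths_with_key` and the `phoneExtension` key are hypothetical, and the name value is shortened.

```python
import copy

doc = {
    "personId": 11,
    "person": {
        "personName": "Some very long name",
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def paths_with_key(node, key, path=""):
    """Yield the path of every object that contains `key`, at any nesting depth."""
    if isinstance(node, dict):
        if key in node:
            yield path or "/"
        for k, v in node.items():
            yield from paths_with_key(v, key, f"{path}/{k}")
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from paths_with_key(v, key, f"{path}[{i}]")


# Query structure: which objects carry the optional hidePhoneNumber key?
print(list(paths_with_key(doc, "hidePhoneNumber")))
# ['/person/personPhones[1]/phone']

# Change structure with a plain data update: add a key to one phone only.
updated = copy.deepcopy(doc)
updated["person"]["personPhones"][0]["phone"]["phoneExtension"] = "101"
print(list(paths_with_key(updated, "phoneExtension")))
# ['/person/personPhones[0]/phone']
```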
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters.
– Strings have unlimited size and can contain any internationalized UTF-8 characters.
– Numbers contain integers or decimals.
– Booleans are true or false.
– Arrays can have values of any type and can mix types.
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
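The JSON grammar itself has a single number type; how values map to programming-language types depends on the parser. With Python's standard `json` module, integers become `int`, decimals become `float`, `true`/`false` become `bool`, `null` becomes `None`, and an array can mix all of them. The `mixed` key and the Unicode-named attribute are invented for illustration.

```python
import json

text = """{
  "personHeightInFeet": 5.75,
  "personId": 11,
  "hidePhoneNumber": true,
  "personMaidenName": null,
  "phonePurpose": ["mobile", "primary"],
  "mixed": ["text", 42, 4.5, false, null, {"nested": true}],
  "Ünïcödé attribute name with spaces": "any UTF-8 text: 日本語"
}"""

doc = json.loads(text)

# Show the Python type each JSON value was parsed into.
for key, value in doc.items():
    print(f"{key!r}: {type(value).__name__}")

# One array, six different element types.
print([type(v).__name__ for v in doc["mixed"]])
# ['str', 'int', 'float', 'bool', 'NoneType', 'dict']
```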
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency. …The relational view…appears to be superior in several respects to the graph or network model… …Relational view…forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation…of…
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables…represent a major advance toward the goal of data independence…
1.2.1. Ordering Dependence. …Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence. …Can application programs…remain invariant as indices come and go? …
1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.
[Figure: a graph of typed entities (T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event) linked by meaningful relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of", plus "problem", "solution", and "purpose" links between topics.]
[Figure: "Inside and Out": JSON documents for Customers, Orders, Products, and Vendors, plus XML documentation (shipping instructions, customer invoices, annual reports, customer order documents, product manuals), connected by relationships such as "Customer Order", "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product".]
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling: Normalize
- DocGraph Modeling: Normalize
- DocGraph Modeling: Denormalize
- Relational Modeling: Orthogonalize
- Relational Modeling: Generalize
- DocGraph Modeling: Generalize
- Relational Modeling: Tune
- DocGraph Modeling: Tune
- Document PROs: Simple, Easy Data Types
Hospital Name | Johns Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
How does NoSQL support variety and variability?
NoSQL documents can have any variety of rapidly changing structures because structure is data.
{
  "_id": 1,
  "_type": "operation",
  "collections": [ "operation", "transplants" ],
  "operation": {
    "hospitalName": "Johns Hopkins",
    "operationTypeName": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "operationNumber": 13,
    "administeredDrugs": [
      { "drugName": "Minocycline", "drugManufacturer": "Drugs R Us", "drugDoseSize": 200, "drugDoseUOM": "mg" },
      { "drugName": "Minomycin", "drugManufacturer": "Canada4Less" },
      { "drugName": "Minocycline", "drugManufacturer": "Drug USA", "drugDoseSize": 150, "drugDoseUOM": "mg" }
    ]
  },
  "relations": {
    "values": [
      { "subject": 1, "predicate": "opHospital", "object": 10, "hospitalAddress": "1057 Mayberry…" },
      { "subject": 1, "predicate": "opType", "object": 100, "insuranceCode": 21187 },
      { "subject": 1, "predicate": "opSurgeon", "object": 10000, "surgeonSuccessRate": 0.87 },
      { "subject": 1, "predicate": "opDrug", "object": 10000, "drugEfficacy": 0.8, "drugRecalls": 1 },
      { "subject": 1, "predicate": "opDrug", "object": 20000, "drugEfficacy": 0.5, "drugRecalls": 3 },
      { "subject": 1, "predicate": "opDrug", "object": 30000, "drugEfficacy": 0.7, "drugRecalls": 1 }
    ]
  }
}
(The Minomycin entry is a sparse property: it has no dose size or dose UOM.)
• Variable Data Types
• Variable Document Types
• Variable Relationships
• Variable and Sparse Collections
• Variable Document Structures
• Sparse Properties
• Sparse Denormalized Properties
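Sparse properties mean code must tolerate missing keys rather than rely on NULL columns. A minimal sketch, assuming a trimmed copy of the slide's operation document held as a Python dictionary; `total_dose_mg` and `missing_fields` are hypothetical helpers.

```python
# Trimmed copy of the slide's operation document; Minomycin omits dose fields.
operation = {
    "_id": 1,
    "_type": "operation",
    "collections": ["operation", "transplants"],
    "operation": {
        "hospitalName": "Johns Hopkins",
        "operationNumber": 13,
        "administeredDrugs": [
            {"drugName": "Minocycline", "drugManufacturer": "Drugs R Us",
             "drugDoseSize": 200, "drugDoseUOM": "mg"},
            {"drugName": "Minomycin", "drugManufacturer": "Canada4Less"},  # sparse
            {"drugName": "Minocycline", "drugManufacturer": "Drug USA",
             "drugDoseSize": 150, "drugDoseUOM": "mg"},
        ],
    },
}


def total_dose_mg(op_doc: dict, drug_name: str) -> int:
    """Sum recorded milligram doses of one drug; entries without dose fields are skipped."""
    total = 0
    for drug in op_doc["operation"]["administeredDrugs"]:
        if drug["drugName"] == drug_name and drug.get("drugDoseUOM") == "mg":
            total += drug.get("drugDoseSize", 0)
    return total


def missing_fields(op_doc: dict, fields: tuple[str, ...]) -> list[str]:
    """List the drugs that omit any of the given (sparse) fields."""
    return [d["drugName"] for d in op_doc["operation"]["administeredDrugs"]
            if any(f not in d for f in fields)]


print(total_dose_mg(operation, "Minocycline"))                      # 350
print(missing_fields(operation, ("drugDoseSize", "drugDoseUOM")))   # ['Minomycin']
```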
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>. </paragraph></section>
{ "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
        { "type": "text", "value": "Text exists only in objects" },
        { "type": "text", "value": "Data exists in separate objects" },
        { "type": "date", "value": "2017-04-02Z" },
        { "type": "break", "value": null },
        { "type": "text", "value": "JSON is designed for objects to…" },
        { "type": "text", "value": "be sprinkled with strings" }
    ] }
  ]
}
JSON
1. No document type
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA
3. Best for object-oriented computer languages
4. Best for structured data (text poured into objects)
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls
XML
1. Document type
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything
3. Best for marking up natural human languages
4. Best for structured text (tags added on top of text)
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
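To make the contrast concrete, here is a Python sketch using only the standard library. Pulling the date out of the mixed-content XML means walking elements (and `itertext()` to reassemble the prose around the tags), while the JSON version exposes the date as a typed object. The snippets are shortened versions of the slide examples, with the namespace declarations and the `xsi:type` attribute dropped for brevity.

```python
import json
import xml.etree.ElementTree as ET

XML = """<section>
  <paragraph>XML allows data such as this date <date>2017-04-02Z</date> to be
freely mixed into the text.<br/>XML is designed for text to be sprinkled with <b>tags</b>.</paragraph>
</section>"""

JSON = """{"paragraph": [
  {"type": "text", "value": "Data exists in separate objects"},
  {"type": "date", "value": "2017-04-02Z"},
  {"type": "break", "value": null}
]}"""

# XML: the date sits inside running text, so we locate it by element name;
# itertext() stitches the surrounding prose back together.
para = ET.fromstring(XML).find("paragraph")
xml_dates = [el.text for el in para.iter("date")]
xml_prose = " ".join("".join(para.itertext()).split())  # collapse whitespace

# JSON: every value is already a typed object; no text/tail handling needed.
json_dates = [p["value"] for p in json.loads(JSON)["paragraph"] if p["type"] == "date"]

print(xml_dates)   # ['2017-04-02Z']
print(json_dates)  # ['2017-04-02Z']
print(xml_prose)
```

Both yield the same date, but the XML route is natural when the prose matters and the JSON route is natural when only the data matters, which is the slide's point.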
Choosing Between XML and JSON
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Addresses (Address ID, Street, Locality, Region, Postal Code, Country)
Person Phones (Phone ID, Person ID, Phone Purpose, Phone Number)
Person Emails (Email ID, Person ID, Email Purpose, Email Address, Is Primary)
Persons Addresses (Person ID, Address ID, Address Purpose, Is Primary)
Websites (Website ID, URL)
Persons Websites (Person ID, Website ID, Website Purpose, Is Primary)
Companies Websites (Company ID, Website ID)
Companies Addresses (Company ID, Address ID)
Companies (Company ID)
Bank Payment Methods (Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State)
Person Names (Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name)
Visa Payment Methods (Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status)
Persons (Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank)
Persons Companies (Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date)
Persons Bank Payment Methods (Bank Payment ID, Person ID, Is Primary)
Persons Visa Payment Methods (Visa ID, Person ID, Is Primary)
Customer Model
The person ERD has 13 tables because each multivalued property requires a separate table in a relational model.
All of this should be represented as one JSON document because it is one business entity.
Person ERD
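The rule behind the 13-table ERD can be approximated mechanically: every repeating group (an array of objects) needs its own table in a normalized relational schema. This Python sketch applies only that one rule to a trimmed person document, so it finds six tables; the author's ERD reaches 13 because it also splits out person names, shares addresses and websites through junction tables, and uses one table per payment-method type. `child_tables` is a hypothetical helper.

```python
def child_tables(node, root="Persons", found=None):
    """Collect the root table plus one table per array of objects, since a
    normalized relational schema cannot store a repeating group in one row.
    Arrays of plain strings (e.g. phonePurpose) are treated as columns here."""
    if found is None:
        found = [root]
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, list) and any(isinstance(v, dict) for v in value):
                found.append(key)
            child_tables(value, root, found)
    elif isinstance(node, list):
        for item in node:
            child_tables(item, root, found)
    return found


# Trimmed person document: one entry per multi-valued property.
person = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [{"phone": {"phonePurpose": ["mobile", "primary"],
                                    "phoneNumber": "+1 (111) 111-1111"}}],
        "personAddresses": [{"address": {"addressPurpose": ["shipping", "mailing"],
                                         "addressLocality": "Clinton"}}],
        "personEmails": [{"email": {"emailPurpose": ["work", "primary"]}}],
        "personWebsites": [{"website": {"websitePurpose": ["work"]}}],
        "personPaymentMethods": [{"paymentMethod": {"paymentMethodType": "Visa"}}],
    },
}

print(child_tables(person))
# ['Persons', 'personPhones', 'personAddresses', 'personEmails',
#  'personWebsites', 'personPaymentMethods']
```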
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
                  "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
                  "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                           "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                           "nameOnBankAccount": "Michael Bowers",
                           "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                   "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "accountRepSupportedProducts": [
                     { "product": { "productIdFk": 1111, "productName": "Oreos" } }
                   ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                   "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "deliverySchedule": "2 pm Mon-Fri",
                   "performanceNotes": "He is always late" } }
  ]
}
[Figure: a much larger relational ERD of dozens of tables, most with placeholder names such as "Order", "Order Items", "Fact", "Lookup Lists", and "Long table Name", illustrating how many tables a realistic customer order model needs.]
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
[Figure: Customer Order. JSON documents for Customers, Orders, Products, and Vendors, linked by graph relationships such as "Product that was ordered" and "Vendor who sold the product".]
{ "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
        "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ] },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 } }

{ "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": 11,
      "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": … },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
        "productIdFk": 1111,
        "productName": "Oreo Thins Sandwich Cookies",
        "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
        "productWeightOz": 11,
        "productOrderStatus": "Shipped",
        "productShippedDate": "2001-01-01+0101",
        "seller": { "sellerId": 555, "sellerName": "Nabisco" },
        "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
        "productIdFk": 2222,
        "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
        "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", … } } ] } }

{ "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks", … ],
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
        "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
        "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…" } } ],
    "productMeasures": [
      { "productMeasure": {
        "productMeasurePurpose": "productPackage", "productWeight": 101,
        "productWidth": 45, "productHeight": 17, "productLength": 67 } },
      { "productMeasure": {
        "productMeasurePurpose": "shippingPackage", "productWeight": 1,
        "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    … } }

{ "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": {
        "addressPurpose": [ "shipping", … ],
        "addressLocality": "East Hanove…",
        "addressPostalCode": "07936", … } } ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
        "paymentMethodType": "Che…",
        "paymentMethodAccountNumber": 555,
        "paymentMethodVerified": true } } ] },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" } }

{ "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ] }
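A minimal sketch, in plain Python rather than any MarkLogic API, of the two bullets on this slide: each business entity is one document, and named graph relationships connect the documents. The URIs, the predicate names `orderedProduct` and `soldBy` (standing in for "Product that was ordered" and "Vendor who sold the product"), and the helper functions are hypothetical.

```python
# Each business entity is one JSON-style document, keyed by a hypothetical URI.
docs = {
    "/order/1.json": {"orderId": 1, "order": {"orderNumber": "111-11-111"}},
    "/product/1111.json": {"productId": 1111,
                           "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    "/company/555.json": {"companyId": 555, "company": {"companyName": "Nabisco"}},
}

# Relationships as (subject, predicate, object) triples; predicate names are hypothetical.
triples = [
    ("/order/1.json", "orderedProduct", "/product/1111.json"),  # product that was ordered
    ("/product/1111.json", "soldBy", "/company/555.json"),      # vendor who sold the product
]


def follow(subject: str, predicate: str) -> list[str]:
    """Return every object linked to subject by predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def vendors_for_order(order_uri: str) -> list[str]:
    """Walk order -> product -> vendor and collect the vendor names."""
    names = []
    for product_uri in follow(order_uri, "orderedProduct"):
        for vendor_uri in follow(product_uri, "soldBy"):
            names.append(docs[vendor_uri]["company"]["companyName"])
    return names


print(vendors_for_order("/order/1.json"))  # ['Nabisco']
```

Because each relationship carries its own predicate, new kinds of links can be added without changing any document's shape.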
Relational Modeling Normalize
1. Normalize
• Make each attribute single valued
  - Create one column per attribute
  - Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Figure: one-to-one, one-to-many, and many-to-many table relationships, plus reference tables.]
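To make the normalization steps above concrete, here is a minimal sketch using Python's built-in `sqlite3`: a person record with a multivalued phone attribute is flattened into a parent table and a one-to-many child table. The table and column names are assumptions, loosely modeled on the relational diagram, and each phone is simplified to a single purpose (a multivalued purpose would need yet another table).

```python
import sqlite3

person = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": "mobile", "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": "work", "phoneNumber": "+1 (111) 111-1112"},
    ],
}

conn = sqlite3.connect(":memory:")
# One single-valued column per attribute, one primary key per table.
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT)")
# The multivalued phone attribute becomes a one-to-many child table.
conn.execute(
    "CREATE TABLE phone (phone_id INTEGER PRIMARY KEY, "
    "person_id INTEGER REFERENCES person(person_id), "
    "phone_purpose TEXT, phone_number TEXT)"
)

conn.execute("INSERT INTO person VALUES (?, ?)", (person["personId"], person["personName"]))
conn.executemany(
    "INSERT INTO phone (person_id, phone_purpose, phone_number) VALUES (?, ?, ?)",
    [(person["personId"], p["phonePurpose"], p["phoneNumber"]) for p in person["personPhones"]],
)

# Reading the person back together with the phones now requires a join.
rows = conn.execute(
    "SELECT p.person_name, ph.phone_number FROM person p "
    "JOIN phone ph ON ph.person_id = p.person_id ORDER BY ph.phone_id"
).fetchall()
print(rows)
```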
DocGraph Modeling Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it
{ "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01", "personGender": "male",
    "personEthnicity": "caucasian", "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
      "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
      "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
      "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
      "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
      "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
      "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
      "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
      "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ] }
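The reverse of relational shredding is a useful way to see this slide: rows that a relational model spreads across tables are embedded back into one business-entity document, and every embedded structure keeps the same normalized shape (a wrapper object holding a multi-valued purpose array and a value). A minimal sketch; the row tuples and the `build_person_document` helper are hypothetical sample code, not the author's tooling.

```python
import json

# Rows as a relational model might store them (hypothetical sample data).
person_row = (11, "Mike Bowers", "1981-01-01")
phone_rows = [
    (11, ["mobile", "primary"], "+1 (111) 111-1111"),
    (11, ["work"], "+1 (111) 111-1112"),
]
email_rows = [
    (11, ["work", "primary"], "mike@somework.com"),
]


def build_person_document(person, phones, emails):
    """Embed one-to-many rows as arrays inside a single person document."""
    person_id, name, birth_date = person
    return {
        "personId": person_id,
        "person": {
            "personName": name,
            "personBirthDate": birth_date,
            # Each embedded structure has one consistent, normalized shape.
            "personPhones": [
                {"phone": {"phonePurpose": purpose, "phoneNumber": number}}
                for pid, purpose, number in phones if pid == person_id
            ],
            "personEmails": [
                {"email": {"emailPurpose": purpose, "emailAddress": address}}
                for pid, purpose, address in emails if pid == person_id
            ],
        },
    }


doc = build_person_document(person_row, phone_rows, email_rows)
print(json.dumps(doc, indent=2))
```

The multi-valued `phonePurpose` array replaces what relational modeling would need a separate purpose table (or an "Is Primary" column) to express.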
DocGraph Modeling Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing: the data is copied into the index, not into other documents
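A rough Python model of "denormalizing without denormalizing", not MarkLogic code: a template function (standing in for a TDE template) projects a few person properties into a triple index at load time, and a query over an order joins against that index so the customer's contact data appears as if it were part of the order. The names `extract_triples`, `triple_index`, and `order_with_customer_contact` are hypothetical; in MarkLogic this work would be done by a TDE template and an Optic API query.

```python
persons = [
    {"personId": 11,
     "person": {"personName": "Mike Bowers",
                "personEmails": [{"email": {"emailAddress": "mike@somework.com"}}]}},
]
orders = [
    {"orderId": 1, "order": {"orderNumber": "111-11-111", "customer": {"customerId": 11}}},
]


def extract_triples(doc):
    """Stand-in for a TDE template: project selected properties as triples."""
    subject = doc["personId"]
    yield (subject, "personName", doc["person"]["personName"])
    for entry in doc["person"].get("personEmails", []):
        yield (subject, "emailAddress", entry["email"]["emailAddress"])


# Populate the index once; the person and order documents are never modified.
triple_index = [t for doc in persons for t in extract_triples(doc)]


def order_with_customer_contact(order):
    """Join an order to indexed triple data about its customer at query time."""
    customer_id = order["order"]["customer"]["customerId"]
    facts = {}
    for s, p, o in triple_index:
        if s == customer_id:
            facts.setdefault(p, []).append(o)
    return {"orderNumber": order["order"]["orderNumber"], **facts}


print(order_with_customer_contact(orders[0]))
```

The order document never stores a copy of the customer's email, so there is no second copy to keep in sync when it changes; only the index is refreshed.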
Relational Modeling Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Figure: business tables, transaction tables, and reference tables.]
DocGraph Modeling Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
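The last bullet can be sketched with the lookup-list document from the customer order example: two relational reference tables (order statuses and product order statuses) become two arrays in one document, and validation reads both from it. The `validate_order` helper and the sample order, including its "Backordered" status, are hypothetical.

```python
# One lookup document holding what relational modeling would store
# as two separate reference tables.
lookup_doc = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}


def validate_order(order: dict, lookups: dict) -> list[str]:
    """Return a list of problems; an empty list means every status is known."""
    problems = []
    if order["orderStatus"] not in lookups["orderStatus"]:
        problems.append(f"unknown orderStatus {order['orderStatus']!r}")
    for item in order["orderedProducts"]:
        status = item["product"]["productOrderStatus"]
        if status not in lookups["productOrderStatus"]:
            problems.append(f"unknown productOrderStatus {status!r}")
    return problems


# Hypothetical order with one invalid product status.
order = {
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
        {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},
    ],
}
print(validate_order(order, lookup_doc))  # ["unknown productOrderStatus 'Backordered'"]
```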
Relational Modeling Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational, because it hides the purpose of the model
DocGraph Modeling Generalize
• Generalizing is easy in JSON because a JSON object lends itself to subclassing: each subclass is just another object in the same document
• This example shows a person being subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See the customer and companyRelationships subclasses in the person document
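A minimal sketch of generalization by subclassing in JSON: the general `person` object is always present, and each role is a sibling object that exists only when it applies, so a query for one subclass needs no extra table or join. The second person ("Pat Example"), the `roles_of` helper, and the simplified `companyRelationships` shape are hypothetical.

```python
# "person" holds the general data; each role (subclass) is a sibling object.
people = [
    {
        "personId": 11,
        "person": {"personName": "Mike Bowers"},
        "customer": {"customerStatus": "active"},
        "companyRelationships": [
            {"company": {"companyIdFk": 555,
                         "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        ],
    },
    {
        "personId": 22,
        "person": {"personName": "Pat Example"},  # hypothetical person with no subclasses
    },
]


def roles_of(doc: dict) -> set[str]:
    """Collect the roles a person plays from whichever subclass objects exist."""
    roles = set()
    if "customer" in doc:
        roles.add("customer")
    for rel in doc.get("companyRelationships", []):
        roles.update(rel["company"]["personCompanyRelationship"])
    return roles


# Selecting one subclass is a simple presence test on the document.
customers = [d["person"]["personName"] for d in people if "customer" in d]
print(customers)                    # ['Mike Bowers']
print(sorted(roles_of(people[0])))  # ['accountRepFor', 'customer', 'employeeOf']
```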
Relational Modeling Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
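The denormalization bullet can be sketched with Python's built-in `sqlite3`: copying the customer name into the orders table answers the same question without a join. The schema is a hypothetical simplification; the trade-off is that every copy must be updated whenever the customer's name changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    -- customer_name is deliberately copied into orders so listing orders needs no join
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_number TEXT,
                         customer_id INTEGER REFERENCES customer(customer_id),
                         customer_name TEXT);
    INSERT INTO customer VALUES (11, 'Mike Bowers');
    INSERT INTO orders VALUES (1, '111-11-111', 11, 'Mike Bowers');
""")

# Normalized access path: the customer name requires a join.
joined = conn.execute(
    "SELECT o.order_number, c.customer_name FROM orders o "
    "JOIN customer c ON c.customer_id = o.customer_id"
).fetchall()

# Tuned (denormalized) access path: the same answer from one table.
single_table = conn.execute("SELECT order_number, customer_name FROM orders").fetchall()

print(joined)
print(single_table)
```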
DocGraph Modeling Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
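To see why unique property names matter, here is a toy name-keyed equality index. It is an illustration only, not how MarkLogic implements its indexes: every scalar value is indexed under its property name, wherever it sits in the hierarchy. With a generic name such as `name`, a lookup cannot tell a person's name from a company contact's name; with unique names it can. The documents, the `accountRepName` property, and `build_name_index` are hypothetical.

```python
from collections import defaultdict


def build_name_index(docs: dict) -> dict:
    """Map (property name, value) -> set of document URIs, ignoring the path."""
    index = defaultdict(set)

    def walk(uri, node, name=None):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(uri, value, key)
        elif isinstance(node, list):
            for item in node:
                walk(uri, item, name)  # array items are indexed under the array's name
        elif name is not None:
            index[(name, node)].add(uri)

    for uri, doc in docs.items():
        walk(uri, doc)
    return index


# Generic property names: "name" means different things at different paths.
generic = {
    "/person/11.json": {"person": {"name": "Mike Bowers"}},
    "/company/555.json": {"company": {"name": "Nabisco",
                                      "accountRep": {"name": "Mike Bowers"}}},
}
# Unique property names, in the style of the person document.
unique = {
    "/person/11.json": {"person": {"personName": "Mike Bowers"}},
    "/company/555.json": {"company": {"companyName": "Nabisco",
                                      "accountRep": {"accountRepName": "Mike Bowers"}}},
}

print(sorted(build_name_index(generic)[("name", "Mike Bowers")]))
# ['/company/555.json', '/person/11.json']  (ambiguous match)
print(sorted(build_name_index(unique)[("personName", "Mike Bowers")]))
# ['/person/11.json']
```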
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs Simplicity
Each Business Entity Is a JSON Document
• The five JSON documents in the customer order example equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111orderStatus [NewInvoicedShippedClosed
]productOrderStatus [
None AllocatedInvoiced Shipped On Order No Stock
]
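The claim that JSON modeling is mostly about orthogonalizing data into business entities, transactions, and lookup lists can be sketched in a few lines of Python. This is a minimal illustration, assuming abridged copies of the five sample documents above and a hypothetical `ROLES` table that classifies each document by its top-level ID key; neither the rule nor the helper names (`role_of`, `dangling_product_refs`) come from the deck or from MarkLogic.

```python
# Abridged copies of the five sample documents (only a few fields each).
DOCS = [
    {"personId": 11, "person": {"personName": "Mike Bowers"}},
    {"orderId": 1, "order": {
        "orderNumber": "111-11-111",
        # Denormalized copy of customer data, as in the sample order.
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111}},
            {"product": {"productIdFk": 2222}},
        ],
    }},
    {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    {"companyId": 555, "company": {"companyName": "Nabisco"}},
    {"orderLookupId": 111111111,
     "orderStatus": ["New", "Invoiced", "Shipped", "Closed"]},
]

# Hypothetical convention: the top-level ID key tells us what kind of document it is.
ROLES = {
    "personId": "business entity",
    "productId": "business entity",
    "companyId": "business entity",
    "orderId": "transaction",
    "orderLookupId": "lookup list",
}


def role_of(doc):
    """Return the modeling role of a document, based on its top-level ID key."""
    for key in doc:
        if key in ROLES:
            return ROLES[key]
    raise ValueError("document has no recognized ID key")


def dangling_product_refs(docs):
    """Return productIdFk values in orders that point at no loaded product document."""
    product_ids = {d["productId"] for d in docs if "productId" in d}
    missing = []
    for d in docs:
        for item in d.get("order", {}).get("orderedProducts", []):
            fk = item["product"]["productIdFk"]
            if fk not in product_ids:
                missing.append(fk)
    return missing


if __name__ == "__main__":
    print([role_of(d) for d in DOCS])
    print(dangling_product_refs(DOCS))
```

Product 2222 is referenced by the order but is not among the abridged documents, so the check reports it; in this sketch the references between entities are plain data, and nothing enforces them unless you add a check like this one.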
Document PROs: Performance
One JSON Document Requires One IO
• Relational requires dozens of IOs for the same data.
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database.
• You have to join 13 tables to retrieve all of a person's data.
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row.
• 4-to-6 IOs x 13 joins = 52-to-78 IOs.
• MarkLogic indexes are in RAM, so getting a doc is 1 IO.
• MongoDB indexes are B-tree indexes and may require 4 IOs.
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables.
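The IO arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope model built from the slide's own per-join figures; `relational_io_range` is a hypothetical helper, and real IO counts depend on caching, index depth, and the query plan.

```python
def relational_io_range(tables, index_ios=(3, 4), row_ios=(1, 2)):
    """Return (min, max) IOs to read one entity shredded across `tables` tables.

    Each table access costs index IOs (3 to 4) plus row IOs (1 to 2),
    i.e. 4 to 6 IOs per join with the slide's figures.
    """
    low = tables * (index_ios[0] + row_ios[0])
    high = tables * (index_ios[1] + row_ios[1])
    return low, high


DOCUMENT_IOS = 1  # the slide's figure for MarkLogic with RAM-resident indexes

if __name__ == "__main__":
    low, high = relational_io_range(13)
    print(f"relational: {low}-{high} IOs, document: {DOCUMENT_IOS} IO")
    print(f"ratio: {low // DOCUMENT_IOS}x to {high // DOCUMENT_IOS}x")
```

With 13 tables this gives 52 to 78 IOs against a single IO for the whole document.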
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
MarkLogic can query multi-valued and multi-typed properties in JSON documents using its new Optic API and Template Driven Extraction (TDE).
Document PROs: Multi-Value Properties
Any JSON Data Can Be Multi-Valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.
"personAddresses": [
  { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
    "addressLocality": "Clinton", "addressRegion": "UT",
    "addressPostalCode": "84015", "addressCountry": "United States" } }
]
The same data in a relational model needs three tables:
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
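To make the contrast concrete, here is one plausible way to shred the `personAddresses` array into the three relational tables listed above. It is a sketch, assuming generated surrogate Address IDs and one Persons Addresses row per address purpose, and it omits the Is Primary column; `shred_addresses` is a hypothetical helper.

```python
from itertools import count

person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }},
        ],
    },
}


def shred_addresses(doc, address_ids=None):
    """Split one person document into rows for Persons, Persons Addresses, and Addresses."""
    # Hypothetical surrogate keys for the Addresses table.
    address_ids = address_ids if address_ids is not None else count(1)
    persons = [{"Person ID": doc["personId"], "Name": doc["person"]["personName"]}]
    persons_addresses, addresses = [], []
    for item in doc["person"].get("personAddresses", []):
        addr = item["address"]
        address_id = next(address_ids)
        addresses.append({
            "Address ID": address_id,
            "Street": " ".join(addr["addressStreet"]),  # street lines flattened into one column
            "Locality": addr["addressLocality"],
            "Region": addr["addressRegion"],
            "Postal Code": addr["addressPostalCode"],
            "Country": addr["addressCountry"],
        })
        # The multi-valued addressPurpose becomes one junction row per purpose.
        for purpose in addr["addressPurpose"]:
            persons_addresses.append({
                "Person ID": doc["personId"],
                "Address ID": address_id,
                "Address Purpose": purpose,
            })
    return persons, persons_addresses, addresses


if __name__ == "__main__":
    for table in shred_addresses(person_doc):
        print(table)
```

One nested array in the document turns into rows spread across three tables that must be joined back together on read.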
Document PROs: Multi-Type Arrays
Any JSON Data Can Be Multi-Typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases.
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
    "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
    "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
    "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
    "nameOnBankAccount": "Michael Bowers",
    "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
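Application code typically handles such a mixed array by dispatching on a type field. The sketch below assumes `paymentMethodType` is the discriminator, as in the sample, and uses hypothetical helpers `describe` and `split_by_type`. It reads both records from one array and shows that a relational model would need them split into separate tables, like the Visa Payment Methods and Bank Payment Methods tables in the person ERD.

```python
from collections import defaultdict

payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {
        "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]


def describe(item):
    """Summarize one payment method; each type carries different fields."""
    m = item["paymentMethod"]
    kind = m["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending {m['creditCardNumber'][-4:]} ({m['nameOnCreditCard']})"
    if kind == "Checking":
        return f"Checking ending {m['bankAccountNumber'][-4:]} ({m['nameOnBankAccount']})"
    return f"Unsupported payment type: {kind}"


def split_by_type(items):
    """Group the mixed array by type, mirroring the separate relational tables."""
    groups = defaultdict(list)
    for item in items:
        groups[item["paymentMethod"]["paymentMethodType"]].append(item["paymentMethod"])
    return dict(groups)


if __name__ == "__main__":
    for item in payment_methods:
        print(describe(item))
    print(sorted(split_by_type(payment_methods)))
```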
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query.
• Structure can be queried:
  • for presence of keys,
  • for nested relationships,
  • for any combination of nesting, keys, and values.
{ "personId": 11,
  "person": { "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ] } }
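In a document database, the same idea is expressed through the database's own query and update APIs. The plain-Python sketch below only illustrates the principle on a dict: `paths_with_key` queries for the presence of a key at any depth, and `rename_key` changes structure by rewriting data. Both helpers and the new key name `phoneNumberHidden` are hypothetical, and the document is an abridged copy of the sample above.

```python
person_doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def paths_with_key(node, key, path=()):
    """Yield the path to every object that contains `key`, at any nesting depth."""
    if isinstance(node, dict):
        if key in node:
            yield path
        for k, v in node.items():
            yield from paths_with_key(v, key, path + (k,))
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from paths_with_key(v, key, path + (i,))


def rename_key(node, old, new):
    """Return a copy of `node` with every `old` key renamed to `new` (a structural change)."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(v, old, new) for v in node]
    return node


if __name__ == "__main__":
    print(list(paths_with_key(person_doc, "hidePhoneNumber")))
    renamed = rename_key(person_doc, "hidePhoneNumber", "phoneNumberHidden")
    print(list(paths_with_key(renamed, "phoneNumberHidden")))
```

Only the home phone carries `hidePhoneNumber`, so the query finds exactly one path; sparse keys like this one are ordinary data that can be searched for and rewritten.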
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters.
– Strings have unlimited size and can contain any internationalized UTF-8 characters.
– Numbers contain integers or decimals.
– Booleans are true or false.
– Arrays can have values of any type and can mix types.
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
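Python's standard `json` module shows how these types map onto a general-purpose language: strings become `str`, integers and decimals become `int` and `float`, `true`/`false` become `bool`, arrays become `list`, objects become `dict`, and `null` becomes `None`. The document below is illustrative, not taken from the deck.

```python
import json

# Illustrative document; the values are made up for the example.
text = """{
  "person name (any characters: é, ü, 日本)": "Mike Bowers",
  "personHeightInFeet": 5.75,
  "customerReviewerRank": 11111,
  "hidePhoneNumber": true,
  "mixed": [1, "two", false, null, {"nested": {}}],
  "sparse": {}
}"""

doc = json.loads(text)

# How each JSON type arrives in Python.
for key, value in doc.items():
    print(f"{key!r}: {type(value).__name__}")

# Arrays may mix types freely.
print([type(v).__name__ for v in doc["mixed"]])

# Round-trip without escaping international characters.
print(json.dumps(doc, ensure_ascii=False))
```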
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1. INTRODUCTION: This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS: …Tables… represent a major advance toward the goal of data independence…
1.2.1. Ordering Dependence: …Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence: …Can application programs… remain invariant as indices come and go?…
1.2.3. Access Path Dependence: Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure: a knowledge graph whose nodes are Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E), linked by meaningful relationships such as "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "solution", and "purpose".]
[Figure: Customer Order document graph. JSON documents for Customers, Orders, Products, and Vendors plus XML Documentation documents (for example, a hospital operation record listing drugs, manufacturers, and doses) are connected by relationships such as "Product that was ordered", "Vendor who sold the product", "Customer liked product", "Vendor received RMA on product", "Product manual", and "Customer order documents" (shipping instructions, customer invoices, annual reports, etc.).]
Inside and Out
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
<section xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><!--This is a comment-->
<heading significance="important">XML is best for TEXT with <![CDATA[ <tags> ]]></heading>
<paragraph xml:lang="en-us"> Marked up text has the most <i>complex</i> structures </paragraph>
<paragraph>XML allows data such as this date <date xsi:type="xsd:date">2017-04-02Z</date> to be
freely mixed into the text<br/>XML is designed for text to be sprinkled with <b>tags</b> </paragraph></section>
{ "heading": "JSON is best for nested object DATA",
  "paragraphs": [
    { "paragraph": [ { "type": "text", "value": "Everything is an object" } ] },
    { "paragraph": [
      { "type": "text", "value": "Text exists only in objects" },
      { "type": "text", "value": "Data exists in separate objects" },
      { "type": "date", "value": "2017-04-02Z" },
      { "type": "break", "value": null },
      { "type": "text", "value": "JSON is designed for objects to…" },
      { "type": "text", "value": "be sprinkled with strings" } ] } ] }
JSON
1. No document type.
2. Simple, easy, and fast to parse: no comments, no namespaces, no attributes, no CDATA.
3. Best for object-oriented computer languages.
4. Best for structured data (text poured into objects).
5. Supports common data types: structures, arrays, floats, strings, booleans, nulls.

XML
1. Document type.
2. Namespaces for nested objects, comments, attributes for metadata, CDATA sections to embed anything.
3. Best for marking up natural human languages.
4. Best for structured text (tags added on top of text).
5. Supports any data type: structures, sequences, all number types, dates, durations, strings, booleans, null, etc.
JSON is ideal for data structures: developers work with data with maximal reliance on predictable structures.
XML is ideal for content: developers work with tagged content with minimal reliance on variable structure.
Choosing Between XML and JSON
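The trade-off is easiest to see by converting XML mixed content into the JSON shape used on the slide. This sketch uses Python's standard `xml.etree.ElementTree`; the helper `paragraph_to_parts` and its part types (`text`, `date`, `break`, and the tag name for other inline elements) are assumptions modeled on the slide's JSON, not a standard mapping, and the XML is simplified by dropping the namespaced `xsi:type` attribute. What XML expresses with inline tags and trailing text, JSON must spell out as an array of typed objects.

```python
import json
import xml.etree.ElementTree as ET

# Simplified from the slide's XML: namespaces and the xsi:type attribute are omitted.
xml_text = (
    "<paragraph>XML allows data such as this date <date>2017-04-02Z</date> "
    "to be freely mixed into the text<br/>XML is designed for text to be "
    "sprinkled with <b>tags</b></paragraph>"
)


def paragraph_to_parts(elem):
    """Convert mixed XML content into a list of typed parts, like the slide's JSON."""
    parts = []
    if elem.text:  # text before the first child element
        parts.append({"type": "text", "value": elem.text})
    for child in elem:
        if child.tag == "br":
            parts.append({"type": "break", "value": None})
        elif child.tag == "date":
            parts.append({"type": "date", "value": child.text})
        else:  # inline markup such as <b>: keep the tag name as the type
            parts.append({"type": child.tag, "value": child.text})
        if child.tail:  # text that follows the child element
            parts.append({"type": "text", "value": child.tail})
    return parts


if __name__ == "__main__":
    parts = paragraph_to_parts(ET.fromstring(xml_text))
    print(json.dumps({"paragraph": parts}, indent=2))
```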
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Customer Model
The person ERD is 13 tables because each multivalued property requires a separate table in a relational model.
All of this should be represented as one JSON document because it is one business entity.
Person ERD
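A rough way to see why the table count grows is to list every array in a document: in a normalized relational model, each one is a candidate for its own child or junction table (a purpose array might instead become a set of columns, depending on the design). The sketch below is a heuristic with a hypothetical helper `multivalued_paths`, run on an abridged, illustrative person document; it does not reproduce the slide's exact 13-table count.

```python
def multivalued_paths(node, path=""):
    """Return the set of paths that hold arrays anywhere in a JSON-like document."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            found |= multivalued_paths(value, f"{path}/{key}")
    elif isinstance(node, list):
        found.add(path)
        for value in node:
            found |= multivalued_paths(value, path + "/*")  # "*" stands for any array item
    return found


# Abridged, illustrative person document: two multivalued properties,
# each item carrying its own multivalued purpose array.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
        ],
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@example.com"}},
        ],
    },
}

if __name__ == "__main__":
    for p in sorted(multivalued_paths(person_doc)):
        print(p)
```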
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } } ],
  "person": {
    "personStatus": "active", "personName": "Mike Bowers", "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01", "personGender": "male",
    "personEthnicity": "caucasian", "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
      "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
      "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
      "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
      "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
      "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
      "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
      "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
      "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ]
}
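Because the triples are embedded in the person document, they travel with the entity and can be read like any other data. The plain-Python sketch below groups the embedded triples by predicate; the helper `relationships` is hypothetical, the document is abridged from the sample above, and nothing here uses MarkLogic's triple index or SPARQL.

```python
person_doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666,
                    "tripleStatus": "active"}},
    ],
}


def relationships(doc):
    """Group the objects of a document's embedded triples by predicate."""
    rel = {}
    for item in doc.get("triples", []):
        t = item["triple"]
        rel.setdefault(t["predicate"], []).append(t["object"])
    return rel


if __name__ == "__main__":
    for predicate, objects in relationships(person_doc).items():
        print(predicate, objects)
```

Each relationship keeps its precise meaning in its predicate (for example `hasChild` versus `hasEmployer`), rather than being implied by which foreign-key column it sits in.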
[Figure: a much larger relational ERD for the more realistic customer order model, repeating the person tables alongside tables such as Order, Order Items, Lookup Lists, and Fact.]
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity.
• Meaningful graph relationships connect business entities.
[Figure: Customer Order document graph. JSON documents for Customers, Orders, Products, and Vendors are connected by relationships such as "Product that was ordered" and "Vendor who sold the product".]
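The idea of entity documents connected by meaningful relationships can be illustrated with a tiny in-memory graph that uses the relationship names from the diagram as edge labels. This is a sketch under stated assumptions: edges are derived from the `productIdFk` and `productVendor.vendorId` fields of abridged sample documents (treating `productVendor` as the "vendor who sold the product" is itself an assumption), and `edges` and `follow` are hypothetical helpers, not a MarkLogic API.

```python
# Abridged business-entity documents keyed by (kind, id).
docs = {
    ("order", 1): {"orderId": 1, "order": {
        "orderedProducts": [{"product": {"productIdFk": 1111}}]}},
    ("product", 1111): {"productId": 1111, "product": {
        "productName": "Oreo Thins Sandwich Cookies",
        "productVendor": {"vendorId": 555, "companyName": "Nabisco"}}},
    ("company", 555): {"companyId": 555, "company": {"companyName": "Nabisco"}},
}


def edges(docs):
    """Derive labeled (source, label, target) edges from reference fields."""
    result = []
    for (kind, key), doc in docs.items():
        if kind == "order":
            for item in doc["order"].get("orderedProducts", []):
                target = ("product", item["product"]["productIdFk"])
                result.append(((kind, key), "product that was ordered", target))
        elif kind == "product":
            vendor = doc["product"].get("productVendor")
            if vendor:
                result.append(((kind, key), "vendor who sold the product",
                               ("company", vendor["vendorId"])))
    return result


def follow(docs, start, labels):
    """Follow a chain of edge labels from `start` and return the nodes reached."""
    graph = edges(docs)
    frontier = [start]
    for label in labels:
        frontier = [t for (s, lab, t) in graph if s in frontier and lab == label]
    return frontier


if __name__ == "__main__":
    reached = follow(docs, ("order", 1),
                     ["product that was ordered", "vendor who sold the product"])
    for node in reached:
        print(node, docs[node]["company"]["companyName"])
```

Each hop moves from one whole business entity to another, so answering "who sold the products in this order?" touches three documents rather than a chain of shredded tables.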
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
Same Realistic Customer Order Model
Relational Modeling: Normalize
1. Normalize
• Make each attribute single-valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
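To make the flattening step concrete, here is a small sketch using Python's built-in `sqlite3` module: a person with several phone numbers becomes one `person` row plus a one-to-many `person_phone` child table. The table and column names are illustrative assumptions, not taken from the talk.

```python
import sqlite3

# A person with a multivalued attribute (several phones), as it might arrive as JSON.
person = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": ["+1 (111) 111-1111", "+1 (111) 111-1112"],
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table names: one column per single-valued attribute.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT);
    -- The multivalued attribute becomes a child table in a one-to-many join.
    CREATE TABLE person_phone (
        person_id INTEGER REFERENCES person(person_id),
        phone_number TEXT
    );
""")
conn.execute("INSERT INTO person VALUES (?, ?)", (person["personId"], person["personName"]))
conn.executemany(
    "INSERT INTO person_phone VALUES (?, ?)",
    [(person["personId"], phone) for phone in person["personPhones"]],
)

# Reassembling the person now requires a join.
rows = conn.execute("""
    SELECT p.person_name, ph.phone_number
    FROM person p JOIN person_phone ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_number
""").fetchall()
print(rows)  # [('Mike Bowers', '+1 (111) 111-1111'), ('Mike Bowers', '+1 (111) 111-1112')]
```

Every additional multivalued attribute (addresses, emails, payment methods) adds another child table and another join.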
[Diagram: one-to-one, one-to-many, and many-to-many relationships, and reference tables]
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
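MarkLogic's TDE (Template Driven Extraction) is configured with templates stored in the database, and its API is not shown here. As a rough analogy only, this plain-Python sketch pulls the embedded `triples` array out of a person document into a small predicate-keyed index. The `{"triple": {...}}` nesting is an assumed reading of the example document, and `extract_triples` is a hypothetical helper.

```python
# The embedded "triples" array from the example person document (assumed nesting).
person_doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}},
    ],
}


def extract_triples(doc):
    """Hypothetical helper: return every embedded triple as a (s, p, o) tuple."""
    return [
        (t["triple"]["subject"], t["triple"]["predicate"], t["triple"]["object"])
        for t in doc.get("triples", [])
    ]


# A tiny stand-in for a triple index: predicate -> list of (subject, object) pairs.
triple_index = {}
for s, p, o in extract_triples(person_doc):
    triple_index.setdefault(p, []).append((s, o))

print(triple_index["hasChild"])  # [(11, 33), (11, 44)]
```

The document stays the single source of truth; the index is derived from it and can be rebuilt at any time.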
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
                  "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
                  "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                           "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                           "nameOnBankAccount": "Michael Bowers",
                           "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                   "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                   "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without physically copying data into other documents
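A plain-Python analogy of "denormalizing without denormalizing", assuming people and companies are each stored once and linked only by a `hasEmployer` triple: the combined view is assembled at query time instead of copying the company name into the person document. This is not the Optic API; `employer_view` is a hypothetical function.

```python
# Each entity is stored once; nothing is copied between documents.
people = {11: {"personName": "Mike Bowers"}}
companies = {555: {"companyName": "Nabisco"}, 666: {"companyName": "Goliath Dairies"}}
triples = [(11, "hasEmployer", 666)]


def employer_view():
    """Hypothetical query: join people to companies through hasEmployer triples at read time."""
    return [
        {"personName": people[s]["personName"], "employerName": companies[o]["companyName"]}
        for s, p, o in triples
        if p == "hasEmployer" and s in people and o in companies
    ]


print(employer_view())
# [{'personName': 'Mike Bowers', 'employerName': 'Goliath Dairies'}]
```

Because nothing is copied, renaming a company in its own document is reflected the next time the view is built.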
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
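A small `sqlite3` sketch of the three kinds of tables, assuming hypothetical table names: `person` and `product` as business tables, `order_status` as a reference table, and `order_line` as a transaction table that joins them. With `PRAGMA foreign_keys = ON`, SQLite rejects a status that is not in the reference table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses below
conn.executescript("""
    -- Business tables (hypothetical names): independent of any one context.
    CREATE TABLE person  (person_id  INTEGER PRIMARY KEY, person_name  TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    -- Reference table: standardizes the allowed order states.
    CREATE TABLE order_status (status TEXT PRIMARY KEY);
    -- Transaction table: joins business tables together in one context.
    CREATE TABLE order_line (
        order_id   INTEGER,
        person_id  INTEGER REFERENCES person(person_id),
        product_id INTEGER REFERENCES product(product_id),
        status     TEXT REFERENCES order_status(status)
    );
""")
conn.execute("INSERT INTO person VALUES (11, 'Mike Bowers')")
conn.execute("INSERT INTO product VALUES (1111, 'Oreo Thins Sandwich Cookies')")
conn.executemany("INSERT INTO order_status VALUES (?)",
                 [("New",), ("Invoiced",), ("Shipped",), ("Closed",)])
conn.execute("INSERT INTO order_line VALUES (1, 11, 1111, 'Shipped')")

# A status missing from the reference table violates the foreign key.
try:
    conn.execute("INSERT INTO order_line VALUES (2, 11, 1111, 'Lost')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Rebuilding the business context takes one join per business table.
row = conn.execute("""
    SELECT p.person_name, pr.product_name, o.status
    FROM order_line o
    JOIN person p ON p.person_id = o.person_id
    JOIN product pr ON pr.product_id = o.product_id
""").fetchone()
print(row, rejected)  # ('Mike Bowers', 'Oreo Thins Sandwich Cookies', 'Shipped') True
```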
[Diagram: business tables, transaction tables, and reference tables]
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one document
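As a sketch of combining lookup lists in one document, the Python below holds both status lists from the example's lookup document in a single dictionary and checks an order against them. The order is trimmed to the fields the check needs, `invalid_statuses` is a hypothetical helper, and `"Backordered"` is a deliberately invalid value added for illustration.

```python
# One lookup document holds several lookup lists (values from the example model).
order_lookups = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}


def invalid_statuses(order, lookups):
    """Hypothetical helper: return status values in an order that are not in the lookup lists."""
    problems = []
    if order["orderStatus"] not in lookups["orderStatus"]:
        problems.append(order["orderStatus"])
    for item in order["orderedProducts"]:
        status = item["product"].get("productOrderStatus")
        if status is not None and status not in lookups["productOrderStatus"]:
            problems.append(status)
    return problems


# A trimmed order document; "Backordered" is a hypothetical, deliberately invalid value.
order = {
    "orderId": 1,
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
        {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},
    ],
}
print(invalid_statuses(order, order_lookups))  # ['Backordered']
```

One read of one lookup document gives the application every list it needs to validate a transaction.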
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational, because it hides the purpose of the model
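A minimal `sqlite3` sketch of relational generalization, assuming hypothetical tables: a general `person` table plus `customer` and `account_rep` subtype tables that share the person's primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- General table (hypothetical names): every person, whatever role they play.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT);
    -- Subtype tables share the person's key and hold role-specific columns.
    CREATE TABLE customer (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        customer_status TEXT
    );
    CREATE TABLE account_rep (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        company_id INTEGER
    );
    INSERT INTO person VALUES (11, 'Mike Bowers');
    INSERT INTO customer VALUES (11, 'active');
    INSERT INTO account_rep VALUES (11, 555);
""")

# Finding every role a person plays means probing each subtype table.
roles = [
    name
    for name in ("customer", "account_rep")
    if conn.execute(f"SELECT 1 FROM {name} WHERE person_id = ?", (11,)).fetchone()
]
print(roles)  # ['customer', 'account_rep']
```

Nothing in the `person` table itself says which roles exist, which is one way a generalized relational model can hide its purpose.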
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because a JSON object naturally supports subclassing: role-specific properties can be nested alongside the general ones
• This example shows a person being subclassed as an employee, a customer, and an account rep
• Generalization works very well in JSON, and unlike in relational, it does not hide the purpose of the model
• The subclassing appears in the person document's customer and companyRelationships properties
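For comparison, a sketch of the same generalization in a JSON-style document, using a trimmed copy of the example person document as a Python dictionary. The role data sits under named properties (`customer`, `companyRelationships`) beside the general `person` data; `roles` is a hypothetical helper that reads them.

```python
# Trimmed copy of the example person document.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    # Subclass data sits beside the general data, named after the role it describes.
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}},
    ],
}


def roles(doc):
    """Hypothetical helper: list the roles a person document declares, in document order."""
    found = ["customer"] if "customer" in doc else []
    for rel in doc.get("companyRelationships", []):
        found.extend(rel["company"]["personCompanyRelationship"])
    return found


print(roles(person_doc))
# ['customer', 'employeeOf', 'accountRepFor', 'independentContractorFor', 'deliveryPersonFor']
```

Every role is visible by name in one document, so the general model does not obscure what a person actually is.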
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Move sparse data out of a table and into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
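A small `sqlite3` sketch of both tuning moves, with hypothetical tables: a sparse column moved into a one-to-one `customer_extra` table, and `customer_name` copied into `orders` so a common query avoids a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    -- Sparse, rarely used column moved to a one-to-one side table (hypothetical).
    CREATE TABLE customer_extra (
        customer_id INTEGER PRIMARY KEY REFERENCES customer(customer_id),
        reviewer_rank INTEGER
    );
    -- customer_name is copied into orders so common reads skip the join.
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        customer_name TEXT
    );
    INSERT INTO customer VALUES (11, 'Mike Bowers');
    INSERT INTO customer_extra VALUES (11, 11111);
    INSERT INTO orders VALUES (1, 11, 'Mike Bowers');
""")

# The common query reads one table instead of joining two.
fast_read = conn.execute("SELECT order_id, customer_name FROM orders").fetchall()
print(fast_read)  # [(1, 'Mike Bowers')]

# The cost: a name change must now be written in every place it was copied.
conn.execute("UPDATE customer SET customer_name = 'Michael Bowers' WHERE customer_id = 11")
conn.execute("UPDATE orders SET customer_name = 'Michael Bowers' WHERE customer_id = 11")
```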
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
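To show why unique names matter, here is a simplified model of a name-keyed equality index, not MarkLogic's actual index structures: it maps (property name, value) to document URIs regardless of nesting. The URIs and the generic `status` property are hypothetical.

```python
from collections import defaultdict


def index_by_name(docs):
    """Map (property name, scalar value) -> set of doc URIs, ignoring nesting depth."""
    index = defaultdict(set)

    def walk(uri, node, name=None):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(uri, value, key)
        elif isinstance(node, list):
            for item in node:  # array items are indexed under the array's property name
                walk(uri, item, name)
        elif name is not None:
            index[(name, node)].add(uri)

    for uri, doc in docs.items():
        walk(uri, doc)
    return index


# Reusing a generic name ("status") in two contexts makes the lookup ambiguous.
generic = {
    "/person/11.json": {"person": {"status": "active"}},
    "/company/555.json": {"company": {"status": "active"}},
}
# Context-specific names, as in the example documents, keep lookups precise.
unique = {
    "/person/11.json": {"person": {"personStatus": "active"}},
    "/company/555.json": {"company": {"companyStatus": "active"}},
}

print(sorted(index_by_name(generic)[("status", "active")]))
# ['/company/555.json', '/person/11.json']
print(sorted(index_by_name(unique)[("personStatus", "active")]))
# ['/person/11.json']
```

With generic names, a lookup for active people also returns the active company; with names like `personStatus`, the name alone identifies the context.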
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a document is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
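The IO arithmetic above as a small function, assuming, as the bullets do, that every joined table costs one index lookup (3 to 4 IOs) plus one row fetch (1 to 2 IOs) and that nothing is cached. The function name and default parameters are illustrative.

```python
def relational_read_ios(tables_joined, index_ios=(3, 4), row_ios=(1, 2)):
    """Return the (low, high) IO estimate for assembling one entity from joined tables.

    index_ios and row_ios are (low, high) per-table costs taken from the slide's figures.
    """
    low = tables_joined * (index_ios[0] + row_ios[0])
    high = tables_joined * (index_ios[1] + row_ios[1])
    return low, high


print(relational_read_ios(13))  # (52, 78)
```

By the same reasoning, fetching the whole entity as one MarkLogic document is 1 IO.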
Document PROs: Multi-Valued and Multi-Typed Properties
• JSON documents can contain multi-valued and multi-typed properties
• JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

"personAddresses": [
  { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]

Equivalent relational tables:
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
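The difference can be shown by answering one question, "which of person 11's addresses are used for shipping?", against both layouts. This is a minimal in-memory Python sketch using the slide's sample address. The relational side keeps only the Addresses and Persons Addresses tables. It omits the Is Primary column, and it uses a hypothetical surrogate key `address_id = 1`. The function names are illustrative, not part of any API.

```python
# One person's address stored two ways, using the slide's sample values.
# Document form: the multi-valued addressPurpose array lives inside the doc.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }},
        ],
    },
}

# Relational form: Addresses plus a Persons Addresses join table with one
# row per purpose. address_id 1 is a hypothetical surrogate key.
addresses = [{"address_id": 1, "street": "99 Smith Dr", "locality": "Clinton",
              "region": "UT", "postal_code": "84015", "country": "United States"}]
persons_addresses = [
    {"person_id": 11, "address_id": 1, "purpose": "shipping"},
    {"person_id": 11, "address_id": 1, "purpose": "mailing"},
]


def doc_addresses(doc: dict, purpose: str) -> list[dict]:
    """Filter the embedded array directly; no join is needed."""
    return [entry["address"] for entry in doc["person"]["personAddresses"]
            if purpose in entry["address"]["addressPurpose"]]


def relational_addresses(person_id: int, purpose: str) -> list[dict]:
    """Join Persons Addresses to Addresses on address_id."""
    ids = {row["address_id"] for row in persons_addresses
           if row["person_id"] == person_id and row["purpose"] == purpose}
    return [row for row in addresses if row["address_id"] in ids]


if __name__ == "__main__":
    print(doc_addresses(person_doc, "shipping")[0]["addressLocality"])
    print(relational_addresses(11, "mailing")[0]["locality"])
```

The two-element purpose array becomes two join-table rows in the relational layout. This is the many-to-many shape the slide describes.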
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases

"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
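In a relational model these two records need separate Visa Payment Methods and Bank Payment Methods tables, plus join tables. In a document, a discriminator field such as `paymentMethodType` tells the reader which other fields to expect. This minimal Python sketch uses the slide's sample values. The `describe` function and its output strings are hypothetical, for illustration only.

```python
# A multi-typed array: each element shares a few common fields
# (type, status) but carries its own type-specific fields.
payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {
        "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]


def describe(entry: dict) -> str:
    """Dispatch on the discriminator field to read type-specific fields."""
    pm = entry["paymentMethod"]
    kind = pm["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending in {pm['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking, routing {pm['bankRoutingNumber']}"
    return f"Unhandled payment type: {kind}"


# Keys every element has; everything else is specific to one type.
common_keys = set.intersection(*(set(e["paymentMethod"]) for e in payment_methods))

if __name__ == "__main__":
    for entry in payment_methods:
        print(describe(entry))
    print(sorted(common_keys))
```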
Document PROs: Everything is Data
Everything is data in JSON
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query
• Structure can be queried
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values

{ "personId": 11,
  "person": { "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
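Because keys are data, a program can search for them and rewrite them like any other value. This Python sketch works on an in-memory copy of the sample document. It finds every path to a key at any nesting depth, then renames that key. In MarkLogic the equivalent would be a database query and update. The helper names and the new key name `phoneIsPrivate` are hypothetical.

```python
# Sample document from the slide (values as shown; the name is a placeholder).
doc = {
    "personId": 11,
    "person": {
        "personName": "Some very long name with many UTF-8 international characters",
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def find_key_paths(node, key, path=()):
    """Yield the path (keys and array indexes) to every occurrence of `key`."""
    if isinstance(node, dict):
        for k, value in node.items():
            if k == key:
                yield path + (k,)
            yield from find_key_paths(value, key, path + (k,))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from find_key_paths(value, key, path + (i,))


def rename_key(node, old, new):
    """Return a new structure with every `old` key renamed to `new`."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(v, old, new) for v in node]
    return node


if __name__ == "__main__":
    print(list(find_key_paths(doc, "hidePhoneNumber")))
    renamed = rename_key(doc, "hidePhoneNumber", "phoneIsPrivate")  # hypothetical new name
    print(list(find_key_paths(renamed, "phoneIsPrivate")))
```

`rename_key` builds a new structure instead of mutating the original. This makes it easy to compare the before and after versions of a structural change.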
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
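A quick way to see these types is to parse some JSON. This Python sketch runs the standard `json` module on a made-up document. JSON itself has a single number type, and Python's parser returns it as `int` or `float` depending on whether the number has a decimal point. JSON also has `null` (Python `None`), which the list above does not mention.

```python
import json

# Made-up sample exercising each JSON type listed above.
text = """
{
  "a key with spaces, punctuation & \u00fcnicode!": "Z\u00fcrich",
  "count": 3,
  "price": 4.50,
  "verified": true,
  "mixed": [1, "two", 3.5, false, null, {"nested": []}],
  "sparse": {"onlyField": "present"}
}
"""

doc = json.loads(text)

# Python's json module maps JSON types onto native types.
top_level_types = {key: type(value).__name__ for key, value in doc.items()}
mixed_types = [type(item).__name__ for item in doc["mixed"]]

if __name__ == "__main__":
    for key, name in top_level_types.items():
        print(f"{key!r}: {name}")
    print(mixed_types)
```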
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency. …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS … Tables … represent a major advance toward the goal of data independence. …
1.2.1 Ordering Dependence … Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence … Can application programs … remain invariant as indices come and go? …
1.2.3 Access Path Dependence Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.
[Figure: a graph of linked entities (T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event) connected by labeled relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of"; topics are linked to one another as problem, solution, and purpose.]
[Figure: Customer Order document graph. JSON documents for Customers, Orders, Products, and Vendors; an XML operation record (hospital, operation number and type, surgeon, and drugs with manufacturers and doses); and Documentation items such as shipping instructions, customer invoices, annual reports, customer order documents, and a product manual. Labeled relationships include "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product".]
Inside and Out
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling: Normalize
- DocGraph Modeling: Normalize
- DocGraph Modeling: Denormalize
- Relational Modeling: Orthogonalize
- Relational Modeling: Generalize
- DocGraph Modeling: Generalize
- Relational Modeling: Tune
- DocGraph Modeling: Tune
- Document PROs: Simple, Easy Data Types
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Customer Model
The person ERD has 13 tables because each multivalued property requires a separate table in a relational model.
All of this should be represented as one JSON document because it is one business entity.
Person ERD
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
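Each company relationship in the person document keeps both a `companyIdFk` and a copy of `companyName`. That copy is a denormalization: common data copied from another business entity. This minimal Python sketch shows one way to keep such copies current. It assumes the company documents are available in memory as a dict keyed by company ID. `refresh_company_names` and the company rename are hypothetical. An alternative the slides mention is letting TDE denormalize into the indexes, so documents do not carry copies at all.

```python
# Denormalized copies: each company relationship keeps the foreign key
# (companyIdFk) plus a copy of companyName taken from the company document.
person = {
    "personId": 11,
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco"}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies"}},
    ],
}

# Hypothetical company documents keyed by companyId (the source of truth).
companies = {
    555: {"companyId": 555, "company": {"companyName": "Nabisco"}},
    666: {"companyId": 666, "company": {"companyName": "Goliath Dairies"}},
}


def refresh_company_names(doc: dict, company_docs: dict) -> int:
    """Re-copy companyName from each referenced company document.

    Returns how many embedded copies were out of date and got updated.
    """
    updated = 0
    for relationship in doc.get("companyRelationships", []):
        embedded = relationship["company"]
        source = company_docs.get(embedded["companyIdFk"])
        if source is None:
            continue  # dangling reference: leave the copy untouched
        name = source["company"]["companyName"]
        if embedded["companyName"] != name:
            embedded["companyName"] = name
            updated += 1
    return updated


if __name__ == "__main__":
    print(refresh_company_names(person, companies))  # nothing stale yet
    companies[666]["company"]["companyName"] = "Goliath Dairies Inc"  # hypothetical rename
    print(refresh_company_names(person, companies))
```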
[Figure: the person ERD tables repeated several times under generic placeholder table names (Short, Really, Flabbergastic, For all, Lookup Lists, Testing, Order, Order Items, Fact, Long table Name, and others), illustrating how many more tables a realistic customer order model needs.]
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
[Figure: Customer Order graph connecting JSON documents for Customers, Orders, Products, and Vendors, with relationships such as "Product that was ordered" and "Vendor who sold the product".]
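The second bullet can be sketched as a tiny in-memory triple store that follows relationships from one business entity to the next. This Python sketch makes several assumptions. Entity IDs such as `person/11` are hypothetical. `likesProduct` comes from the sample person document's triples. `placedOrder`, `orderedProduct`, and `soldBy` are made-up predicate names for the arrows in the figure. The sketch illustrates the idea only; it is not MarkLogic's semantics API.

```python
# Relationships between business entities as (subject, predicate, object)
# triples. likesProduct comes from the sample person document; the other
# predicate names are hypothetical labels for the figure's arrows.
TRIPLES = [
    ("person/11", "placedOrder", "order/1"),
    ("order/1", "orderedProduct", "product/1111"),
    ("order/1", "orderedProduct", "product/2222"),
    ("product/1111", "soldBy", "company/555"),
    ("person/11", "likesProduct", "product/1111"),
]


def follow(start: str, *predicates: str) -> set[str]:
    """Walk a path of predicates from `start`; return every entity reached."""
    frontier = {start}
    for predicate in predicates:
        frontier = {o for s, p, o in TRIPLES if s in frontier and p == predicate}
    return frontier


if __name__ == "__main__":
    # Which vendors sold products that customer 11 ordered?
    print(follow("person/11", "placedOrder", "orderedProduct", "soldBy"))
```

Each relationship carries its own named meaning. Adding a new kind of link only adds triples and does not change any table schema.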
Relational Modeling: Normalize
1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Figure: one-to-one, one-to-many, and many-to-many relationships, and reference tables.]
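The first normalization step can be illustrated by flattening a trimmed copy of the sample person document into single-valued rows. This is a Python sketch under stated assumptions. Table and column names loosely follow the person ERD. The surrogate keys are generated by enumeration and are hypothetical. Only phones and addresses are handled.

```python
# Sketch of the relational "Normalize" step applied to a trimmed person
# document: multivalued attributes become rows in child tables.
# Surrogate keys (phone_id, address_id) are hypothetical and generated here.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113"}},
        ],
        "personAddresses": [
            {"address": {"addressPurpose": ["shipping", "mailing"],
                         "addressStreet": ["99 Smith Dr"], "addressLocality": "Clinton",
                         "addressRegion": "UT", "addressPostalCode": "84015",
                         "addressCountry": "United States"}},
        ],
    },
}


def normalize_person(doc: dict) -> dict[str, list[dict]]:
    """Flatten one person document into single-valued rows for four tables."""
    pid = doc["personId"]
    person = doc["person"]
    tables = {
        "persons": [{"person_id": pid, "name": person["personName"]}],
        "person_phones": [],
        "addresses": [],
        "persons_addresses": [],
    }
    for phone_id, entry in enumerate(person.get("personPhones", []), start=1):
        phone = entry["phone"]
        # Multivalued phonePurpose: one row per purpose, keyed by (phone_id, purpose).
        for purpose in phone["phonePurpose"]:
            tables["person_phones"].append({"phone_id": phone_id, "person_id": pid,
                                            "purpose": purpose, "number": phone["phoneNumber"]})
    for address_id, entry in enumerate(person.get("personAddresses", []), start=1):
        addr = entry["address"]
        tables["addresses"].append({
            "address_id": address_id,
            "street": ", ".join(addr["addressStreet"]),  # street lines -> one column
            "locality": addr["addressLocality"], "region": addr["addressRegion"],
            "postal_code": addr["addressPostalCode"], "country": addr["addressCountry"],
        })
        # Many-to-many link table: one row per (person, address, purpose).
        for purpose in addr["addressPurpose"]:
            tables["persons_addresses"].append({"person_id": pid, "address_id": address_id,
                                                "purpose": purpose})
    return tables


if __name__ == "__main__":
    for name, rows in normalize_person(person_doc).items():
        print(name, len(rows))
```

One document turns into rows spread across four tables, and reading the person back means joining them again. This is the cost the "Document PROs: Performance" slide counts.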
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
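TDE and the Optic API run inside MarkLogic, and their real template and query syntax is not shown here. The idea in the last bullet can still be imitated in plain Python. First, extract flat rows ("views") from each kind of document, including one row per element of an embedded array. Then join the views. `persons_view`, `order_items_view`, and `hash_join` are hypothetical names. The documents are trimmed copies of the sample person and order documents, with the unit prices read as 3.50 and 2.50.

```python
# In-memory imitation of the idea behind TDE + Optic: extract rows ("views")
# from documents, then join the views. The view functions and join helper
# are hypothetical; MarkLogic declares TDE templates and runs Optic queries
# inside the database instead.
person_docs = [
    {"personId": 11, "person": {"personName": "Mike Bowers"}},
]
order_docs = [
    {"orderId": 1, "order": {
        "orderNumber": "111-11-111",
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productName": "Oreo Thins Sandwich Cookies",
                         "productOrderQty": 3, "productUnitPrice": 3.50}},
            {"product": {"productIdFk": 2222,
                         "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
                         "productOrderQty": 1, "productUnitPrice": 2.50}},
        ],
    }},
]


def persons_view(docs):
    """One row per person document."""
    return [{"personId": d["personId"], "personName": d["person"]["personName"]}
            for d in docs]


def order_items_view(docs):
    """One row per orderedProducts element (the embedded array is the row context)."""
    return [{"orderId": d["orderId"],
             "customerId": d["order"]["customer"]["customerId"],
             "productId": item["product"]["productIdFk"],
             "qty": item["product"]["productOrderQty"],
             "unitPrice": item["product"]["productUnitPrice"]}
            for d in docs for item in d["order"]["orderedProducts"]]


def hash_join(left, right, left_key, right_key):
    """Inner join: index `right` by key, then probe it with each left row."""
    index = {}
    for right_row in right:
        index.setdefault(right_row[right_key], []).append(right_row)
    return [{**left_row, **right_row}
            for left_row in left
            for right_row in index.get(left_row[left_key], [])]


if __name__ == "__main__":
    rows = hash_join(order_items_view(order_docs), persons_view(person_docs),
                     "customerId", "personId")
    for row in rows:
        print(row["personName"], row["productId"], row["qty"] * row["unitPrice"])
```

The documents stay whole and hierarchical. Only the extracted rows are flat, which is why the slide can recommend normalizing each embedded structure while still querying across documents.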
DocGraph Modeling: Denormalize
• TDE (Template Driven Extraction) can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing: linked data is combined at query time instead of being copied into each document
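The idea can be sketched in plain Python, assuming documents are dicts keyed by id. The extract_triples function is a hypothetical stand-in for index population (it is not TDE or any MarkLogic API), and employer_names plays the role of a query-time join: the company name is read from the linked document when it is asked for, so nothing is copied into the person document.

```python
from typing import Any

# Hypothetical in-memory "database": documents keyed by id (trimmed from the examples).
DOCS: dict[int, dict[str, Any]] = {
    11: {"personId": 11,
         "person": {"personName": "Mike Bowers"},
         "triples": [{"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}}]},
    666: {"companyId": 666, "company": {"companyName": "Goliath Dairies"}},
}


def extract_triples(doc: dict[str, Any]) -> list[tuple[int, str, int]]:
    """Stand-in for index population: pull (subject, predicate, object) out of a document."""
    return [(t["triple"]["subject"], t["triple"]["predicate"], t["triple"]["object"])
            for t in doc.get("triples", [])]


# Build the "triple index" once, separately from the documents themselves.
TRIPLE_INDEX = [t for doc in DOCS.values() for t in extract_triples(doc)]


def employer_names(person_id: int) -> list[str]:
    """Query-time join: follow hasEmployer triples and read the linked company's name."""
    return [DOCS[o]["company"]["companyName"]
            for s, p, o in TRIPLE_INDEX
            if s == person_id and p == "hasEmployer"]


print(employer_names(11))  # ['Goliath Dairies']
```

The person document never stores the company name here; it only stores the link.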
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
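A minimal sketch of the three relational table kinds, using Python's built-in sqlite3 module. The table and column names are hypothetical; the rows reuse values from the examples in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reference table: standardizes the allowed order states.
    CREATE TABLE order_status (status TEXT PRIMARY KEY);
    -- Business tables: each stands on its own, independent of any context.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Transaction table: joins business tables together to create a context.
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        person_id  INTEGER NOT NULL REFERENCES person(person_id),
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        status     TEXT NOT NULL REFERENCES order_status(status)
    );
    INSERT INTO order_status VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');
    INSERT INTO person  VALUES (11, 'Mike Bowers');
    INSERT INTO product VALUES (1111, 'Oreo Thins Sandwich Cookies');
    INSERT INTO orders  VALUES (1, 11, 1111, 'Shipped');
""")

# Reuse: the same business tables combine through the transaction table.
rows = conn.execute("""
    SELECT p.name, pr.name, o.status
    FROM orders o
    JOIN person  p  ON p.person_id   = o.person_id
    JOIN product pr ON pr.product_id = o.product_id
""").fetchall()
print(rows)  # [('Mike Bowers', 'Oreo Thins Sandwich Cookies', 'Shipped')]
```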
Figure: Business Tables, Reference Tables, Transaction Tables
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one document
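A small Python sketch of one lookup document holding two lookup lists, with values as read from the lookup document above. The invalid_statuses helper and the second line item's "Lost" status are hypothetical, added only to show a failed check.

```python
# One lookup document combining two lookup lists (values from the lookup document above).
ORDER_LOOKUP = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}


def invalid_statuses(order: dict) -> list[str]:
    """Hypothetical check: return status values in an order that the lookup document does not allow."""
    bad = []
    if order["orderStatus"] not in ORDER_LOOKUP["orderStatus"]:
        bad.append(order["orderStatus"])
    for item in order["orderedProducts"]:
        status = item["product"]["productOrderStatus"]
        if status not in ORDER_LOOKUP["productOrderStatus"]:
            bad.append(status)
    return bad


order = {
    "orderId": 1,
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
        {"product": {"productIdFk": 2222, "productOrderStatus": "Lost"}},  # made-up bad value
    ],
}
print(invalid_statuses(order))  # ['Lost']
```

One document read gives every list a transaction needs, instead of one reference table per list.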
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational models, because doing so hides the purpose of the model
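For comparison, a sketch of relational generalization in sqlite3: a general person table plus one-to-one subtype tables that share its key. The table names and the second person (Jane Doe) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- General-purpose supertype table, reusable in many contexts.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Subtype tables share the person's key (one-to-one with person).
    CREATE TABLE customer (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        join_date TEXT NOT NULL
    );
    CREATE TABLE account_rep (
        person_id  INTEGER PRIMARY KEY REFERENCES person(person_id),
        company_id INTEGER NOT NULL
    );
    INSERT INTO person VALUES (11, 'Mike Bowers'), (22, 'Jane Doe');
    INSERT INTO customer VALUES (11, '2001-01-01');
    INSERT INTO account_rep VALUES (11, 555);
""")

# Which roles does each person play? Each subtype table adds another join.
roles = conn.execute("""
    SELECT p.name,
           c.person_id IS NOT NULL AS is_customer,
           a.person_id IS NOT NULL AS is_account_rep
    FROM person p
    LEFT JOIN customer    c ON c.person_id = p.person_id
    LEFT JOIN account_rep a ON a.person_id = p.person_id
    ORDER BY p.person_id
""").fetchall()
print(roles)  # [('Mike Bowers', 1, 1), ('Jane Doe', 0, 0)]
```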
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because a JSON object can hold each subclass as its own nested object
• This example shows a person being subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See subclassing below
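A Python sketch of subclassing within one document, trimmed from the person document above. The wrapper objects ({"schema": {...}} and {"company": {...}}) are an assumption about how the flattened slide text nests, and subclasses and company_roles are hypothetical helpers.

```python
# Trimmed person document: each subclass is its own top-level object.
doc = {
    "personId": 11,
    "schemas": [{"schema": {"type": "person"}}, {"schema": {"type": "customer"}},
                {"schema": {"type": "companyRelationships"}}],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}},
    ],
}


def subclasses(d: dict) -> list[str]:
    """Return the subclass types declared in the document's schemas list."""
    return [s["schema"]["type"] for s in d.get("schemas", [])]


def company_roles(d: dict) -> set[str]:
    """Collect every role this person plays for any company."""
    return {role
            for rel in d.get("companyRelationships", [])
            for role in rel["company"]["personCompanyRelationship"]}


print(subclasses(doc))             # ['person', 'customer', 'companyRelationships']
print(sorted(company_roles(doc)))  # ['accountRepFor', 'deliveryPersonFor', 'employeeOf', 'independentContractorFor']
```

Adding a role means adding a nested object or array value to the same document; the names still say what each part is for.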
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
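A sqlite3 sketch of both tuning moves, with hypothetical table and column names: rarely populated name variations move to a one-to-one side table, and the customer's name is copied into the orders table so that listing orders needs no join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Sparse, rarely populated columns moved into a one-to-one side table.
    CREATE TABLE person_name_variations (
        person_id   INTEGER PRIMARY KEY REFERENCES person(person_id),
        nickname    TEXT,
        maiden_name TEXT
    );
    -- Denormalized: customer_name is copied from person to avoid a join when listing orders.
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        person_id     INTEGER NOT NULL REFERENCES person(person_id),
        customer_name TEXT NOT NULL
    );
    INSERT INTO person VALUES (11, 'Mike Bowers');
    INSERT INTO person_name_variations VALUES (11, 'Mikey', NULL);
    INSERT INTO orders VALUES (1, 11, 'Mike Bowers');
""")

# No join needed here, at the cost of keeping the copied name in sync with person.
print(conn.execute("SELECT order_id, customer_name FROM orders").fetchall())  # [(1, 'Mike Bowers')]
```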
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
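A simplified Python model of why unique property names matter: every scalar value is indexed under its property name alone, ignoring its path, loosely like the name-based equality indexing described above. This is an illustration, not how MarkLogic implements its indexes; the two documents with a generic "name" property are hypothetical.

```python
from collections import defaultdict
from typing import Any, Optional


def add_to_index(index: dict, doc_id: int, node: Any, name: Optional[str] = None) -> None:
    """Record (propertyName, value) -> doc ids, regardless of nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            add_to_index(index, doc_id, value, key)
    elif isinstance(node, list):
        for item in node:
            add_to_index(index, doc_id, item, name)  # array items keep the enclosing property name
    elif name is not None:
        index[(name, node)].add(doc_id)


index: dict = defaultdict(set)
# Unique, prefixed names, as in the examples in this deck.
add_to_index(index, 11, {"person": {"personName": "Mike Bowers"}})
add_to_index(index, 555, {"company": {"companyName": "Nabisco"}})
# Hypothetical documents that reuse a generic "name" property in different places.
add_to_index(index, 1, {"person": {"name": "Jordan"}})
add_to_index(index, 2, {"company": {"name": "Jordan"}})

print(index[("companyName", "Nabisco")])  # {555}: only companies can match
print(index[("name", "Jordan")])          # {1, 2}: a person and a company collide
```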
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
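The IO arithmetic above, written out in Python using only the per-join figures it gives.

```python
INDEX_IOS = (3, 4)  # IOs to traverse indexes per join (low, high), from the slide
ROW_IOS = (1, 2)    # IOs to read the row per join (low, high), from the slide


def relational_io_range(joins: int) -> tuple[int, int]:
    """Estimated (low, high) IOs to assemble an entity spread over `joins` joined tables."""
    low = joins * (INDEX_IOS[0] + ROW_IOS[0])
    high = joins * (INDEX_IOS[1] + ROW_IOS[1])
    return low, high


print(relational_io_range(13))  # (52, 78), compared with 1 IO to fetch one document
```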
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)
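TDE templates and the Optic API are MarkLogic-specific and are not reproduced here. As a rough Python sketch of the underlying idea, assuming the nesting shown in the person document, a multi-valued array can be projected into flat rows that ordinary relational filters can then query. address_rows is a hypothetical helper, not TDE syntax.

```python
from typing import Any, Iterator

# Trimmed from the person document; addressPurpose is multi-valued.
doc: dict[str, Any] = {
    "personId": 11,
    "person": {"personAddresses": [
        {"address": {"addressPurpose": ["shipping", "mailing"], "addressLocality": "Clinton"}}]},
}


def address_rows(d: dict[str, Any]) -> Iterator[tuple[int, str, str]]:
    """Yield one (personId, addressPurpose, addressLocality) row per purpose per address."""
    for entry in d["person"].get("personAddresses", []):
        address = entry["address"]
        for purpose in address["addressPurpose"]:
            yield d["personId"], purpose, address["addressLocality"]


rows = list(address_rows(doc))
print(rows)  # [(11, 'shipping', 'Clinton'), (11, 'mailing', 'Clinton')]

# With rows in hand, ordinary relational-style filters apply: who ships to Clinton?
print({pid for pid, purpose, locality in rows if purpose == "shipping" and locality == "Clinton"})  # {11}
```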
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
"personAddresses": [
  { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]
Relational tables:
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
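A Python sketch contrasting the two shapes: the same address purposes read directly from one document versus joined from the three relational tables listed above (columns trimmed; the address ID 1 is hypothetical).

```python
import sqlite3

# The single JSON document: the many-to-many purpose list lives inside the address.
person_doc = {"personId": 11, "person": {"personAddresses": [
    {"address": {"addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                 "addressLocality": "Clinton", "addressRegion": "UT"}}]}}

from_doc = sorted(purpose
                  for entry in person_doc["person"]["personAddresses"]
                  for purpose in entry["address"]["addressPurpose"])

# The relational equivalent needs three tables and two joins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY);
    CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, street TEXT, locality TEXT, region TEXT);
    CREATE TABLE persons_addresses (person_id INTEGER, address_id INTEGER, address_purpose TEXT);
    INSERT INTO persons VALUES (11);
    INSERT INTO addresses VALUES (1, '99 Smith Dr', 'Clinton', 'UT');
    INSERT INTO persons_addresses VALUES (11, 1, 'shipping'), (11, 1, 'mailing');
""")
from_sql = sorted(row[0] for row in conn.execute("""
    SELECT pa.address_purpose
    FROM persons p
    JOIN persons_addresses pa ON pa.person_id = p.person_id
    JOIN addresses a ON a.address_id = pa.address_id
    WHERE p.person_id = 11"""))

print(from_doc == from_sql, from_doc)  # True ['mailing', 'shipping']
```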
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
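A Python sketch of consuming a multi-typed array: each record is handled according to the fields it carries. The describe helper and its output wording are hypothetical; the field names come from the example above.

```python
from typing import Any

payment_methods: list[dict[str, Any]] = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234", "nameOnCreditCard": "MIKE BOWERS"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers"}},
]


def describe(method: dict[str, Any]) -> str:
    """Handle each record type in the same array by inspecting which fields it carries."""
    m = method["paymentMethod"]
    if "creditCardNumber" in m:
        return f"{m['paymentMethodType']} card ending {m['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in m:
        return f"{m['paymentMethodType']} account ending {m['bankAccountNumber'][-4:]}"
    return f"unknown payment method type {m['paymentMethodType']}"


for pm in payment_methods:
    print(describe(pm))
# Visa card ending 1234
# Checking account ending 1111
```

A relational design would typically need a separate table (or many nullable columns) per payment type to hold the same array.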
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New",
          …
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    "productWebsites": [ …
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales" …
      { "phone": { "phonePurpose": "support" …
      { "phone": { "phonePurpose": "shipping" … ],
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping" …
          "addressLocality": "East Hanove…
          "addressPostalCode": "07936" … ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales" … ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home" … ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…
          "paymentMethodAccountNumber": 555,
          "paymentMethodVerified": true } } ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
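Keys and nesting are ordinary data, so structure can be queried like any value. The sketch below does this in plain Python over an in-memory copy of the example document. It is not MarkLogic's query API. `find_key_paths` and its JSONPath-like output format are hypothetical.

```python
from typing import Any, Iterator


def find_key_paths(node: Any, key: str, path: str = "$") -> Iterator[str]:
    """Yield a JSONPath-like string for every place `key` occurs in `node`."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}.{k}"
            if k == key:
                yield child
            yield from find_key_paths(v, key, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key_paths(item, key, f"{path}[{i}]")


person_doc = {
    "personId": 11,
    "person": {
        "personName": "Some very long name with many UTF-8 international characters…",
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

# Presence of a key: which phones carry the optional hidePhoneNumber flag?
hidden_paths = list(find_key_paths(person_doc, "hidePhoneNumber"))
print(hidden_paths)

# Nesting plus values: phone numbers whose purpose includes "home".
home_numbers = [
    p["phone"]["phoneNumber"]
    for p in person_doc["person"]["personPhones"]
    if "home" in p["phone"]["phonePurpose"]
]
print(home_numbers)
```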
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
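When JSON is parsed with Python's standard `json` module, these data types map directly onto built-in types, as this sketch shows. The sample keys and values are hypothetical. One caveat applies to "integers or decimals": by default, `json.loads` turns decimal numbers into binary `float` values. The sketch therefore also uses `parse_float=Decimal` for values such as prices that must keep their exact decimal form.

```python
import json
from decimal import Decimal

# Hypothetical sample text exercising each data type listed above.
text = """
{
  "personName": "Zoë Ångström",
  "key with spaces & symbols ✓": "names may contain any characters",
  "personHeightInFeet": 5.75,
  "customerReviewerRank": 11111,
  "hidePhoneNumber": true,
  "mixedArray": [1, "two", 3.0, false, null, {"nested": "object"}],
  "sparseObject": {}
}
"""

doc = json.loads(text)

# Map each top-level key to the Python type json.loads produced for its value.
type_names = {key: type(value).__name__ for key, value in doc.items()}
print(type_names)

# Element types inside the mixed array: int, str, float, bool, NoneType, dict.
print([type(v).__name__ for v in doc["mixedArray"]])

# Keep exact decimal values (e.g. prices) by parsing decimals as Decimal.
prices = json.loads('{"productListPrice": 4.50}', parse_float=Decimal)
print(prices)
```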
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
"Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency. …"
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of …
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence. …
1.2.1 Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence
… Can application programs … remain invariant as indices come and go? …
1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.
[Diagram: a graph of Topic (T), Person (P), Location (L), Publication (P), Reference (R), Organization (O), and Event (E) nodes connected by meaningful edges such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of", with Topic nodes linked to one another as "problem", "solution", and "purpose"]
[Diagram: Customer Order. Customers, Orders, Products, and Vendors JSON documents, an XML surgery record, and Documentation, with labels such as "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", and "Vendor received RMA on product"]
XML surgery record from the diagram:
Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
Inside and Out
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense
- How does NoSQL support variety and variability
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling Normalize
- DocGraph Modeling Normalize
- DocGraph Modeling Denormalize
- Relational Modeling Orthogonalize
- Relational Modeling Generalize
- DocGraph Modeling Generalize
- Relational Modeling Tune
- DocGraph Modeling Tune
- Document PROs Simple Easy Data Types
Simple Customer Order Relational Model
What would a real Customer Data Model look like?
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Customer Model
The Person ERD has 13 tables because, in a relational model, each multivalued property requires a separate table.
All of this should be represented as one JSON document because it is one business entity.
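The following Python sketch makes the cost of shredding concrete. It flattens a trimmed copy of the person document into relational-style rows. Scalar properties go into one `persons` row, and each array property becomes its own child table keyed by `person_id`. The function `shred`, the `person_id` column name, and the comma-joined purpose column are simplifying assumptions, not a rendering of the exact ERD.

```python
from typing import Any

# Trimmed copy of the person document (two array properties kept).
person_doc: dict[str, Any] = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personBirthDate": "1981-01-01",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        ],
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"}},
        ],
    },
}


def shred(doc: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Flatten one person document into relational-style rows.

    Scalar properties become a single row in "persons"; every array property
    becomes its own child table whose rows carry the parent's person_id.
    """
    person_id = doc["personId"]
    parent_row: dict[str, Any] = {"person_id": person_id}
    tables: dict[str, list[dict[str, Any]]] = {"persons": [parent_row]}
    for key, value in doc["person"].items():
        if not isinstance(value, list):
            parent_row[key] = value
            continue
        rows = tables.setdefault(key, [])
        for item in value:
            # Each array item is a one-key wrapper such as {"phone": {...}}.
            (inner,) = item.values()
            row: dict[str, Any] = {"person_id": person_id}
            for column, column_value in inner.items():
                # List-valued fields (e.g. phonePurpose) are stored as comma-separated
                # text for brevity; a strictly normalized design needs yet another table.
                row[column] = ",".join(column_value) if isinstance(column_value, list) else column_value
            rows.append(row)
    return tables


tables = shred(person_doc)
print(sorted(tables))  # one parent table plus one child table per array
print(tables["personPhones"][1])
```

Even this two-array excerpt already needs three tables and a join key. The full document's phones, emails, addresses, websites, payment methods, and company relationships push the count much higher.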
Person ERD
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": {
        "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri",
        "performanceNotes": "He is always late" } } ]
}
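The person document above embeds its relationships as `triple` objects with `subject`, `predicate`, and `object` keys. In MarkLogic, embedded triples like these can be indexed in the triple index and queried with SPARQL or the Optic API. The plain-Python pattern matcher below only illustrates how a subject/predicate/object pattern is matched against such data, with `None` acting as a wildcard. `match_triples` is a hypothetical helper.

```python
from typing import Any, Optional

# The embedded triples from the person document, in the same wrapper shape.
triples: list[dict[str, Any]] = [
    {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
    {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
    {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
    {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
    {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}},
]


def match_triples(subject: Optional[Any] = None,
                  predicate: Optional[str] = None,
                  obj: Optional[Any] = None) -> list[tuple[Any, str, Any]]:
    """Return (s, p, o) tuples matching the pattern; None matches anything."""
    results = []
    for wrapper in triples:
        t = wrapper["triple"]
        if subject is not None and t["subject"] != subject:
            continue
        if predicate is not None and t["predicate"] != predicate:
            continue
        if obj is not None and t["object"] != obj:
            continue
        results.append((t["subject"], t["predicate"], t["object"]))
    return results


children = [o for _, _, o in match_triples(subject=11, predicate="hasChild")]
fans_of_1111 = [s for s, _, _ in match_triples(predicate="likesProduct", obj=1111)]
print(children, fans_of_1111)
```

Each relationship carries an explicit, named meaning (`hasChild`, `likesProduct`). A relational foreign key only implies that meaning.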
[Diagram: dozens of interlinked relational tables, repeating the Person ERD tables several times alongside Order, Order Items, Fact, and Lookup Lists tables]
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
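This small sketch illustrates "meaningful graph relationships connect business entities". It reads the reference fields already present in trimmed copies of the order and product documents (`customerId`, `productIdFk`, `vendorId`) and turns them into subject/predicate/object edges. It then follows two hops, from an order to the companies that sold its products. The id strings such as `order/1` and the predicate names `placedBy`, `includesProduct`, and `soldBy` are hypothetical. They stand in for the diagram's "Product that was ordered" and "Vendor who sold the product" links.

```python
from typing import Any

# Trimmed copies of the order and product documents (fields as in the model above).
order_doc: dict[str, Any] = {
    "orderId": 1,
    "order": {
        "orderNumber": "111-11-111",
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productName": "Oreo Thins Sandwich Cookies"}},
            {"product": {"productIdFk": 2222, "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz"}},
        ],
    },
}
product_docs: dict[int, dict[str, Any]] = {
    1111: {"productId": 1111, "product": {"productVendor": {"vendorId": 555, "companyName": "Nabisco"}}},
}


def edges_from_order(doc: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Turn the order's reference fields into (subject, predicate, object) edges."""
    order_iri = f"order/{doc['orderId']}"
    edges = [(order_iri, "placedBy", f"customer/{doc['order']['customer']['customerId']}")]
    for item in doc["order"]["orderedProducts"]:
        edges.append((order_iri, "includesProduct", f"product/{item['product']['productIdFk']}"))
    return edges


def edges_from_product(doc: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Link a product to the company recorded as its vendor."""
    vendor_id = doc["product"]["productVendor"]["vendorId"]
    return [(f"product/{doc['productId']}", "soldBy", f"company/{vendor_id}")]


graph = edges_from_order(order_doc)
for product in product_docs.values():
    graph.extend(edges_from_product(product))

# Two-hop traversal: which companies sold products that appear in order 1?
products_in_order = {o for s, p, o in graph if s == "order/1" and p == "includesProduct"}
vendors = {o for s, p, o in graph if s in products_in_order and p == "soldBy"}
print(vendors)
```

Product 2222 contributes no vendor here only because no document for it is included in the sketch.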
[Diagram: Customer Order. Customers, Orders, Products, and Vendors JSON documents, linked by "Product that was ordered" and "Vendor who sold the product"]
Relational Modeling: Normalize
1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Diagram: one-to-one, one-to-many, and many-to-many relationships, plus reference tables]
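The normalization rules above take only a few lines of Python to demonstrate, using the standard-library `sqlite3` module. The multivalued phones attribute becomes a one-to-many `person_phones` table, and reading a person's phones back requires a join. The snake_case table and column names are assumptions modeled on the ERD's Persons and Person Phones tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per person: every column holds a single value.
    CREATE TABLE persons (
        person_id   INTEGER PRIMARY KEY,
        person_name TEXT NOT NULL
    );
    -- The multivalued "phones" attribute becomes a one-to-many child table.
    CREATE TABLE person_phones (
        phone_id      INTEGER PRIMARY KEY,
        person_id     INTEGER NOT NULL REFERENCES persons(person_id),
        phone_purpose TEXT NOT NULL,
        phone_number  TEXT NOT NULL
    );
    """
)
conn.execute("INSERT INTO persons VALUES (?, ?)", (11, "Mike Bowers"))
conn.executemany(
    "INSERT INTO person_phones (person_id, phone_purpose, phone_number) VALUES (?, ?, ?)",
    [
        (11, "mobile", "+1 (111) 111-1111"),
        (11, "work", "+1 (111) 111-1112"),
        (11, "home", "+1 (111) 111-1113"),
    ],
)

# Reassembling the person's phones now requires a join.
rows = conn.execute(
    """
    SELECT p.person_name, ph.phone_purpose, ph.phone_number
    FROM persons AS p
    JOIN person_phones AS ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
    """
).fetchall()
print(rows)
conn.close()
```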
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
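Below is a rough Python emulation of the DocGraph approach. Each person stays one document with its phones embedded, and a row view over those embedded structures is derived on demand. In MarkLogic, this projection would be declared in a TDE template and maintained by the server for Optic queries. The function `project_phone_rows`, its column list, and the second person document are hypothetical. That second document (`personId` 22) is suggested by the `hasSpouse` triple.

```python
from typing import Any

# Two person documents; each stays whole, one business entity per document.
documents: list[dict[str, Any]] = [
    {"personId": 11, "person": {"personName": "Mike Bowers", "personPhones": [
        {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
        {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
    ]}},
    {"personId": 22, "person": {"personName": "Example Person 22", "personPhones": [
        {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
    ]}},
]


def project_phone_rows(docs: list[dict[str, Any]]) -> list[tuple[int, str, str, bool]]:
    """Emulate a row view over the embedded phone structures.

    Columns: (personId, personName, phoneNumber, isPrimary). The parent's name is
    copied onto every row, so the view is denormalized while the documents are not.
    """
    rows = []
    for doc in docs:
        name = doc["person"]["personName"]
        for item in doc["person"]["personPhones"]:
            phone = item["phone"]
            rows.append((doc["personId"], name, phone["phoneNumber"],
                         "primary" in phone["phonePurpose"]))
    return rows


phone_view = project_phone_rows(documents)
primary_numbers = [(name, number) for _, name, number, is_primary in phone_view if is_primary]
print(len(phone_view), primary_numbers)
```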
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
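"Denormalizing without denormalizing" can be sketched in three steps:
1. Orders store only the customer's id.
2. Person data is extracted into (subject, predicate, object) facts.
3. The join happens at query time.

As a result, a change to the person document shows up in the next order query without rewriting any orders. `extract_triples` stands in for what a TDE triples template would produce, and `orders_with_contact` stands in for an Optic join. Both functions and the predicate names are hypothetical.

```python
from typing import Any

person_docs: dict[int, dict[str, Any]] = {
    11: {"personId": 11, "person": {"personName": "Mike Bowers", "personEmails": [
        {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"}},
    ]}},
}
order_docs: list[dict[str, Any]] = [
    {"orderId": 1, "order": {"orderNumber": "111-11-111", "customer": {"customerId": 11}}},
]


def extract_triples(doc: dict[str, Any]) -> list[tuple[int, str, str]]:
    """Hypothetical stand-in for a TDE triples template: name and primary email."""
    pid, person = doc["personId"], doc["person"]
    facts = [(pid, "personName", person["personName"])]
    for item in person.get("personEmails", []):
        email = item["email"]
        if "primary" in email["emailPurpose"]:
            facts.append((pid, "primaryEmail", email["emailAddress"]))
    return facts


def orders_with_contact() -> list[dict[str, Any]]:
    """Join each order to its customer's facts at query time; orders store only the id."""
    index = [t for doc in person_docs.values() for t in extract_triples(doc)]
    results = []
    for order in order_docs:
        cid = order["order"]["customer"]["customerId"]
        facts = {p: o for s, p, o in index if s == cid}
        results.append({"orderNumber": order["order"]["orderNumber"], **facts})
    return results


before = orders_with_contact()
# Update the person document only; the next query sees the new value.
person_docs[11]["person"]["personName"] = "Michael Bowers"
after = orders_with_contact()
print(before)
print(after)
```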
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Diagram: business tables, transaction tables, and reference tables]
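The three kinds of tables can be sketched with `sqlite3`:
- `persons` and `products` are business tables.
- `orders` and `order_items` are transaction tables that join them.
- `order_statuses` is a reference table whose values (New, Invoiced, Shipped, Closed) come from the `orderStatus` lookup list in the model above.

Table and column names are assumptions. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    -- Reference table: standardizes the allowed order states.
    CREATE TABLE order_statuses (status TEXT PRIMARY KEY);
    INSERT INTO order_statuses VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');

    -- Business tables: stand on their own, independent of any one context.
    CREATE TABLE persons  (person_id  INTEGER PRIMARY KEY, person_name  TEXT NOT NULL);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT NOT NULL);

    -- Transaction tables: join business tables together in one context (an order).
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES persons(person_id),
        status    TEXT    NOT NULL REFERENCES order_statuses(status)
    );
    CREATE TABLE order_items (
        order_id   INTEGER NOT NULL REFERENCES orders(order_id),
        product_id INTEGER NOT NULL REFERENCES products(product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );

    INSERT INTO persons     VALUES (11, 'Mike Bowers');
    INSERT INTO products    VALUES (1111, 'Oreo Thins Sandwich Cookies');
    INSERT INTO orders      VALUES (1, 11, 'Shipped');
    INSERT INTO order_items VALUES (1, 1111, 3);
    """
)

try:
    conn.execute("INSERT INTO orders VALUES (2, 11, 'Lost')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # 'Lost' is not a value in the reference table

rows = conn.execute(
    """
    SELECT p.person_name, pr.product_name, oi.quantity, o.status
    FROM orders AS o
    JOIN persons AS p      ON p.person_id   = o.person_id
    JOIN order_items AS oi ON oi.order_id   = o.order_id
    JOIN products AS pr    ON pr.product_id = oi.product_id
    """
).fetchall()
print(rejected, rows)
conn.close()
```

The reference table rejects the nonstandard status 'Lost', which is the "standardize entity states" role described above. Rebuilding even one order line requires joining four tables.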
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mike@somework.com ]
personWebsites [
website websitePurpose [ work ] websiteUrl http://www.mike.com ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T09:00:00Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New Invoiced Shipped Closed ]
productOrderStatus [ None Allocated Invoiced Shipped On Order No Stock ]
DocGraph Modeling: Orthogonalize
- Everything about each entity should be in the entity
- A business entity should exactly match how users think of it
- A transaction should contain everything about the transaction
- Multiple lookup tables may be combined in one doc
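A minimal sketch of the same ideas, with Python dictionaries standing in for JSON documents: the order document carries its own copy of the customer name and shipping address, and a single lookup document holds several lookup lists. Field names follow the slide's examples; the `validate_order` helper is hypothetical.

```python
import json

# One lookup document can hold several lookup lists.
order_lookups = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}

# A transaction document contains everything about the transaction,
# including copies of customer and address data as they were at order time.
order = {
    "orderId": 1,
    "order": {
        "orderNumber": "111-11-111",
        "orderStatus": "Shipped",
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderShippingAddress": {"addressLocality": "Clinton", "addressRegion": "UT"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderQty": 3,
                         "productOrderStatus": "Shipped"}},
        ],
    },
}

def validate_order(doc: dict, lookups: dict) -> list:
    """Hypothetical helper: return status values missing from the lookup lists."""
    errors = []
    body = doc["order"]
    if body["orderStatus"] not in lookups["orderStatus"]:
        errors.append(body["orderStatus"])
    for item in body["orderedProducts"]:
        status = item["product"]["productOrderStatus"]
        if status not in lookups["productOrderStatus"]:
            errors.append(status)
    return errors

# The whole order round-trips as a single JSON document.
print(validate_order(json.loads(json.dumps(order)), order_lookups))
```

Copying the customer name and address into the order preserves them as they were when the order was placed, which is usually what a transaction record should show.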
Relational Modeling: Generalize
3. Generalize
- Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
- For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
- Do not overgeneralize in relational, because it hides the purpose of the model
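One common way to express this subclassing relationally is a general supertype table plus one subtype table per role, all sharing the same key. The sketch below uses `sqlite3` with hypothetical table names; it is one possible design, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- General supertype: facts true of every person.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Subtypes share the person's key and add role-specific columns.
    CREATE TABLE customer (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        status    TEXT,
        join_date TEXT
    );
    CREATE TABLE account_rep (
        person_id  INTEGER PRIMARY KEY REFERENCES person(person_id),
        company_id INTEGER
    );

    INSERT INTO person      VALUES (11, 'Mike Bowers');
    INSERT INTO customer    VALUES (11, 'active', '2001-01-01');
    INSERT INTO account_rep VALUES (11, 555);
""")

# Finding every role a person plays means probing each subtype table.
# Table names come from this fixed tuple, never from user input.
SUBTYPES = (("customer", "customer"), ("account rep", "account_rep"))
roles = [
    role
    for role, table in SUBTYPES
    if conn.execute(f"SELECT 1 FROM {table} WHERE person_id = ?", (11,)).fetchone()
]
print(roles)
```

The purpose of each subtype is only visible by knowing which tables exist, which is one reason overgeneralizing tends to hide the model's intent.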
DocGraph Modeling: Generalize
- Generalizing is easy in JSON because JSON objects are designed for subclassing
- This example shows a person being subclassed as an employee, a customer, and a customer rep
- Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
- See subclassing below
personId 11
schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]
triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111
triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free
personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]
personEmails [ email emailPurpose [ work primary ] emailAddress mike@somework.com email emailPurpose [ personal ] emailAddress mike@somewhere.com ]
personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]
personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]
customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111
companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
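In JSON, the subclassing can be read straight off the document: each role is an optional sub-object next to `person`. This minimal sketch uses a trimmed copy of the example person document and infers the person's roles from which keys are present. The `roles_of` function is hypothetical.

```python
# Trimmed copy of the example person document; role sub-objects are optional.
person_doc = {
    "personId": 11,
    "person": {"personStatus": "active", "personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}

def roles_of(doc: dict) -> list:
    """Hypothetical helper: list the roles a person document carries."""
    roles = []
    if "customer" in doc:
        roles.append("customer")
    for rel in doc.get("companyRelationships", []):
        roles.extend(rel["company"]["personCompanyRelationship"])
    return roles

print(roles_of(person_doc))
```

Unlike a relational subtype table, adding a role here needs no schema change: a person who is not a customer simply has no `customer` key, and the role names stay visible in the document itself.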
Relational Modeling: Tune
4. Tune
- Tune the model to meet the performance requirements of the application
- Optimize sparse data out of a table and put it into one-to-one related tables
- Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
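The denormalization trade-off can be sketched with `sqlite3`: copying the customer's name into the orders table lets a common query skip a join, at the cost of a second copy that must be kept in sync when that is wanted. Table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- customer_name is a denormalized copy of person.name.
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        person_id     INTEGER REFERENCES person(person_id),
        customer_name TEXT
    );

    INSERT INTO person VALUES (11, 'Mike Bowers');
    INSERT INTO orders VALUES (1, 11, 'Mike Bowers');
""")

# Normalized read: needs a join.
joined = conn.execute(
    "SELECT p.name FROM orders o JOIN person p ON p.person_id = o.person_id "
    "WHERE o.order_id = 1").fetchone()

# Denormalized read: touches only the orders table.
direct = conn.execute(
    "SELECT customer_name FROM orders WHERE order_id = 1").fetchone()

# The cost: renaming the person leaves the copy unchanged until it is updated too.
conn.execute("UPDATE person SET name = 'Michael Bowers' WHERE person_id = 11")
stale = conn.execute(
    "SELECT customer_name FROM orders WHERE order_id = 1").fetchone()

print(joined, direct, stale)
```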
DocGraph Modeling: Tune
- These examples are tuned for MarkLogic
- In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
- You manually create inequality (range) indexes based on property name or hierarchical path
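The reason for unique property names can be simulated in a few lines. This is not MarkLogic code: it is a hypothetical model of an index keyed only by property name and value, as the slide describes MarkLogic's automatic equality indexing. With a generic name such as `status`, a person's status and a customer's status land in the same index entries; with `personStatus` and `customerStatus` they stay distinct. Real MarkLogic queries can also be scoped to a containing property, so the sketch only shows why unique names make plain index lookups unambiguous. The document URIs are made up.

```python
from collections import defaultdict

def build_name_index(docs: dict) -> dict:
    """Map (property name, scalar value) -> set of document URIs, ignoring nesting."""
    index = defaultdict(set)

    def add(key, value, uri):
        if isinstance(value, dict):
            walk(value, uri)
        elif isinstance(value, list):
            for item in value:
                add(key, item, uri)  # array items are indexed under the parent key
        else:
            index[(key, value)].add(uri)

    def walk(obj, uri):
        for key, value in obj.items():
            add(key, value, uri)

    for uri, doc in docs.items():
        walk(doc, uri)
    return index

# Generic property names: person status and customer status share one name.
generic = {
    "/person/11.json": {"person": {"status": "active"}, "customer": {"status": "inactive"}},
    "/person/22.json": {"person": {"status": "inactive"}, "customer": {"status": "active"}},
}
# Unique property names, as the slide recommends.
unique = {
    "/person/11.json": {"person": {"personStatus": "active"},
                        "customer": {"customerStatus": "inactive"}},
    "/person/22.json": {"person": {"personStatus": "inactive"},
                        "customer": {"customerStatus": "active"}},
}

# Looking for active customers:
ambiguous = sorted(build_name_index(generic)[("status", "active")])  # an active person also matches
precise = sorted(build_name_index(unique)[("customerStatus", "active")])
print(ambiguous)
print(precise)
```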
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
- These five JSON documents equal more than 30 tables in a relational model
- JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
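To get a feel for why a handful of documents can stand in for dozens of tables, here is a rough, hypothetical heuristic: count the document root as one table and every distinct array path as one more child table (a one-to-many or multi-valued property that relational normalization would usually split out). It ignores lookup and subtype tables, so it is only an illustration, not the method behind the slide's counts.

```python
def estimate_tables(doc: dict) -> int:
    """Rough heuristic: 1 table for the root + 1 per distinct array path."""
    array_paths = set()

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, path + (key,))
        elif isinstance(node, list):
            array_paths.add(path)
            for item in node:
                walk(item, path)  # array items share the array's path

    walk(doc, ())
    return 1 + len(array_paths)

# Trimmed copy of the example person document.
person = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [{"phone": {"phonePurpose": ["mobile", "primary"],
                                    "phoneNumber": "+1 (111) 111-1111"}}],
        "personAddresses": [{"address": {"addressPurpose": ["shipping", "mailing"],
                                         "addressStreet": ["99 Smith Dr"]}}],
        "personEmails": [{"email": {"emailPurpose": ["work", "primary"]}}],
    },
}

print(estimate_tables(person))
```

Even this trimmed fragment implies several tables once each array is normalized out.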
Document PROs: Performance
One JSON document requires one I/O
- Relational requires dozens of I/Os for the same data
- For example, the person document is 45K and requires 13 tables to represent it in a relational database
- You have to join 13 tables to retrieve all of a person's data
- Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row
- 4-to-6 I/Os x 13 joins = 52-to-78 I/Os
- MarkLogic indexes are in RAM, so getting a doc is 1 I/O
- MongoDB indexes are B-tree indexes and may require 4 I/Os
- To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
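The I/O estimate above is simple arithmetic. This sketch just reproduces it from the slide's stated per-join costs; the figures are the slide's rough estimates, not measurements.

```python
# Per-join cost from the slide, as (min, max): index I/Os plus row I/Os.
INDEX_IOS = (3, 4)
ROW_IOS = (1, 2)
TABLES_JOINED = 13

per_join = (INDEX_IOS[0] + ROW_IOS[0], INDEX_IOS[1] + ROW_IOS[1])
relational_ios = (per_join[0] * TABLES_JOINED, per_join[1] * TABLES_JOINED)
document_ios = 1  # one document read, per the slide's MarkLogic claim

print(f"per join: {per_join[0]}-{per_join[1]} I/Os")
print(f"relational: {relational_ios[0]}-{relational_ios[1]} I/Os, document: {document_ios} I/O")
```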
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
- This allows many-to-one and many-to-many relationships to be captured in one JSON document
personAddresses [
address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States
]
Relational equivalent:
- Addresses: Address ID, Street, Locality, Region, Postal Code, Country
- Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
- Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
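The mapping from the embedded address array to the three tables can be sketched as a shredding function. This is a hypothetical illustration: the generated address IDs, emitting one link row per address purpose, and omitting the Is Primary column are all simplifying assumptions, not part of the original model.

```python
from itertools import count

# Trimmed copy of the example person document.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {"addressPurpose": ["shipping", "mailing"],
                         "addressStreet": ["99 Smith Dr"],
                         "addressLocality": "Clinton", "addressRegion": "UT",
                         "addressPostalCode": "84015",
                         "addressCountry": "United States"}},
        ],
    },
}

def shred_addresses(doc: dict):
    """Split the embedded array into Persons, Addresses, and Persons Addresses rows."""
    ids = count(1)  # hypothetical surrogate keys for addresses
    person_id = doc["personId"]
    persons = [(person_id, doc["person"]["personName"])]
    addresses, persons_addresses = [], []
    for entry in doc["person"].get("personAddresses", []):
        addr = entry["address"]
        address_id = next(ids)
        addresses.append((address_id, " ".join(addr["addressStreet"]),
                          addr["addressLocality"], addr["addressRegion"],
                          addr["addressPostalCode"], addr["addressCountry"]))
        # Many-to-many link table: one row per purpose of this address.
        for purpose in addr["addressPurpose"]:
            persons_addresses.append((person_id, address_id, purpose))
    return persons, addresses, persons_addresses

persons, addresses, links = shred_addresses(person_doc)
print(links)
```

One array in the document becomes rows in three tables, and reading the person back requires joining them again.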
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
- Putting different types of records in the same array is natural in programming but very difficult in relational databases
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT
]
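Application code usually handles such a mixed array by dispatching on a discriminator field. In this hypothetical sketch, `paymentMethodType` selects which type-specific fields are required; the required-field lists are assumptions drawn from the example, not a published schema.

```python
# Type-specific required fields, keyed by the discriminator value (assumed).
REQUIRED_FIELDS = {
    "Visa": {"creditCardNumber", "creditCardExpirationDate", "nameOnCreditCard"},
    "Checking": {"bankRoutingNumber", "bankAccountNumber", "nameOnBankAccount"},
}

payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS",
                       "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers",
                       "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]

def missing_fields(entry: dict) -> set:
    """Return the type-specific fields this payment method lacks."""
    method = entry["paymentMethod"]
    required = REQUIRED_FIELDS.get(method["paymentMethodType"], set())
    return required - set(method)

print([missing_fields(pm) for pm in payment_methods])
```

Both record shapes live in one array; a relational design would typically need either a sparse table with many nullable columns or one table per payment type.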
Document PROs: Everything Is Data
Everything is data in JSON:
- Structure
- Keys
- Values
- Arrays
Because structure is data:
- Structure can be changed by updating data: just write a query
- Structure can be queried
  - For presence of keys
  - For nested relationships
  - For any combination of nesting, keys, and values
personId 11
person personName Some very long name with many UTF-8 international characters... personBirthDate 1982-02-02 personHeightInFeet 5.75 personPhones [ phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
]
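Because keys and nesting are data, they can be queried and rewritten like values. The sketch below is plain Python over dictionaries, not a MarkLogic query: `has_key` tests for the presence of a key at any depth, `hidden_home_phones` matches a nested combination of keys and values, and `rename_key` changes structure by rewriting data. All three helpers are hypothetical.

```python
def has_key(node, key: str) -> bool:
    """True if `key` appears at any depth."""
    if isinstance(node, dict):
        return key in node or any(has_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(has_key(item, key) for item in node)
    return False

def hidden_home_phones(doc: dict) -> list:
    """Nested match: phones whose purpose includes 'home' and that are hidden."""
    return [p["phone"]["phoneNumber"]
            for p in doc["person"].get("personPhones", [])
            if "home" in p["phone"]["phonePurpose"]
            and p["phone"].get("hidePhoneNumber") is True]

def rename_key(node, old: str, new: str):
    """Change structure by rewriting data: rename `old` to `new` at any depth."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node

# Trimmed copy of the example document above.
doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

renamed = rename_key(doc, "hidePhoneNumber", "phoneNumberHidden")
print(has_key(doc, "hidePhoneNumber"), hidden_home_phones(doc), has_key(renamed, "hidePhoneNumber"))
```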
Document PROs: Simple, Easy Data Types
- Attribute names have unlimited length and can contain any characters
- Strings have unlimited size and can contain any internationalized UTF-8 characters
- Numbers contain integers or decimals
- Booleans are true or false
- Arrays can have values of any type and can mix types
- Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
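A quick way to see these types in practice is to round-trip a small document through Python's standard `json` module. The document below is made up for illustration. Note that the JSON specification itself leaves size limits to implementations, so individual parsers and databases may impose their own.

```python
import json

# A made-up document exercising each JSON type.
text = """
{
  "personName": "Zoë Ñúñez 山田",
  "an attribute name with spaces, punctuation & ünïcode": "any UTF-8 text",
  "personId": 11,
  "personHeightInFeet": 5.75,
  "hidePhoneNumber": true,
  "mixed": ["text", 42, 3.5, false, null, {"nested": {}}],
  "sparse": {"personNickname": "Mikey"}
}
"""
doc = json.loads(text)

# Python type produced by the json module for each top-level value.
top_level_types = {key: type(value).__name__ for key, value in doc.items()}
# One array holding six different JSON types.
mixed_types = [type(v).__name__ for v in doc["mixed"]]

# Round-trip without escaping non-ASCII characters.
assert json.loads(json.dumps(doc, ensure_ascii=False)) == doc
print(top_level_types)
print(mixed_types)
```

Python's `json` module maps a JSON number to `int` or `float` depending on whether it has a fraction or exponent; other parsers, such as JavaScript's `JSON.parse`, read every number as a double.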
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Communications of the ACM, Volume 13, Number 6 (Information Retrieval section), June 1970
Programs should remain unaffected when the internal representation of data is changed... Tree-structured... inadequacies... are discussed... Relations... are discussed and applied to the problems of redundancy and consistency...
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory... to... formatted data... The problems... are those of data independence... and... data inconsistency... The relational view... appears to be superior in several respects to the graph or network model... Relational view... forms a sound basis for treating derivability, redundancy, and consistency... [and] a clearer evaluation... of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
...Tables... represent a major advance toward the goal of data independence...
1.2.1 Ordering Dependence
...Programs which take advantage of the stored ordering of a file are likely to fail... if... it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence
...Can application programs... remain invariant as indices come and go?...
1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data... These programs fail when a change in structure becomes necessary. The... program... is required to exploit... paths to the data... Programs become dependent on the continued existence of the... paths.
[Figure: a knowledge graph of Topics (T), Persons (P), Publications (P), Locations (L), References (R), Organizations (O), and Events (E), linked by relationships such as geo-located in, printed on, author of, published article in journal, and publisher of; topics are linked to one another as problem, solution, and purpose.]
[Figure: document graph around a customer order. JSON Customers, Orders, Products, and Vendors documents, an XML hospital operation record (hospital, operation, surgeon, and drug doses), and Documentation are connected by relationships such as "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", and "Vendor received RMA on product".]
Inside and Out
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling Normalize
- DocGraph Modeling Normalize
- DocGraph Modeling Denormalize
- Relational Modeling Orthogonalize
- Relational Modeling Generalize
- DocGraph Modeling Generalize
- Relational Modeling Tune
- DocGraph Modeling Tune
- Document PROs Simple Easy Data Types
| Drug Name | Drug Manufacturer | Dose Size | Dose UOM |
|---|---|---|---|
| Minicillan | Drugs R Us | 200 | mg |
| Maxicillan | Canada4Less Drugs | 400 | mg |
| Minicillan | Drug USA | 150 | mg |

| Field | Value |
|---|---|
| Hospital Name | John Hopkins |
| Operation Number | 13 |
| Operation Type | Heart Transplant |
| Surgeon Name | Dorothy Oz |
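In a relational design, the drug rows would live in a separate table keyed back to the operation. As a hierarchical document, they can simply nest inside it. The minimal Python sketch below builds that document from the values in the two tables above. The camelCase property names (`operation`, `drugs`, `doseUom`, and so on) are hypothetical.

```python
import json

# Flat rows as they appear in the two tables above.
operation_row = {
    "Hospital Name": "John Hopkins",
    "Operation Number": 13,
    "Operation Type": "Heart Transplant",
    "Surgeon Name": "Dorothy Oz",
}
drug_rows = [
    ("Minicillan", "Drugs R Us", 200, "mg"),
    ("Maxicillan", "Canada4Less Drugs", 400, "mg"),
    ("Minicillan", "Drug USA", 150, "mg"),
]

def build_operation_document(op, drugs):
    """Nest the repeating drug rows inside the single operation they belong to."""
    return {
        "operation": {
            "hospitalName": op["Hospital Name"],
            "operationNumber": op["Operation Number"],
            "operationType": op["Operation Type"],
            "surgeonName": op["Surgeon Name"],
            # One embedded array replaces a child table plus a foreign key.
            "drugs": [
                {"drugName": name, "drugManufacturer": maker,
                 "doseSize": size, "doseUom": uom}
                for name, maker, size, uom in drugs
            ],
        }
    }

doc = build_operation_document(operation_row, drug_rows)
print(json.dumps(doc, indent=2))
```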
What would a real Customer Data Model look like?
- Addresses: Address ID, Street, Locality, Region, Postal Code, Country
- Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
- Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
- Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
- Websites: Website ID, URL
- Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
- Companies Websites: Company ID, Website ID
- Companies Addresses: Company ID, Address ID
- Companies: Company ID
- Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
- Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
- Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
- Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
- Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
- Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
- Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Customer Model
The person ERD needs 13 tables because, in a relational model, each multivalued property requires a separate table. All of this should be represented as one JSON document, because it is one business entity.
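The table count follows mechanically from the document's shape. The minimal Python sketch below uses a trimmed, hypothetical version of the person document. It shreds every array-valued property into its own child table keyed by `personId`, and it flattens all scalars into a single parent row. The `shred` helper is illustrative only.

```python
def shred(doc, id_field):
    """Split a document the way a relational schema must.

    Scalars (including those in nested objects) go into one parent row with
    dotted column names; every array becomes its own child table whose rows
    carry the parent key as a foreign key. Arrays nested inside array items
    are left in place here; a full shred would recurse into yet more tables.
    """
    parent_key = doc[id_field]
    parent_row, child_tables = {}, {}

    def walk(obj, prefix):
        for key, value in obj.items():
            column = prefix + key
            if isinstance(value, list):
                rows = child_tables.setdefault(column, [])
                for item in value:
                    row = {id_field: parent_key}
                    row.update(item if isinstance(item, dict) else {"value": item})
                    rows.append(row)
            elif isinstance(value, dict):
                walk(value, column + ".")
            else:
                parent_row[column] = value

    walk(doc, "")
    return parent_row, child_tables


# Trimmed, hypothetical version of the person document.
person = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personBirthDate": "1981-01-01",
        "personPhones": [
            {"phoneNumber": "+1 (111) 111-1111", "phonePurpose": "mobile"},
            {"phoneNumber": "+1 (111) 111-1112", "phonePurpose": "work"},
        ],
        "personEmails": [{"emailAddress": "mike@somework.com", "emailPurpose": "work"}],
        "personWebsites": [{"websiteUrl": "http://www.mike.com", "websitePurpose": "work"}],
    },
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
}

parent, children = shred(person, "personId")
print("parent row:", parent)
for table, rows in children.items():
    print(f"{table}: {len(rows)} row(s)")
print("tables needed:", 1 + len(children))
```

Each additional array in the full document adds another table in the same way.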
Person ERD
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active",
      "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06",
      "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"],
          "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton",
          "addressRegion": "UT",
          "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa",
          "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234",
          "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS",
          "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking",
          "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111",
          "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555,
        "companyName": "Nabisco",
        "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": {
        "companyIdFk": 666,
        "companyName": "Goliath Dairies",
        "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri",
        "performanceNotes": "He is always late" } }
  ]
}
More Realistic Customer Order Model
[Figure: a much larger ERD that repeats the customer tables and adds Order, Order Items, Fact, and Lookup Lists tables, illustrating how a realistic customer order model grows into dozens of tables.]
Same Realistic Customer Order Model
- A JSON document is a business entity
- Meaningful graph relationships connect business entities
[Figure: Customer Order graph linking JSON Customers, Orders, Products, and Vendors documents through relationships such as "Product that was ordered" and "Vendor who sold the product".]
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": … },
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", … } } ]
  }
}
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": ["cookies", "chocolate cookies", "desserts", "snacks" …],
    "productTags": ["cookie", "chocolate", "snack", "treat"],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": {
          "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu… } } ],
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 101,
          "productWidth": 45, "productHeight": 17, "productLength": 67 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1…,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    …
  }
}
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": {
          "addressPurpose": ["shipping", …],
          "addressLocality": "East Hanove…",
          "addressPostalCode": "07936", … } } ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": 555,
          "paymentMethodVerified": true } } ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
{
  "orderLookupId": 111111111,
  "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
  "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"]
}
Relational Modeling Normalize
1. Normalize
- Make each attribute single-valued
  - Create one column per attribute
  - Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
- Group attributes into tables, ensuring each table has one coherent context
- Assign one primary key to each table
- Eliminate duplicate attributes across tables
[Figure: one-to-one, one-to-many, many-to-many, and reference table relationships.]
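To replace a multivalued attribute with a many-to-many join, you store each distinct value once and link it to its owners through a junction table. The minimal Python sketch below works on hypothetical in-memory rows shaped like the Addresses and Persons Addresses tables in the ERD. Two people who share one home address produce one address row and two junction rows. The `normalize_addresses` helper and the second person (22) are illustrative assumptions.

```python
def normalize_addresses(people):
    """Split embedded addresses into an Addresses table and a junction table.

    Each distinct address is stored once (eliminating duplicate attributes);
    the junction rows carry the per-person context: purpose and primary flag.
    """
    addresses = []          # Addresses: Address ID, Street, Locality, ...
    address_ids = {}        # address tuple -> Address ID
    persons_addresses = []  # Persons Addresses: Person ID, Address ID, Purpose, Is Primary

    for person in people:
        for entry in person["addresses"]:
            key = (entry["street"], entry["locality"], entry["region"],
                   entry["postalCode"], entry["country"])
            if key not in address_ids:
                address_ids[key] = len(addresses) + 1
                addresses.append({"addressId": address_ids[key],
                                  "street": key[0], "locality": key[1],
                                  "region": key[2], "postalCode": key[3],
                                  "country": key[4]})
            persons_addresses.append({
                "personId": person["personId"],
                "addressId": address_ids[key],
                "addressPurpose": entry["purpose"],
                "isPrimary": entry["isPrimary"],
            })
    return addresses, persons_addresses


home = {"street": "99 Smith Dr", "locality": "Clinton", "region": "UT",
        "postalCode": "84015", "country": "United States"}
people = [
    {"personId": 11, "addresses": [dict(home, purpose="shipping", isPrimary=True)]},
    {"personId": 22, "addresses": [dict(home, purpose="mailing", isPrimary=False)]},
]
addresses, links = normalize_addresses(people)
print(len(addresses), "address row(s),", len(links), "junction row(s)")
```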
DocGraph Modeling Normalize
- Create one business entity or transaction per JSON document
- JSON properties can be multivalued (i.e., arrays), which means we can embed structures
- Each embedded structure should be normalized
- TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
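Template Driven Extraction (TDE) is configured inside MarkLogic, so the Python below is not the TDE or Optic API. It is a hypothetical simulation of the idea: a projection pulls one row per embedded array item, so the document stays whole while queries still see row-shaped data. The `project_rows` helper and its parameters are assumptions made for illustration.

```python
def project_rows(doc, id_field, array_path, columns):
    """Project one row per item of an embedded array (a TDE-like view, simulated).

    array_path lists the keys from the document root down to the array;
    columns maps each output column to the key read from every array item.
    The document's own id is copied onto each row so rows join back to it.
    """
    node = doc
    for key in array_path:
        node = node[key]
    return [
        {id_field: doc[id_field], **{col: item.get(src) for col, src in columns.items()}}
        for item in node
    ]


# Trimmed, hypothetical version of the person document.
person = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
            {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
            {"phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113"},
        ],
    },
}

phone_rows = project_rows(person, "personId", ["person", "personPhones"],
                          {"number": "phoneNumber", "purposes": "phonePurpose"})
for row in phone_rows:
    print(row)

# A "query" over the projected rows; the document itself is never reshaped.
mobile = [r["number"] for r in phone_rows if "mobile" in r["purposes"]]
print(mobile)
```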
DocGraph Modeling Denormalize
- TDE can automatically populate the triple index with any data from any document, such as a person's name and contact information
- When triples link documents, the Optic API can query triple data as if it were part of any linked document
- This is denormalizing without denormalizing
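The idea can be sketched without MarkLogic. You index a few facts from the person document as subject-predicate-object triples. Then, at query time, you resolve them for any document that links to the person. The Python below is a hypothetical stand-in for the triple index and the Optic join, not their API. The `TripleIndex` class and the `personName` and `personEmail` predicates are assumptions.

```python
from collections import defaultdict


class TripleIndex:
    """A toy in-memory triple index keyed by (subject, predicate)."""

    def __init__(self):
        self._by_subject = defaultdict(lambda: defaultdict(list))

    def add(self, subject, predicate, obj):
        self._by_subject[subject][predicate].append(obj)

    def objects(self, subject, predicate):
        return list(self._by_subject[subject][predicate])


index = TripleIndex()
# Facts projected from the person document (personId 11).
index.add(11, "personName", "Mike Bowers")
index.add(11, "personEmail", "mike@somework.com")
index.add(11, "hasSpouse", 22)

# The order document stores only the customer's id, not a copy of the contact data.
order = {"orderId": 1,
         "order": {"orderNumber": "111-11-111", "customer": {"customerId": 11}}}


def order_view(order_doc, idx):
    """Join order data with person facts at query time instead of copying them."""
    customer_id = order_doc["order"]["customer"]["customerId"]
    return {
        "orderNumber": order_doc["order"]["orderNumber"],
        "customerName": idx.objects(customer_id, "personName")[0],
        "customerEmails": idx.objects(customer_id, "personEmail"),
    }


print(order_view(order, index))
```

Because the name lives only in the person document, an update there is visible to every order at the next query.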
Relational Modeling Orthogonalize
2. Orthogonalize
- Create business tables that stand independent of all contexts
- Create transaction tables to join together business tables
- Create reference tables to standardize entity states and attribute characteristics
- This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Figure: business tables, transaction tables, and reference tables.]
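The three kinds of tables can be sketched with Python's built-in `sqlite3` module. Persons and products are context-free business tables. An order-lines transaction table joins them into one context. An order-status reference table standardizes the allowed states. The table and column names are hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, which is why the pragma appears first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    -- Business tables: stand independent of any context.
    CREATE TABLE persons  (person_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Reference table: standardizes the allowed order states.
    CREATE TABLE order_statuses (status TEXT PRIMARY KEY);

    -- Transaction table: joins business tables into one context (an order line).
    CREATE TABLE order_lines (
        order_id   INTEGER NOT NULL,
        person_id  INTEGER NOT NULL REFERENCES persons(person_id),
        product_id INTEGER NOT NULL REFERENCES products(product_id),
        status     TEXT    NOT NULL REFERENCES order_statuses(status),
        qty        INTEGER NOT NULL
    );

    INSERT INTO persons  VALUES (11, 'Mike Bowers');
    INSERT INTO products VALUES (1111, 'Oreo Thins Sandwich Cookies');
    INSERT INTO order_statuses VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');
    INSERT INTO order_lines VALUES (1, 11, 1111, 'Shipped', 3);
""")

rows = conn.execute("""
    SELECT p.name, pr.name, ol.qty, ol.status
    FROM order_lines ol
    JOIN persons  p  ON p.person_id   = ol.person_id
    JOIN products pr ON pr.product_id = ol.product_id
""").fetchall()
print(rows)

# The reference table rejects a status that is not on the list.
rejected = False
try:
    conn.execute("INSERT INTO order_lines VALUES (2, 11, 1111, 'Lost', 1)")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)
```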
DocGraph Modeling Orthogonalize
- Everything about each entity should be in the entity
- A business entity should exactly match how users think of it
- A transaction should contain everything about the transaction
- Multiple lookup tables may be combined in one doc
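The lookup document at the end of the realistic customer order model combines two lookup lists, `orderStatus` and `productOrderStatus`, in one doc. The minimal Python sketch below uses that document to check the statuses in an order. The `validate_order` helper and the second ordered product's `"Lost"` status are hypothetical; they exist only to show a rejection.

```python
# Lookup lists from the order model, combined in a single document.
lookups = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}


def validate_order(order_doc, lookup_doc):
    """Return a list of problems; an empty list means every status is allowed."""
    problems = []
    order = order_doc["order"]
    if order["orderStatus"] not in lookup_doc["orderStatus"]:
        problems.append(f"unknown orderStatus {order['orderStatus']!r}")
    for item in order["orderedProducts"]:
        status = item["product"].get("productOrderStatus")
        if status not in lookup_doc["productOrderStatus"]:
            problems.append(f"product {item['product']['productIdFk']}: "
                            f"unknown productOrderStatus {status!r}")
    return problems


order = {"order": {
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
        {"product": {"productIdFk": 2222, "productOrderStatus": "Lost"}},
    ],
}}
print(validate_order(order, lookups))
```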
Relational Modeling Generalize
3. Generalize
- Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
- For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
- Do not overgeneralize in relational, because it hides the purpose of the model
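In relational terms, subclassing person means one general table plus one table per role, each sharing the person's key. The minimal sketch below uses Python's built-in `sqlite3` module with hypothetical table names. It shows the cost: reassembling one person's roles takes a separate lookup per subclass table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- General table: every kind of person shares these columns.
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Subclass tables: one row per role, keyed by the same person_id.
    CREATE TABLE customers (
        person_id INTEGER PRIMARY KEY REFERENCES persons(person_id),
        join_date TEXT NOT NULL);
    CREATE TABLE account_reps (
        person_id  INTEGER PRIMARY KEY REFERENCES persons(person_id),
        company_id INTEGER NOT NULL);
    CREATE TABLE delivery_agents (
        person_id         INTEGER PRIMARY KEY REFERENCES persons(person_id),
        delivery_schedule TEXT);

    INSERT INTO persons VALUES (11, 'Mike Bowers');
    INSERT INTO customers VALUES (11, '2001-01-01');
    INSERT INTO account_reps VALUES (11, 555);
    INSERT INTO delivery_agents VALUES (11, '2 pm Mon-Fri');
""")

# One lookup per subclass table; table names come from a fixed list, not user input.
subclasses = [("customer", "customers"),
              ("account rep", "account_reps"),
              ("delivery agent", "delivery_agents")]
roles = [
    role for role, table in subclasses
    if conn.execute(f"SELECT 1 FROM {table} WHERE person_id = ?", (11,)).fetchone()
]
print(roles)
```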
DocGraph Modeling Generalize
- Generalizing is easy in JSON because a JSON document is an object, and objects are designed for subclassing
- This example shows a person being subclassed as an employee, a customer, and a customer rep
- Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
- See the person, customer, and companyRelationships properties of the person document for the subclassing
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                           "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                           "nameOnBankAccount": "Michael Bowers",
                           "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                   "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "accountRepSupportedProducts": [
                     { "product": { "productIdFk": 1111, "productName": "Oreos" } }
                   ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                   "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
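Each subclass (`customer`, `companyRelationships`) sits beside `person` as its own sub-object. Application code can therefore ask which roles a document declares and read only the parts it understands. Below is a minimal sketch that assumes the wrapper layout of this example and a trimmed document. `roles_of` and `is_account_rep` are hypothetical helpers, not a MarkLogic API.

```python
# Trimmed version of the person document, keeping the subclass sub-objects.
person_doc = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person", "schemaVersion": 1.0}},
        {"schema": {"type": "customer", "schemaVersion": 1.0}},
        {"schema": {"type": "companyRelationships", "schemaVersion": 1.0}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def roles_of(doc: dict) -> set[str]:
    """Subclasses a document declares in its schemas array."""
    return {entry["schema"]["type"] for entry in doc.get("schemas", [])}


def is_account_rep(doc: dict, company_id: int) -> bool:
    """True if the person is an account rep for the given company."""
    for rel in doc.get("companyRelationships", []):
        company = rel["company"]
        if company["companyIdFk"] == company_id:
            return "accountRepFor" in company["personCompanyRelationship"]
    return False


print(sorted(roles_of(person_doc)))     # ['companyRelationships', 'customer', 'person']
print(is_account_rep(person_doc, 555))  # True
print(is_account_rep(person_doc, 666))  # False
```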
Relational Modeling Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table by moving it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
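The same trade-off applies to documents. This minimal sketch assumes the person and order layouts used in this deck. The order keeps a copy of the customer's id and name, so it can be displayed without a join. The price is that copies must be refreshed when the source changes. `denormalize_customer` and `refresh_copies` are hypothetical names.

```python
import copy

# Trimmed documents following the person and order layouts in this deck.
person = {"personId": 11, "person": {"personName": "Mike Bowers"}}
order = {"orderId": 1, "order": {"orderNumber": "111-11-111"}}


def denormalize_customer(order_doc: dict, person_doc: dict) -> dict:
    """Return a copy of the order with a customer summary copied in.

    Only commonly displayed fields are copied; customerId remains the link
    back to the source document.
    """
    result = copy.deepcopy(order_doc)
    result["order"]["customer"] = {
        "customerId": person_doc["personId"],
        "customerName": person_doc["person"]["personName"],
    }
    return result


def refresh_copies(orders: list, person_doc: dict) -> list:
    """Re-copy the summary into every order that references this person."""
    return [
        denormalize_customer(o, person_doc)
        if o["order"].get("customer", {}).get("customerId") == person_doc["personId"]
        else o
        for o in orders
    ]


denormalized = denormalize_customer(order, person)
print(denormalized["order"]["customer"])
# {'customerId': 11, 'customerName': 'Mike Bowers'}

# The cost of denormalizing: a change to the source must be propagated.
person["person"]["personName"] = "Michael Bowers"
print(refresh_copies([denormalized], person)[0]["order"]["customer"]["customerName"])
# Michael Bowers
```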
DocGraph Modeling Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You create inequality (range) indexes manually, based on a property name or a hierarchical path
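A toy index keyed only by property name and value shows why unique names matter. This is not MarkLogic's implementation. It only models the name-based behavior described in the second bullet, under the assumption that the path is ignored. With a generic `status` property, asking for active people also returns a person whose company relationship is active. With the distinct names `personStatus` and `personCompanyStatus` used in this deck, the lookup is precise. The two documents are hypothetical.

```python
from collections import defaultdict


def build_name_index(docs: dict) -> dict:
    """Toy equality index: (propertyName, value) -> ids of docs containing it.

    The path to the property is deliberately ignored, mirroring the
    name-based behavior described above. This is not MarkLogic's code.
    """
    index = defaultdict(set)

    def walk(node, key, doc_id):
        if isinstance(node, dict):
            for k, v in node.items():
                walk(v, k, doc_id)
        elif isinstance(node, list):
            for item in node:  # array values are indexed under the array's name
                walk(item, key, doc_id)
        elif key is not None:
            index[(key, node)].add(doc_id)

    for doc_id, doc in docs.items():
        walk(doc, None, doc_id)
    return dict(index)


def lookup(index: dict, name: str, value) -> list:
    """Sorted ids of docs whose property `name` equals `value` anywhere."""
    return sorted(index.get((name, value), set()))


# Hypothetical people: 11 is active but their company relationship is not; 22 is the reverse.
generic_docs = {
    11: {"person": {"status": "active"},
         "companyRelationships": [{"company": {"status": "inactive"}}]},
    22: {"person": {"status": "inactive"},
         "companyRelationships": [{"company": {"status": "active"}}]},
}
unique_docs = {
    11: {"person": {"personStatus": "active"},
         "companyRelationships": [{"company": {"personCompanyStatus": "inactive"}}]},
    22: {"person": {"personStatus": "inactive"},
         "companyRelationships": [{"company": {"personCompanyStatus": "active"}}]},
}

print(lookup(build_name_index(generic_docs), "status", "active"))       # [11, 22]
print(lookup(build_name_index(unique_docs), "personStatus", "active"))  # [11]
```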
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
personId 11
person
  personName Mike Bowers
  personBirthDate 1981-01-01
  personGender male
  personEthnicity caucasian
  personShippingPreference free
  personPhones [
    phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
  personAddresses [
    address
      addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
      addressLocality Clinton addressRegion UT
      addressPostalCode 84015 addressCountry United States ]
  personEmails [
    email emailPurpose [ work primary ] emailAddress mike@somework.com ]
  personWebsites [
    website websitePurpose [ work ] websiteUrl http://www.mike.com ]
  personPaymentMethods [
    paymentMethod paymentMethodType Visa paymentMethodStatus Verified
      creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
      nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
  customerStatus active
  customerJoinDate 2001-01-01
  customerReviewerRank 11111

orderId 1
order
  orderNumber 111-11-111
  orderDate 2001-01-01T09:00:00Z
  orderStatus Shipped
  customer
    customerId 11
    customerName Mike Bowers
  orderShipment
    shipper companyIdFk 777 companyName FedEx
    shipmentMethod 2nd Day Air shipmentCost 5.87 shipmentNotes
  orderShippingAddress
    addressStreet [ 99 Smith Dr ]
    addressLocality Clinton addressRegion UT
    addressPostalCode 84015 addressCountry United States
  orderedProducts [
    product
      productIdFk 1111
      productName Oreo Thins Sandwich Cookies
      productOrderQty 3 productUnitPrice 3.50 productCondition New
      productWeightOz 11
      productOrderStatus Shipped
      productShippedDate 2001-01-01+0101
      seller sellerId 555 sellerName Nabisco
      supplier supplierId 555 supplierName Nabisco
    product
      productIdFk 2222
      productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
      productOrderQty 1 productUnitPrice 2.50 productCondition New

productId 1111
product
  productCode oreo-111
  productStatus active
  productName Oreo Thins Sandwich Cookies
  productDescription YUMMY
  productCategories [ cookies chocolate cookies desserts snacks
  productTags [ cookie chocolate snack treat ]
  productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
  productAvailabilityDate 2001-01-01
  productListPrice 4.50
  productDiscountPrice 3.50
  productStandardCost 2.50
  productInventoryReorderLevel 100
  productInventoryTargetLevel 1000
  productVendor vendorId 555 companyName Nabisco
  productSuppliers [
    productSupplier supplierId 555 companyName Nabisco
      standardSupplierProductPrice 2.50 currentSupplierProductPrice 2
      supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
  productMeasures [
    productMeasure
      productMeasurePurpose productPackage productWeight 10.1
      productWidth 4.5 productHeight 1.7 productLength 6.7
    productMeasure
      productMeasurePurpose shippingPackage productWeight 1
      productWidth 5 productHeight 2 productLength 8
  d tW b it [

companyId 555
company
  companyName Nabisco
  companyStatus active
  companyPhones [
    phone phonePurpose sales
    phone phonePurpose support
    phone phonePurpose shipping
  companyAddresses [
    address
      addressPurpose [ shipping
      addressLocality East Hanove
      addressPostalCode 07936
  companyEmails [
    email emailPurpose sales
  companyWebsites [
    website websitePurpose home
  companyPaymentMethods [
    paymentMethod
      paymentMethodType Che
      paymentMethodAccountNumber 555
      paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05

orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
Document PROs Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents are equivalent to more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into distinct business entities, transactions, and lookup lists
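Orthogonalized documents still point at each other. This minimal sketch assumes, as in the order document, that a property ending in `IdFk` holds another document's id. The resolver walks a document and looks up each reference on demand, which is the document-side counterpart of a join. The in-memory `store`, the `FK_KINDS` mapping, and `resolve_fks` are hypothetical stand-ins for a real database.

```python
# Hypothetical stand-in for the database: documents keyed by (kind, id).
store = {
    ("product", 1111): {"productId": 1111,
                        "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    ("company", 777): {"companyId": 777, "company": {"companyName": "FedEx"}},
}

# Which kind of document each foreign-key property points to (assumption).
FK_KINDS = {"productIdFk": "product", "companyIdFk": "company"}


def resolve_fks(node) -> list:
    """Return (fk_name, id, referenced_doc_or_None) for every FK at any depth."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key in FK_KINDS:
                found.append((key, value, store.get((FK_KINDS[key], value))))
            else:
                found.extend(resolve_fks(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(resolve_fks(item))
    return found


order = {"orderId": 1, "order": {
    "orderShipment": {"shipper": {"companyIdFk": 777, "companyName": "FedEx"}},
    "orderedProducts": [{"product": {"productIdFk": 1111, "productOrderQty": 3}},
                        {"product": {"productIdFk": 2222, "productOrderQty": 1}}],
}}

for fk, ref_id, target in resolve_fks(order):
    print(fk, ref_id, "found" if target else "missing")
# companyIdFk 777 found
# productIdFk 1111 found
# productIdFk 2222 missing
```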
Document PROs Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
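The IO estimate above reduces to simple arithmetic. This back-of-the-envelope sketch uses the slide's own per-join figures (3-to-4 index IOs plus 1-to-2 row IOs). It models the claim; it is not a measurement.

```python
def relational_read_ios(joins: int,
                        index_ios: tuple[int, int] = (3, 4),
                        row_ios: tuple[int, int] = (1, 2)) -> tuple[int, int]:
    """Return (best, worst) IO counts for reading one entity through joins.

    Each join costs index IOs plus row IOs; the defaults follow the slide.
    """
    per_join_low = index_ios[0] + row_ios[0]    # 3 + 1 = 4
    per_join_high = index_ios[1] + row_ios[1]   # 4 + 2 = 6
    return joins * per_join_low, joins * per_join_high


low, high = relational_read_ios(joins=13)
print(f"relational: {low}-to-{high} IOs; document: 1 IO")
# relational: 52-to-78 IOs; document: 1 IO
```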
Document PROs Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties
JSON databases can query multi-valued and multi-typed properties, for example with MarkLogic's new Optic API and Template Driven Extraction (TDE)
Document PROs Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
"personAddresses": [
  { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
                 "addressLocality": "Clinton", "addressRegion": "UT",
                 "addressPostalCode": "84015", "addressCountry": "United States" } }
]
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
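This minimal sketch makes the cost of the relational version concrete. It shreds one `personAddresses` array into rows for the Addresses and Persons Addresses tables listed above. Three details are assumptions for illustration: the surrogate-key counter, the one-row-per-purpose rule, and turning a `primary` purpose into the Is Primary flag.

```python
from itertools import count
from typing import Iterator


def shred_addresses(person_id: int, person_addresses: list,
                    address_ids: Iterator[int]) -> tuple:
    """Split a personAddresses array into Addresses rows and junction rows."""
    addresses, persons_addresses = [], []
    for item in person_addresses:
        addr = item["address"]
        address_id = next(address_ids)        # surrogate key (assumption)
        addresses.append({
            "Address ID": address_id,
            "Street": ", ".join(addr["addressStreet"]),
            "Locality": addr["addressLocality"],
            "Region": addr["addressRegion"],
            "Postal Code": addr["addressPostalCode"],
            "Country": addr["addressCountry"],
        })
        purposes = addr["addressPurpose"]
        is_primary = "primary" in purposes    # a flag rather than a purpose (assumption)
        for purpose in purposes:
            if purpose != "primary":
                persons_addresses.append({
                    "Person ID": person_id,
                    "Address ID": address_id,
                    "Address Purpose": purpose,
                    "Is Primary": is_primary,
                })
    return addresses, persons_addresses


person_addresses = [
    {"address": {"addressPurpose": ["shipping", "mailing"],
                 "addressStreet": ["99 Smith Dr"],
                 "addressLocality": "Clinton", "addressRegion": "UT",
                 "addressPostalCode": "84015", "addressCountry": "United States"}}
]

addresses, links = shred_addresses(11, person_addresses, count(1))
print(len(addresses), len(links))  # 1 2
print(links[0])
# {'Person ID': 11, 'Address ID': 1, 'Address Purpose': 'shipping', 'Is Primary': False}
```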
Document PROs Multi-Type Arrays
Any JSON data can be multi-typed
• Storing different types of records in the same array is natural in programming but very difficult in relational databases
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers",
                       "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
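Application code handles a multi-typed array by dispatching on a discriminator field. This minimal sketch assumes `paymentMethodType` plays that role, as in the example above. `describe_payment` and its masking rule are hypothetical.

```python
def describe_payment(method: dict) -> str:
    """Summarize one paymentMethod, dispatching on its type."""
    kind = method["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending {method['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking account ending {method['bankAccountNumber'][-4:]}"
    return f"Unknown payment type: {kind}"


# Two records of different shapes in one array, as on the slide (trimmed).
person_payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111"}},
]

for entry in person_payment_methods:
    print(describe_payment(entry["paymentMethod"]))
# Visa ending 1234
# Checking account ending 1111
```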
Document PROs Everything is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
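Because keys and nesting are data, they can be searched and rewritten like values. This minimal sketch is plain Python, not a MarkLogic query API. It finds every object carrying the optional `hidePhoneNumber` key at any depth, then removes that key as a structural update. `find_paths_with_key` is a hypothetical helper.

```python
def find_paths_with_key(node, key, path=()):
    """Yield the path of every object that contains `key`, at any depth."""
    if isinstance(node, dict):
        if key in node:
            yield path
        for k, v in node.items():
            yield from find_paths_with_key(v, key, path + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_paths_with_key(item, key, path + (i,))


doc = {
    "personId": 11,
    "person": {
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ]
    },
}

# Query the structure: which objects have the key?
paths = list(find_paths_with_key(doc, "hidePhoneNumber"))
print(paths)  # [('person', 'personPhones', 1, 'phone')]

# Change the structure by updating data: remove the key wherever it appears.
for p in paths:
    target = doc
    for step in p:
        target = target[step]
    del target["hidePhoneNumber"]
print(list(find_paths_with_key(doc, "hidePhoneNumber")))  # []
```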
Document PROs Simple Easy Data Types
• Attribute names have unlimited length and can contain any characters
• Strings have unlimited size and can contain any internationalized UTF-8 characters
• Numbers contain integers or decimals
• Booleans are true or false
• Arrays can have values of any type and can mix types
• Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
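A JSON round trip with Python's standard `json` module shows these types in practice. A UTF-8 string, a key containing spaces, integers, a decimal, a boolean, a mixed-type array, and a sparse nested object all come back unchanged. The sample values are illustrative.

```python
import json

doc = {
    "personName": "Zoë Ñúñez 山田",               # UTF-8 characters in a string
    "key with spaces & ümlauts": "ok",            # attribute names can hold any characters
    "personHeightInFeet": 5.75,                   # decimal number
    "customerReviewerRank": 11111,                # integer number
    "hidePhoneNumber": True,                      # boolean
    "mixed": [1, "two", 3.0, False, None, {"nested": []}],  # array mixing types
    "person": {"personMaidenName": None},         # sparse object: other keys simply absent
}

text = json.dumps(doc, ensure_ascii=False)  # keep non-ASCII characters as-is
restored = json.loads(text)

print(restored == doc)  # True
print([type(v).__name__ for v in restored["mixed"]])
# ['int', 'str', 'float', 'bool', 'NoneType', 'dict']
```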
Document PROs Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. Codd
IBM Research Laboratory, San Jose, California
Information Retrieval, Communications of the ACM, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables… represent a major advance toward the goal of data independence…
1.2.1. Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence
…Can application programs… remain invariant as indices come and go?…
1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure "Inside and Out": a graph linking Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E) through edges such as "author of", "publisher of", "published article in journal", "printed on", "geo-located in", and problem/solution/purpose links between topics. A second panel links JSON Customers, Orders, Products, and Vendors documents, an XML hospital operation record, and documentation (product manual, shipping instructions, customer invoices, annual reports, customer order documents) through edges such as "product that was ordered", "vendor who sold the product", "customer liked product", and "vendor received RMA on product".]
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling Normalize
- DocGraph Modeling Normalize
- DocGraph Modeling Denormalize
- Relational Modeling Orthogonalize
- Relational Modeling Generalize
- DocGraph Modeling Generalize
- Relational Modeling Tune
- DocGraph Modeling Tune
- Document PROs Simple Easy Data Types
| Drug Name | Drug Manufacturer | Dose Size | Dose UOM |
|---|---|---|---|
| Minicillan | Drugs R Us | 200 | mg |
| Maxicillan | Canada4Less Drugs | 400 | mg |
| Minicillan | Drug USA | 150 | mg |

| Hospital Name | John Hopkins |
|---|---|
| Operation Number | 13 |
| Operation Type | Heart Transplant |
| Surgeon Name | Dorothy Oz |
Customer Model
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
Person ERD
The person ERD has 13 tables because each multivalued property requires a separate table in a relational model.
All of this should be represented as one JSON document because it is one business entity.
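The rule that each multivalued property needs its own table can be turned into a rough counting heuristic. This minimal sketch assumes a strict first-normal-form reading. The root object is one table, and every distinct array path, including arrays of scalars such as `phonePurpose`, needs a child table. Real ERDs often fold some arrays into columns, so this produces an estimate rather than reproducing a specific diagram. `array_paths` and `estimate_tables` are hypothetical helpers.

```python
def array_paths(node, path: str = "") -> set:
    """Collect every distinct array path; list positions collapse to '[]'."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            found |= array_paths(value, f"{path}/{key}")
    elif isinstance(node, list):
        found.add(path)  # this array needs its own table
        for item in node:
            found |= array_paths(item, path + "[]")
    return found


def estimate_tables(doc: dict) -> int:
    """One table for the root plus one per distinct array path."""
    return 1 + len(array_paths(doc))


# Trimmed person document: two multivalued properties, each with purpose arrays.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        ],
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"}},
        ],
    },
}

for p in sorted(array_paths(person_doc)):
    print(p)
print(estimate_tables(person_doc))  # 5
```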
One Person JSON Doc = 13 Tables
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active",
      "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06",
      "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"],
          "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton",
          "addressRegion": "UT",
          "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa",
          "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234",
          "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS",
          "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking",
          "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111",
          "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555,
        "companyName": "Nabisco",
        "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": {
        "companyIdFk": 666,
        "companyName": "Goliath Dairies",
        "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri",
        "performanceNotes": "He is always late" } }
  ]
}
[Figure: Person ERD, relational tables and their columns]
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Driver's License Num, Driver's License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
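To make the table count concrete, here is a minimal Python sketch of shredding a nested document into relational rows. It assumes a deliberately simple rule: every JSON array becomes its own child table with a foreign key to its parent row, and a single nested object folds into its parent's row. The `shred` function and the dotted table names are hypothetical, and the rule does not reproduce the exact ERD above; it only shows how multivalued properties multiply tables.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Tables = Dict[str, List[Row]]


def shred(obj: Row, table: str, tables: Tables,
          fk: Optional[Tuple[str, int]] = None) -> None:
    """Store obj as one row of `table`; every array inside it becomes a child table."""
    rows = tables.setdefault(table, [])
    row: Row = {"id": len(rows) + 1}  # surrogate primary key
    if fk is not None:
        row[fk[0]] = fk[1]  # foreign key back to the parent row
    rows.append(row)
    _fill(obj, "", table, row, tables)


def _fill(obj: Row, prefix: str, table: str, row: Row, tables: Tables) -> None:
    for name, value in obj.items():
        column = prefix + name
        if isinstance(value, dict):
            # Single-valued nested object: its fields become columns of the same row.
            _fill(value, column + ".", table, row, tables)
        elif isinstance(value, list):
            # Multivalued property: one child table, one row per array item.
            for item in value:
                child = item if isinstance(item, dict) else {"value": item}
                shred(child, f"{table}.{column}", tables, (f"{table}_id", row["id"]))
        else:
            row[column] = value


# A trimmed-down version of the person document.
person_doc: Row = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        ],
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"}},
        ],
    },
}

tables: Tables = {}
shred(person_doc, "Persons", tables)
for name, rows in tables.items():
    print(f"{name}: {len(rows)} row(s)")
```

Even this trimmed document, with four arrays, lands in five tables; each further array in the full person document (addresses, websites, payment methods, company relationships) adds more.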
More Realistic Customer Order Model
[Figure: relational ERD of the customer order model, with many more tables, including Order, Order Items, and Lookup Lists]
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
[Figure: Customers, Orders, Products, and Vendors JSON documents connected by graph relationships: "Customer Order", "Product that was ordered", and "Vendor who sold the product"]
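A rough Python sketch of graph relationships connecting whole documents: each document stays intact and is keyed by its ID, and (subject, predicate, object) triples link them. The in-memory `docs` and `triples` structures and the `follow` helpers are stand-ins, not a MarkLogic API. `likesProduct` comes from the person document; `soldBy` is a hypothetical predicate standing for "Vendor who sold the product".

```python
from typing import Any, Dict, List, Tuple

# Hypothetical in-memory stand-ins: documents keyed by entity ID, plus triples.
docs: Dict[int, Dict[str, Any]] = {
    11: {"personId": 11, "person": {"personName": "Mike Bowers"}},
    1111: {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    555: {"companyId": 555, "company": {"companyName": "Nabisco"}},
}

# likesProduct appears in the person document; soldBy is an illustrative predicate.
triples: List[Tuple[int, str, int]] = [
    (11, "likesProduct", 1111),
    (1111, "soldBy", 555),
]


def follow(subject: int, predicate: str) -> List[int]:
    """IDs of every entity that `subject` links to through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow_path(subject: int, predicates: List[str]) -> List[int]:
    """Walk a chain of predicates, one hop per predicate."""
    frontier = [subject]
    for predicate in predicates:
        frontier = [o for s in frontier for o in follow(s, predicate)]
    return frontier


# Which vendors sell the products that person 11 likes?
vendor_ids = follow_path(11, ["likesProduct", "soldBy"])
print([docs[i]["company"]["companyName"] for i in vendor_ids])
```

The relationship carries its own meaning in the predicate name, so adding a new kind of link means adding triples, not new join tables.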
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": …
    },
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States"
    },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+0101",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", … } }
    ]
  }
}
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": ["cookies", "chocolate cookies", "desserts", "snacks", …],
    "productTags": ["cookie", "chocolate", "snack", "treat"],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": {
          "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…" } }
    ],
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } }
    ],
    "productWebsites": [ …
  }
}
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } }
    ],
    "companyAddresses": [
      { "address": {
          "addressPurpose": ["shipping", …], …
          "addressLocality": "East Hanove…", …
          "addressPostalCode": "07936", … } }
    ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", … } }
    ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", … } }
    ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": 555…,
          "paymentMethodVerified": true } }
    ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
{
  "orderLookupId": 111111111,
  "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
  "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"]
}
Same Realistic Customer Order Model
Relational Modeling: Normalize
1. Normalize
• Make each attribute single-valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Figure: one-to-one, one-to-many, and many-to-many joins, and reference tables]
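A minimal sketch of these normalization steps, using Python's standard `sqlite3` module. It flattens the multivalued phone attribute into a one-to-many child table with its own primary key, so reading a person's phones back requires a join. Table and column names are illustrative assumptions, and each row carries a single phone purpose, as in the Person Phones table of the ERD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (
    person_id INTEGER PRIMARY KEY,        -- one primary key per table
    person_name TEXT NOT NULL
);
-- The multivalued phone attribute is flattened into a one-to-many child table.
CREATE TABLE person_phones (
    phone_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES persons(person_id),
    phone_purpose TEXT NOT NULL,
    phone_number TEXT NOT NULL
);
""")
conn.execute("INSERT INTO persons VALUES (11, 'Mike Bowers')")
conn.executemany(
    "INSERT INTO person_phones (person_id, phone_purpose, phone_number) VALUES (?, ?, ?)",
    [
        (11, "mobile", "+1 (111) 111-1111"),
        (11, "work", "+1 (111) 111-1112"),
        (11, "home", "+1 (111) 111-1113"),
    ],
)

# Reassembling the person's phones now requires a join.
rows = conn.execute("""
    SELECT p.person_name, ph.phone_purpose, ph.phone_number
    FROM persons AS p
    JOIN person_phones AS ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
""").fetchall()
for row in rows:
    print(row)
```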
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE (Template Driven Extraction) can automatically populate the triple index to denormalize data, and the Optic API can query it
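The kind of projection TDE performs can be pictured with a simplified Python sketch: walk to an array inside each document and emit one flat row per item. The `project_rows` helper and its parameters are hypothetical; real TDE templates are declared in MarkLogic, and this code reproduces neither their syntax nor the triple index.

```python
from typing import Any, Dict, Iterator, List

Doc = Dict[str, Any]


def project_rows(doc: Doc, id_field: str, context: List[str],
                 columns: Dict[str, str]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per array item found at `context` (simplified TDE-style view).

    id_field: top-level ID copied into every row, e.g. "personId"
    context:  property path to an array, e.g. ["person", "personPhones"]
    columns:  output column name -> property name inside each item
    """
    node: Any = doc
    for step in context:
        node = node.get(step, []) if isinstance(node, dict) else []
    for item in node:
        # Items are wrapped, e.g. {"phone": {...}}; unwrap single-key wrappers.
        inner = next(iter(item.values())) if len(item) == 1 else item
        row = {id_field: doc.get(id_field)}
        row.update({col: inner.get(prop) for col, prop in columns.items()})
        yield row


person_doc: Doc = {
    "personId": 11,
    "person": {
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
        ]
    },
}

phone_rows = list(project_rows(person_doc, "personId", ["person", "personPhones"],
                               {"phoneNumber": "phoneNumber"}))
print(phone_rows)
```

The document itself stays normalized and hierarchical; the flat rows exist only in the projection.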
[The slide repeats the person JSON document shown on the "One Person JSON Doc = 13 Tables" slide.]
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact information
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
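A sketch of "denormalizing without denormalizing", assuming person and order rows have already been projected as in a TDE view. The order keeps only the customer's ID, and the customer's name and phone are attached at query time. The in-memory hash join stands in for an Optic API join; it is not the Optic API itself, and all names are illustrative.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Rows a TDE-like projection might extract from person documents.
person_rows: List[Row] = [
    {"personId": 11, "personName": "Mike Bowers", "primaryPhone": "+1 (111) 111-1111"},
]

# Rows extracted from order documents: the order stores only the customer's ID.
order_rows: List[Row] = [
    {"orderId": 1, "orderNumber": "111-11-111", "customerId": 11},
]


def join_orders_to_customers(orders: List[Row], people: List[Row]) -> List[Row]:
    """Attach customer name and phone to each order at query time (inner hash join)."""
    by_id = {p["personId"]: p for p in people}
    joined: List[Row] = []
    for order in orders:
        person = by_id.get(order["customerId"])
        if person is None:
            continue  # inner join: drop orders with no matching customer
        joined.append({**order,
                       "customerName": person["personName"],
                       "customerPhone": person["primaryPhone"]})
    return joined


print(join_orders_to_customers(order_rows, person_rows))
```

Because the phone number lives only in the person document, changing it never requires touching any order.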
[The slide repeats the person JSON document shown on the "One Person JSON Doc = 13 Tables" slide.]
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Figure: business tables, transaction tables, and reference tables]
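A small `sqlite3` sketch of orthogonalized tables: two business tables (persons, products), one transaction table (orders) that joins them, and one reference table (order_statuses) that standardizes order states. The schema is an illustrative assumption; the status values come from the order lookup document in this section. With foreign keys switched on, SQLite rejects a status that is not in the reference table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
-- Reference table: the standard set of order states.
CREATE TABLE order_statuses (status TEXT PRIMARY KEY);
-- Business tables: independent of any particular context.
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, person_name TEXT NOT NULL);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT NOT NULL);
-- Transaction table: joins business tables and points at the reference table.
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES persons(person_id),
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    status TEXT NOT NULL REFERENCES order_statuses(status)
);
""")
conn.executemany("INSERT INTO order_statuses VALUES (?)",
                 [("New",), ("Invoiced",), ("Shipped",), ("Closed",)])
conn.execute("INSERT INTO persons VALUES (11, 'Mike Bowers')")
conn.execute("INSERT INTO products VALUES (1111, 'Oreo Thins Sandwich Cookies')")
conn.execute("INSERT INTO orders VALUES (1, 11, 1111, 'Shipped')")


def insert_order(order_id: int, status: str) -> bool:
    """Try to record an order; return False if the reference table rejects the status."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, 11, 1111, ?)", (order_id, status))
    except sqlite3.IntegrityError:
        return False
    return True


print(insert_order(2, "Lost"))    # not a standard order state
print(insert_order(3, "Closed"))  # allowed by the reference table
```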
[The slide repeats the customer, order, product, company, and order lookup JSON documents shown on the "Same Realistic Customer Order Model" slide.]
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one document
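A combined lookup document can be used directly for validation. In this hypothetical Python sketch, one document holds both lookup lists from the order lookup document, and `invalid_statuses` checks an order document against them. The function name and the `Backordered` value are made up to show a failing check.

```python
from typing import Any, Dict, List

# One document holding two lookup lists.
order_lookup: Dict[str, Any] = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}


def invalid_statuses(order_doc: Dict[str, Any], lookup: Dict[str, Any]) -> List[str]:
    """Status values in an order document that the lookup document does not allow."""
    order = order_doc["order"]
    problems: List[str] = []
    if order.get("orderStatus") not in lookup["orderStatus"]:
        problems.append(order.get("orderStatus"))
    for item in order.get("orderedProducts", []):
        status = item["product"].get("productOrderStatus")
        if status is not None and status not in lookup["productOrderStatus"]:
            problems.append(status)
    return problems


order_doc = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},  # made-up bad value
        ],
    },
}

print(invalid_statuses(order_doc, order_lookup))
```

One read of a single lookup document serves both checks, where the relational version would consult two reference tables.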
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational because it hides the purpose of the model
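A `sqlite3` sketch of relational generalization, assuming a common supertype/subtype layout: one general persons table plus subclass tables that share its primary key one-to-one. The table names and the `roles` helper are illustrative, not taken from the source.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- General table: every person, whatever roles they play.
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, person_name TEXT NOT NULL);
-- Subclass tables reuse the parent's key, so each is one-to-one with persons.
CREATE TABLE customers (
    person_id INTEGER PRIMARY KEY REFERENCES persons(person_id),
    customer_status TEXT,
    customer_join_date TEXT
);
CREATE TABLE account_reps (
    person_id INTEGER PRIMARY KEY REFERENCES persons(person_id),
    company_id INTEGER
);
INSERT INTO persons VALUES (11, 'Mike Bowers');
INSERT INTO customers VALUES (11, 'active', '2001-01-01');
INSERT INTO account_reps VALUES (11, 555);
""")

SUBCLASS_TABLES = (("customers", "customer"), ("account_reps", "account rep"))


def roles(person_id: int) -> List[str]:
    """Every subclass role a person has, found by probing each subclass table."""
    found = []
    for table, role in SUBCLASS_TABLES:
        # Table names come from the fixed tuple above, never from user input.
        hit = conn.execute(f"SELECT 1 FROM {table} WHERE person_id = ?", (person_id,)).fetchone()
        if hit is not None:
            found.append(role)
    return found


print(roles(11))
```

Answering "what is this person?" means probing every subclass table, which is one way generalization hides the model's purpose in relational.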
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because a JSON object is designed for subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike in relational, it does not hide the purpose of the model
• See the subclassing in the person document: the person object is extended by the customer and companyRelationships objects
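A Python sketch of the JSON alternative, where subclasses are extra top-level objects in the same document. It assumes, as an interpretation rather than a documented rule, that the `schemas` array lists every subclass object the document carries (with `schemaVersion` written as 1.0). `declared_types`, `missing_subclasses`, and `company_roles` are hypothetical helpers.

```python
from typing import Any, Dict, List

person_doc: Dict[str, Any] = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person", "schemaVersion": 1.0}},
        {"schema": {"type": "customer", "schemaVersion": 1.0}},
        {"schema": {"type": "companyRelationships", "schemaVersion": 1.0}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}},
    ],
}


def declared_types(doc: Dict[str, Any]) -> List[str]:
    """Subclass types listed in the document's schemas array."""
    return [entry["schema"]["type"] for entry in doc.get("schemas", [])]


def missing_subclasses(doc: Dict[str, Any]) -> List[str]:
    """Declared types that have no matching top-level object in the document."""
    return [t for t in declared_types(doc) if t not in doc]


def company_roles(doc: Dict[str, Any]) -> List[str]:
    """All person-to-company roles, sorted and de-duplicated."""
    return sorted({role
                   for rel in doc.get("companyRelationships", [])
                   for role in rel["company"]["personCompanyRelationship"]})


print(declared_types(person_doc))
print(missing_subclasses(person_doc))
print(company_roles(person_doc))
```

All roles are visible by reading one document, with no probing of separate subclass tables.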
[The slide repeats the person JSON document shown on the "One Person JSON Doc = 13 Tables" slide.]
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
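A `sqlite3` sketch of tuning by denormalization, with illustrative names: the orders table carries a copy of the customer's name so reads skip the join, and the write path must update every copy in one transaction. The order document earlier in this section carries a `customerName` copy in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, person_name TEXT NOT NULL);
-- customer_name duplicates persons.person_name so order reads need no join.
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES persons(person_id),
    customer_name TEXT NOT NULL
);
INSERT INTO persons VALUES (11, 'Mike Bowers');
INSERT INTO orders VALUES (1, 11, 'Mike Bowers');
""")


def order_customer_name(order_id: int) -> str:
    """Read path: a single-table lookup, no join."""
    return conn.execute("SELECT customer_name FROM orders WHERE order_id = ?",
                        (order_id,)).fetchone()[0]


def rename_person(person_id: int, new_name: str) -> None:
    """Write path: update the source row and every copy in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE persons SET person_name = ? WHERE person_id = ?",
                     (new_name, person_id))
        conn.execute("UPDATE orders SET customer_name = ? WHERE person_id = ?",
                     (new_name, person_id))


print(order_customer_name(1))
rename_person(11, "Michael Bowers")
print(order_customer_name(1))
```

The faster read comes at the price of a wider write: forgetting the second UPDATE would leave the copies out of sync.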
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
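Why unique names matter can be shown with a simplified Python model of a name-keyed value index that, like the equality indexes described above, ignores where a property sits in the hierarchy. This is an illustrative model, not MarkLogic's actual index structures, and the document URIs are hypothetical. With a generic `status` property, a lookup cannot tell a person's status from a company's; with `personStatus`, `customerStatus`, and `companyStatus` it can.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, Set, Tuple

Index = DefaultDict[Tuple[str, Any], Set[str]]


def build_name_index(docs: Dict[str, Any]) -> Index:
    """Map (property name, value) -> URIs, ignoring where the property sits in the tree."""
    index: Index = defaultdict(set)

    def walk(node: Any, uri: str) -> None:
        if isinstance(node, list):
            for item in node:
                walk(item, uri)
        elif isinstance(node, dict):
            for name, value in node.items():
                values = value if isinstance(value, list) else [value]
                for v in values:
                    if isinstance(v, (dict, list)):
                        walk(v, uri)  # descend; the key is the name, not the path
                    else:
                        index[(name, v)].add(uri)

    for uri, doc in docs.items():
        walk(doc, uri)
    return index


generic_docs = {
    "/person/11.json": {"person": {"status": "active"}, "customer": {"status": "active"}},
    "/company/555.json": {"company": {"status": "active"}},
}
unique_docs = {
    "/person/11.json": {"person": {"personStatus": "active"},
                        "customer": {"customerStatus": "active"}},
    "/company/555.json": {"company": {"companyStatus": "active"}},
}

generic = build_name_index(generic_docs)
unique = build_name_index(unique_docs)
print(sorted(generic[("status", "active")]))         # ambiguous: both documents match
print(sorted(unique[("companyStatus", "active")]))   # only the company document
```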
[The slide repeats the person JSON document shown on the "One Person JSON Doc = 13 Tables" slide.]
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mike@somework.com ]
personWebsites [
website websitePurpose [ work ] websiteUrl http://www.mike.com ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T09:00:00Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 5.87 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 3.50 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2.50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
productAvailabilityDate 2001-01-01
productListPrice 4.50
productDiscountPrice 3.50
productStandardCost 2.50
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 2.50 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
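To make the entity / transaction / lookup-list split concrete, here is a minimal Python sketch using abbreviated versions of three of the documents above (person, order, and the order lookup list). It is an illustration under stated assumptions, not the author's code: the transcript lost the JSON punctuation, so the nesting is assumed, prices such as "350" are read as 3.50, and the helper names `order_summary` and `full_customer` are hypothetical.

```python
from typing import Any

# Business entity (abbreviated). Nesting is assumed from the transcript.
people: dict[int, dict[str, Any]] = {
    11: {"personId": 11, "person": {"personName": "Mike Bowers"}},
}

# Transaction document: carries a denormalized copy of the customer name
# plus a customerId reference back to the person entity.
order: dict[str, Any] = {
    "orderId": 1,
    "order": {
        "orderNumber": "111-11-111",
        "orderStatus": "Shipped",
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111,
                         "productName": "Oreo Thins Sandwich Cookies",
                         "productOrderQty": 3, "productUnitPrice": 3.50}},
        ],
    },
}

# Lookup-list document: the allowed values for orderStatus.
order_lookup: dict[str, Any] = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
}


def order_summary(doc: dict[str, Any]) -> tuple[str, list[tuple[str, float]]]:
    """Build a display summary from the single order document, with no joins."""
    o = doc["order"]
    lines = [(p["product"]["productName"],
              p["product"]["productOrderQty"] * p["product"]["productUnitPrice"])
             for p in o["orderedProducts"]]
    return o["customer"]["customerName"], lines


def full_customer(doc: dict[str, Any]) -> dict[str, Any]:
    """Follow the customerId reference only when the whole entity is needed."""
    return people[doc["order"]["customer"]["customerId"]]


print(order_summary(order))  # ('Mike Bowers', [('Oreo Thins Sandwich Cookies', 10.5)])
print(order["order"]["orderStatus"] in order_lookup["orderStatus"])  # True
```

The design choice illustrated is that the order is readable on its own, while the reference still lets you reach the full person entity when needed.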
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a document is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables
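The I/O arithmetic above can be checked with a few lines of Python. This sketch simply encodes the slide's per-join figures (3-to-4 index I/Os plus 1-to-2 row I/Os) and, like the slide, treats each of the 13 tables as one join-style lookup. The constant and function names are illustrative, not from any library.

```python
# Per-lookup I/O costs as quoted on the slide (inclusive ranges).
INDEX_IOS = (3, 4)   # traversing the index
ROW_IOS = (1, 2)     # fetching the row itself
DOCUMENT_IOS = 1     # slide's figure for one document read with indexes in RAM


def relational_io_range(tables_joined: int) -> tuple[int, int]:
    """Best- and worst-case I/Os to assemble one entity spread over N tables."""
    per_lookup_low = INDEX_IOS[0] + ROW_IOS[0]    # 4
    per_lookup_high = INDEX_IOS[1] + ROW_IOS[1]   # 6
    return per_lookup_low * tables_joined, per_lookup_high * tables_joined


low, high = relational_io_range(13)
print(f"relational: {low}-{high} IOs, document: {DOCUMENT_IOS} IO")
# relational: 52-78 IOs, document: 1 IO
```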
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).
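In MarkLogic, TDE templates live in the database and their rows are queried through the Optic API; the Python below uses neither. It is a hypothetical, plain-Python illustration of the underlying idea: projecting one relational-style row per value of a multi-valued property so that it can be filtered like a table. The JSON nesting (for example the `{"phone": {...}}` wrappers) is an assumption, because the transcript lost the documents' punctuation, and `phone_rows` is an invented name.

```python
from typing import Any

# Assumed nesting for the person document's phones (values from the slides).
person_doc: dict[str, Any] = {
    "personId": 11,
    "person": {
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113"}},
        ]
    },
}


def phone_rows(doc: dict[str, Any]) -> list[tuple[int, str, str]]:
    """Project one (personId, purpose, phoneNumber) row per purpose of each phone.

    This mimics the kind of row view a template could expose over the array;
    it is not MarkLogic's TDE template syntax.
    """
    rows = []
    for entry in doc["person"].get("personPhones", []):
        phone = entry["phone"]
        for purpose in phone["phonePurpose"]:
            rows.append((doc["personId"], purpose, phone["phoneNumber"]))
    return rows


# Filter the multi-valued purpose like a relational WHERE clause.
primary = [row for row in phone_rows(person_doc) if row[1] == "primary"]
print(primary)  # [(11, 'primary', '+1 (111) 111-1111')]
```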
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

personAddresses [
  address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States
]

The same data in the relational model shown on the slide takes three tables:
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
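A minimal sketch of what "shredding" the single `personAddresses` array into the Addresses and Persons Addresses tables looks like, written in plain Python. The JSON nesting is assumed, the surrogate Address IDs are generated with `enumerate` purely for illustration, and `shred_addresses` is a hypothetical helper; the scalar fields that would fill the Persons row are omitted.

```python
from typing import Any

# Assumed JSON nesting for the slide's personAddresses example.
person_doc: dict[str, Any] = {
    "personId": 11,
    "person": {
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }},
        ],
    },
}


def shred_addresses(doc: dict[str, Any]) -> tuple[list[tuple], list[tuple]]:
    """Split one embedded array into Addresses rows and Persons Addresses rows."""
    addresses: list[tuple] = []
    persons_addresses: list[tuple] = []
    for address_id, entry in enumerate(doc["person"]["personAddresses"], start=1):
        a = entry["address"]
        # Addresses: Address ID, Street, Locality, Region, Postal Code, Country
        addresses.append((address_id, ", ".join(a["addressStreet"]), a["addressLocality"],
                          a["addressRegion"], a["addressPostalCode"], a["addressCountry"]))
        purposes = a["addressPurpose"]
        # Persons Addresses: one row per purpose, because a cell holds one value
        for purpose in purposes:
            if purpose != "primary":
                persons_addresses.append(
                    (doc["personId"], address_id, purpose, "primary" in purposes))
    return addresses, persons_addresses


addresses, persons_addresses = shred_addresses(person_doc)
print(addresses)          # [(1, '99 Smith Dr', 'Clinton', 'UT', '84015', 'United States')]
print(persons_addresses)  # [(11, 1, 'shipping', False), (11, 1, 'mailing', False)]
```

One array in the document becomes rows in two extra tables, which is the cost the slide is pointing at.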
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Different types of records in the same array are natural in programming but very difficult to represent in relational databases

personPaymentMethods [
  paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
  paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT
]
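Application code handles a mixed-type array by dispatching on a type field, as this hypothetical Python sketch does with `paymentMethodType`. The nesting is assumed, and `describe` is an invented helper. The grouping step at the end shows why the relational model needs separate Visa Payment Methods and Bank Payment Methods tables: the two record types share almost no columns.

```python
from typing import Any

# Assumed nesting; values are the slide's sample data.
payment_methods: list[dict[str, Any]] = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS",
                       "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers",
                       "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]


def describe(pm: dict[str, Any]) -> str:
    """Dispatch on the record's type field; each type has its own attributes."""
    kind = pm["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending {pm['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking account ending {pm['bankAccountNumber'][-4:]} (routing {pm['bankRoutingNumber']})"
    return f"Unknown payment method type {kind!r}"


for entry in payment_methods:
    print(describe(entry["paymentMethod"]))

# Grouping by type approximates the per-type tables a relational model would need.
by_type: dict[str, list[dict[str, Any]]] = {}
for entry in payment_methods:
    pm = entry["paymentMethod"]
    by_type.setdefault(pm["paymentMethodType"], []).append(pm)
```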
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query
• Structure can be queried:
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values

personId 11
person personName Some very long name with many UTF-8 international characters… personBirthDate 1982-02-02 personHeightInFeet 5.75 personPhones [
  phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
  phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224 hidePhoneNumber true
]
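Because keys and nesting are ordinary data, you can query for them and change them with the same tools you use on values. The plain-Python sketch below, which assumes a JSON nesting for the example document, finds every occurrence of `hidePhoneNumber` at any depth, runs a nested query, and then changes the structure by updating data. The helper names `find_paths` and `default_hide_flag` are hypothetical.

```python
import copy
from typing import Any, Iterator

person_doc: dict[str, Any] = {
    "personId": 11,
    "person": {
        "personName": "Some very long name with many UTF-8 international characters",
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def find_paths(node: Any, key: str, path: tuple = ()) -> Iterator[tuple]:
    """Yield the path to every occurrence of `key`, at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield path + (k,)
            yield from find_paths(v, key, path + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_paths(item, key, path + (i,))


# Query by presence of a key.
print(list(find_paths(person_doc, "hidePhoneNumber")))
# [('person', 'personPhones', 1, 'phone', 'hidePhoneNumber')]

# Query a nested relationship: hidden home phones.
hidden_home = [e["phone"]["phoneNumber"] for e in person_doc["person"]["personPhones"]
               if "home" in e["phone"]["phonePurpose"] and e["phone"].get("hidePhoneNumber")]


def default_hide_flag(doc: dict[str, Any]) -> dict[str, Any]:
    """Change structure by updating data: give every phone an explicit flag."""
    updated = copy.deepcopy(doc)
    for entry in updated["person"]["personPhones"]:
        entry["phone"].setdefault("hidePhoneNumber", False)
    return updated


updated_doc = default_hide_flag(person_doc)
```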
Document PROs: Simple, Easy Data Types
• Attribute names have unlimited length and can contain any characters
• Strings have unlimited size and can contain any internationalized UTF-8 characters
• Numbers contain integers or decimals
• Booleans are true or false
• Arrays can have values of any type and can mix types
• Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
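A quick way to see these types in practice is to parse a sample document with Python's standard `json` module. The document below is hypothetical, written only to exercise each type from the list. Note that JSON itself has a single number type; Python's parser maps literals without a fraction or exponent to `int` and the rest to `float`.

```python
import json

# Hypothetical sample document exercising each JSON data type.
raw = """{
  "personName": "Zoë Ångström",
  "a key with spaces and ünïcödé": "keys can contain any characters",
  "personHeightInFeet": 5.75,
  "customerReviewerRank": 11111,
  "hidePhoneNumber": true,
  "mixed": ["text", 42, 4.5, false, null, {"nested": [1, 2]}]
}"""

doc = json.loads(raw)
for key, value in doc.items():
    print(f"{key!r}: {type(value).__name__}")

# One array, six different element types.
mixed_types = [type(v).__name__ for v in doc["mixed"]]

# Sparse objects: an absent attribute is simply missing, not a NULL column.
phones = doc.get("personPhones")
```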
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Communications of the ACM, Information Retrieval, Volume 13, Number 6, June 1970
"Programs should remain unaffected when the internal representation of data is changed.… Tree-structured… inadequacies… are discussed.… Relations… are discussed and applied to the problems of redundancy and consistency.…"
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. RELATIONAL MODEL AND NORMAL FORM
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory… to… formatted data.… The problems… are those of data independence… and… data inconsistency.… The relational view… appears to be superior in several respects to the graph or network model.… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables… represent a major advance toward the goal of data independence.…
1.2.1. Ordering Dependence. … Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence. … Can application programs… remain invariant as indices come and go?…
1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data.… These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data.… Programs become dependent on the continued existence of the… paths.
[Figure: a graph of topics (T), persons (P), locations (L), publications (P), references (R), organizations (O), and events (E) connected by named relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of", plus problem, purpose, and solution links between topics.]
Inside and Out
[Figure: a customer order connected to JSON Customers, Orders, Products, and Vendors documents, to an XML hospital operation record (John Hopkins, operation 13, heart transplant, surgeon Dorothy Oz, with a drug and dose table), and to Documentation (shipping instructions, customer invoices, annual reports, customer order documents, product manual). Labeled relationships include "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product".]
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense
- How does NoSQL support variety and variability
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling Normalize
- DocGraph Modeling Normalize
- DocGraph Modeling Denormalize
- Relational Modeling Orthogonalize
- Relational Modeling Generalize
- DocGraph Modeling Generalize
- Relational Modeling Tune
- DocGraph Modeling Tune
- Document PROs Simple Easy Data Types
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
Person ERD
• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
• Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
• Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Websites: Website ID, URL
• Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
• Companies Websites: Company ID, Website ID
• Companies Addresses: Company ID, Address ID
• Companies: Company ID
• Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
• Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
• Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
• Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
• Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
• Persons Visa Payment Methods: Visa ID, Person ID, Is Primary

Excluding the three Companies tables, the person ERD has 13 tables because each multi-valued property requires a separate table in a relational model. All of this should be represented as one JSON document because it is one business entity.
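One rough way to see where the extra tables come from is to list every array-valued property in a person document: in a normalized relational schema, each one needs its own table, or at least a junction table with one row per value. The sketch below is a hypothetical plain-Python helper (`multivalued_properties`) run on an abbreviated person document whose JSON nesting is assumed; it flags candidates and does not reproduce the ERD's exact table count.

```python
from typing import Any


def _array_paths(node: Any, path: str) -> list[str]:
    """Collect the dotted path of every array-valued property, recursively."""
    found: list[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if isinstance(value, list):
                found.append(child)
            found.extend(_array_paths(value, child))
    elif isinstance(node, list):
        for item in node:
            found.extend(_array_paths(item, f"{path}[]"))
    return found


def multivalued_properties(doc: dict[str, Any]) -> list[str]:
    """Every array-valued property, deduplicated in document order."""
    return list(dict.fromkeys(_array_paths(doc, "")))


# Abbreviated person document (assumed nesting).
person_doc: dict[str, Any] = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
        ],
        "personAddresses": [
            {"address": {"addressPurpose": ["shipping", "mailing"],
                         "addressStreet": ["99 Smith Dr"], "addressLocality": "Clinton"}},
        ],
    },
}

for p in multivalued_properties(person_doc):
    print(p)
```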
One Person JSON Doc = 13 Tables
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
[Figure: Customer Order as JSON documents for Customers, Orders, Products, and Vendors, connected by graph relationships labeled "Product that was ordered" and "Vendor who sold the product"]
[Figure: the model's five JSON documents, partly cropped on the slide]
• Person document (personId 11): a person object (personName Mike Bowers, birth date, gender, ethnicity, shipping preference, and arrays of phones, addresses, emails, websites, and payment methods) plus a customer object (customerStatus active, customerJoinDate 2001-01-01, customerReviewerRank)
• Order document (orderId 1): an order object (orderNumber 111-11-111, orderDate, orderStatus Shipped) that carries the customer's ID and name, the shipment (shipper FedEx, 2nd Day Air), the shipping address, and the ordered products (Oreo Thins Sandwich Cookies, quantity 3; Rice Dream Rice Drink - Vanilla 64 Fl Oz, quantity 1) with per-product status, seller, and supplier
• Product document (productId 1111): a product object (productCode oreo-111, name, description, categories, tags, detail description URL, availability date, list, discount, and standard prices, inventory reorder and target levels, vendor Nabisco, suppliers, and product-package and shipping-package measures)
• Company document (companyId 555): a company object for Nabisco (status, phones, addresses, emails, websites, payment methods) plus supplier and seller objects with join dates
• Lookup-list document (orderLookupId 111111111): orderStatus [New, Invoiced, Shipped, Closed] and productOrderStatus [None, Allocated, Invoiced, Shipped, On Order, No Stock]
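The two bullets above can be illustrated with a small plain-Python sketch. It is an analogy, not MarkLogic code: each document lives in a dictionary keyed by made-up URIs such as "order/1", and the predicates placedBy, orderedProduct, and soldBy are hypothetical names standing in for the figure's "Product that was ordered" and "Vendor who sold the product" links.

```python
# Plain-Python analogy (not a MarkLogic API): one document per business entity,
# and explicitly named relationships (triples) that connect entities.
documents = {
    "person/11": {"personId": 11, "person": {"personName": "Mike Bowers"}},
    "order/1": {"orderId": 1, "order": {"orderNumber": "111-11-111"}},
    "product/1111": {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    "company/555": {"companyId": 555, "company": {"companyName": "Nabisco"}},
}

# Hypothetical predicates: each link states its meaning instead of implying it.
triples = [
    ("order/1", "placedBy", "person/11"),
    ("order/1", "orderedProduct", "product/1111"),
    ("product/1111", "soldBy", "company/555"),
]

def related(subject: str, predicate: str) -> list:
    """Return the documents linked from `subject` by `predicate`."""
    return [documents[o] for s, p, o in triples if s == subject and p == predicate]

def vendors_for_order(order_uri: str) -> list:
    """Follow order -> product that was ordered -> vendor who sold it."""
    names = []
    for product in related(order_uri, "orderedProduct"):
        product_uri = f"product/{product['productId']}"
        for vendor in related(product_uri, "soldBy"):
            names.append(vendor["company"]["companyName"])
    return names

print(vendors_for_order("order/1"))  # ['Nabisco']
```

Because the predicate carries the meaning, a new kind of relationship is just another triple; no table or foreign-key column has to be added.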
Relational Modeling: Normalize
1. Normalize
• Make each attribute single-valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Figure: one-to-one, one-to-many, and many-to-many relationships, and reference tables]
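A minimal sketch of the "replace multivalued attributes with one-to-many joins" step, using Python's built-in sqlite3 module. The table and column names (person, phone, purpose, number) are illustrative assumptions, not the deck's actual schema.

```python
import sqlite3

# Hypothetical input: one person with a multivalued phone attribute.
person = {
    "person_id": 11,
    "name": "Mike Bowers",
    "phones": [("mobile", "+1 (111) 111-1111"), ("work", "+1 (111) 111-1112")],
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    -- The multivalued attribute becomes a one-to-many child table:
    -- one row per phone, linked back to person by a foreign key.
    CREATE TABLE phone (
        phone_id  INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(person_id),
        purpose   TEXT,
        number    TEXT
    );
""")
conn.execute("INSERT INTO person VALUES (?, ?)", (person["person_id"], person["name"]))
conn.executemany(
    "INSERT INTO phone (person_id, purpose, number) VALUES (?, ?, ?)",
    [(person["person_id"], purpose, number) for purpose, number in person["phones"]],
)

# Reassembling the person's data now takes a join.
rows = conn.execute("""
    SELECT p.name, ph.purpose, ph.number
    FROM person AS p JOIN phone AS ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
""").fetchall()
print(rows)
```

Note that the single purpose column cannot hold a list such as mobile plus primary without yet another table, which is exactly the kind of shredding the deck argues against.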
Doc/Graph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multivalued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active",
      "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06",
      "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"],
          "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton",
          "addressRegion": "UT",
          "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": ["linkedin", "primary"], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa",
          "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234",
          "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS",
          "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking",
          "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111",
          "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555,
        "companyName": "Nabisco",
        "personCompanyRelationship": ["employeeOf", "accountRepFor"],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": {
        "companyIdFk": 666,
        "companyName": "Goliath Dairies",
        "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri",
        "performanceNotes": "He is always late" } }
  ]
}
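The reverse of relational shredding can be sketched in a few lines: child rows are folded back into an embedded array, one normalized structure per element. This is a hypothetical conversion helper in plain Python (the row layout and the to_documents name are assumptions); the property names follow the person document above.

```python
import json
from collections import defaultdict

# Hypothetical rows, as they might be read from person and phone tables.
person_rows = [(11, "Mike Bowers")]
phone_rows = [
    (11, ["mobile", "primary"], "+1 (111) 111-1111"),
    (11, ["work"], "+1 (111) 111-1112"),
    (11, ["home"], "+1 (111) 111-1113"),
]

def to_documents(person_rows, phone_rows):
    """Fold child rows into an embedded array on each person document."""
    phones_by_person = defaultdict(list)
    for person_id, purposes, number in phone_rows:
        # One normalized phone structure per array element; phonePurpose
        # stays multivalued because JSON arrays allow it.
        phones_by_person[person_id].append(
            {"phone": {"phonePurpose": purposes, "phoneNumber": number}}
        )
    return [
        {
            "personId": person_id,
            "person": {"personName": name, "personPhones": phones_by_person[person_id]},
        }
        for person_id, name in person_rows
    ]

docs = to_documents(person_rows, phone_rows)
print(json.dumps(docs[0], indent=2))
```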
Doc/Graph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
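TDE and the Optic API are MarkLogic features, and the sketch below does not call them. It is a plain-Python analogy of "denormalizing without denormalizing", assuming a template-like step that projects one (id, name) row per document and a query that joins those rows through the person's embedded hasEmployer triple, so no company name is ever copied into the person document.

```python
# Plain-Python analogy only; this is not the TDE or Optic API.
people = {11: {"personId": 11, "person": {"personName": "Mike Bowers"}}}
companies = {666: {"companyId": 666, "company": {"companyName": "Goliath Dairies"}}}
person_doc_triples = [
    {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}},
]

def extract_rows(docs, id_key, entity_key, name_key):
    """Project one (id -> name) row per document, like a template-driven view."""
    return {doc[id_key]: doc[entity_key][name_key] for doc in docs.values()}

person_names = extract_rows(people, "personId", "person", "personName")
company_names = extract_rows(companies, "companyId", "company", "companyName")

def employers():
    """Join through the triples at query time; the documents stay unchanged."""
    return [
        (person_names[t["subject"]], company_names[t["object"]])
        for t in (entry["triple"] for entry in person_doc_triples)
        if t["predicate"] == "hasEmployer"
    ]

print(employers())  # [('Mike Bowers', 'Goliath Dairies')]
```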
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Figure: business tables, transaction tables, and reference tables]
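A minimal sqlite3 sketch of the three table kinds. The schema is a hypothetical reduction of the deck's model: person and product are business tables, orders and order_item are transaction tables that join them into an order context, and order_status is a reference table holding the order statuses from the lookup list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Business tables: meaningful on their own, independent of any context.
    CREATE TABLE person  (person_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    -- Reference table: standardizes the allowed states of an order.
    CREATE TABLE order_status (status TEXT PRIMARY KEY);
    -- Transaction tables: join business tables together to form a context.
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(person_id),
        status    TEXT REFERENCES order_status(status)
    );
    CREATE TABLE order_item (
        order_id   INTEGER REFERENCES orders(order_id),
        product_id INTEGER REFERENCES product(product_id),
        qty        INTEGER
    );
    INSERT INTO person VALUES (11, 'Mike Bowers');
    INSERT INTO product VALUES (1111, 'Oreo Thins Sandwich Cookies');
    INSERT INTO order_status VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');
    INSERT INTO orders VALUES (1, 11, 'Shipped');
    INSERT INTO order_item VALUES (1, 1111, 3);
""")

rows = conn.execute("""
    SELECT pe.name, pr.name, oi.qty, o.status
    FROM orders AS o
    JOIN person AS pe     ON pe.person_id  = o.person_id
    JOIN order_item AS oi ON oi.order_id   = o.order_id
    JOIN product AS pr    ON pr.product_id = oi.product_id
""").fetchall()
print(rows)  # [('Mike Bowers', 'Oreo Thins Sandwich Cookies', 3, 'Shipped')]
```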
Doc/Graph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
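To illustrate the last bullet, here is a plain-Python sketch that keeps two lookup lists in one document (values taken from the deck's order-lookup document) and checks an order against them. The invalid_statuses helper and the "Backordered" value are hypothetical, chosen only to show a failing check.

```python
# Two lookup lists combined in one document, as in the order-lookup document.
order_lookups = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

def invalid_statuses(order: dict, lookups: dict) -> list:
    """Return status values in an order that the lookup lists do not allow."""
    problems = []
    if order["orderStatus"] not in lookups["orderStatus"]:
        problems.append(order["orderStatus"])
    for item in order["orderedProducts"]:
        status = item["product"]["productOrderStatus"]
        if status not in lookups["productOrderStatus"]:
            problems.append(status)
    return problems

# Hypothetical order document; "Backordered" is deliberately not in any list.
order_doc = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},
        ],
    },
}
print(invalid_statuses(order_doc["order"], order_lookups))  # ['Backordered']
```

One read fetches every list the application needs, instead of one reference table per list.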
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational, because it hides the purpose of the model
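A sqlite3 sketch of the person-with-subclasses pattern described above, under assumed table and column names. Each subclass table reuses the person's primary key (one-to-one), and the person_roles helper is hypothetical; it has to probe every subclass table because nothing in the person row says which roles apply, which hints at how generalization can hide the model's purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- General table: every kind of person is stored here once.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    -- Subclass tables reuse the person's primary key (one-to-one).
    CREATE TABLE customer (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        customer_status TEXT,
        customer_join_date TEXT
    );
    CREATE TABLE account_rep (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        company_id INTEGER
    );
    CREATE TABLE delivery_agent (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        delivery_schedule TEXT
    );
    INSERT INTO person VALUES (11, 'Mike Bowers');
    INSERT INTO customer VALUES (11, 'active', '2001-01-01');
    INSERT INTO account_rep VALUES (11, 555);
    INSERT INTO delivery_agent VALUES (11, '2 pm Mon-Fri');
""")

SUBCLASS_TABLES = ("customer", "account_rep", "delivery_agent")

def person_roles(person_id: int) -> list:
    """Probe each subclass table to find which roles a person plays."""
    roles = []
    for table in SUBCLASS_TABLES:  # fixed names, so the f-string is not an injection risk
        sql = f"SELECT 1 FROM {table} WHERE person_id = ?"
        if conn.execute(sql, (person_id,)).fetchone():
            roles.append(table)
    return roles

print(person_roles(11))  # ['customer', 'account_rep', 'delivery_agent']
```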
Doc/Graph Modeling: Generalize
• Generalizing is easy in JSON because JSON objects are well suited to subclassing
• This example shows a person being subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON, and unlike relational, it does not hide the purpose of the model
• See the customer and companyRelationships objects in the person document above for the subclassing
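For contrast with the relational version, a plain-Python sketch of JSON subclassing: role objects sit beside the general person object in one document, so the roles are visible by name. The document is trimmed from the person document shown earlier, and the roles helper is a hypothetical illustration.

```python
# Trimmed person document: the general "person" object plus role sub-objects.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}},
    ],
}

def roles(doc: dict) -> list:
    """List the subclasses a person document carries, read straight from its properties."""
    found = ["customer"] if "customer" in doc else []
    for rel in doc.get("companyRelationships", []):
        company = rel["company"]
        found += [f"{r} {company['companyName']}" for r in company["personCompanyRelationship"]]
    return found

print(roles(person_doc))
```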
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables, reducing the need for joins, which slow performance
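A sqlite3 sketch of the denormalization bullet, with assumed table names: the customer's name is copied into orders so listings skip the join, and the write path then has to keep every copy in sync. The rename to Michael Bowers is only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    -- Denormalized: customer_name is copied from person so that order
    -- listings do not need a join.
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        person_id     INTEGER REFERENCES person(person_id),
        customer_name TEXT
    );
    INSERT INTO person VALUES (11, 'Mike Bowers');
    INSERT INTO orders VALUES (1, 11, 'Mike Bowers');
""")

# Read path: no join required.
before = conn.execute("SELECT order_id, customer_name FROM orders").fetchall()
print(before)  # [(1, 'Mike Bowers')]

# Write path: the copy must be refreshed whenever the source row changes.
with conn:
    conn.execute("UPDATE person SET name = ? WHERE person_id = ?", ("Michael Bowers", 11))
    conn.execute(
        "UPDATE orders SET customer_name = "
        "(SELECT name FROM person WHERE person.person_id = orders.person_id)"
    )
after = conn.execute("SELECT customer_name FROM orders WHERE order_id = 1").fetchone()
print(after)  # ('Michael Bowers',)
```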
Doc/Graph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
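Why unique property names matter can be shown with a toy index in plain Python. This only models the behavior the slide describes (a lookup keyed by property name and value, ignoring position in the hierarchy); it is not how MarkLogic's indexes are implemented. With a generic "name" property, a query meant for companies named Nabisco also matches an order whose seller is Nabisco; with companyName and sellerName, it does not.

```python
from collections import defaultdict

def scalar_values(value):
    """Scalars a property holds directly: the value itself, or scalar array members."""
    if isinstance(value, list):
        return [v for v in value if not isinstance(v, (dict, list))]
    if isinstance(value, dict):
        return []
    return [value]

def build_name_index(docs: dict) -> dict:
    """Map (property name, value) -> document URIs, ignoring hierarchy position."""
    index = defaultdict(set)

    def walk(node, uri):
        if isinstance(node, dict):
            for key, value in node.items():
                for scalar in scalar_values(value):
                    index[(key, scalar)].add(uri)
                walk(value, uri)
        elif isinstance(node, list):
            for item in node:
                walk(item, uri)

    for uri, doc in docs.items():
        walk(doc, uri)
    return index

# Hypothetical documents: the same data with generic vs. unique property names.
generic_docs = {
    "company/555": {"companyId": 555, "company": {"name": "Nabisco"}},
    "order/1": {"orderId": 1, "order": {"orderedProducts": [
        {"product": {"productIdFk": 1111, "seller": {"name": "Nabisco"}}}]}},
}
unique_docs = {
    "company/555": {"companyId": 555, "company": {"companyName": "Nabisco"}},
    "order/1": {"orderId": 1, "order": {"orderedProducts": [
        {"product": {"productIdFk": 1111, "seller": {"sellerName": "Nabisco"}}}]}},
}

generic_hits = sorted(build_name_index(generic_docs).get(("name", "Nabisco"), set()))
unique_hits = sorted(build_name_index(unique_docs).get(("companyName", "Nabisco"), set()))
print(generic_hits)  # ['company/555', 'order/1']
print(unique_hits)   # ['company/555']
```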
Agenda
1. Database Modeling Paradigms
2. From Relational to Document/Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• The five JSON documents of the customer order model equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
Document PROs Performance
• One JSON document requires one IO; relational requires dozens of IOs for the same data
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
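The IO arithmetic above can be checked with a small back-of-envelope model. This is a sketch under the slide's own assumptions (3-to-4 index IOs plus 1-to-2 row IOs per join, one join per table, and one IO per document when indexes are memory-resident); the function names are hypothetical and the results are estimates, not benchmarks.

```python
from __future__ import annotations

# Back-of-envelope IO model using the slide's numbers (assumptions, not
# measurements): each relational join costs 3-4 index IOs plus 1-2 row IOs,
# and a whole JSON document costs 1 IO when its indexes are in RAM.


def relational_io_range(joins: int,
                        index_ios: tuple[int, int] = (3, 4),
                        row_ios: tuple[int, int] = (1, 2)) -> tuple[int, int]:
    """Return (min, max) IOs to reassemble an entity spread over `joins` joins."""
    per_join_min = index_ios[0] + row_ios[0]  # 3 + 1 = 4
    per_join_max = index_ios[1] + row_ios[1]  # 4 + 2 = 6
    return joins * per_join_min, joins * per_join_max


def document_ios() -> int:
    """One document read, assuming its indexes are already memory-resident."""
    return 1


low, high = relational_io_range(13)  # the slide counts 13 joins for 13 tables
print(f"relational: {low}-{high} IOs; document: {document_ios()} IO")
# prints "relational: 52-78 IOs; document: 1 IO"
```

Changing `index_ios` lets you plug in a different index depth, for example a deeper on-disk B-tree, without touching the rest of the model.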
Document PROs Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).
Document PROs Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States" } }
]
[Figure: the equivalent relational tables]
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
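To make the multi-value point concrete, here is a minimal plain-Python sketch (not MarkLogic's API) that answers "which of this person's addresses are used for shipping?" from a single document. The dict shape, with each array entry wrapping an `address` object, is an assumption about how the slide's JSON is nested, and `addresses_with_purpose` is a hypothetical helper. In the relational layout, the same question joins Persons, Persons Addresses, and Addresses.

```python
from __future__ import annotations

# Person document mirroring the slide: personAddresses is multi-valued, and
# so is addressPurpose inside each address (nesting shape assumed).
person = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }},
        ],
    },
}


def addresses_with_purpose(doc: dict, purpose: str) -> list[dict]:
    """Return every embedded address whose purpose list contains `purpose`."""
    return [
        entry["address"]
        for entry in doc["person"].get("personAddresses", [])
        if purpose in entry["address"].get("addressPurpose", [])
    ]


# One document read answers the question; no join tables are involved.
for addr in addresses_with_purpose(person, "shipping"):
    print(addr["addressLocality"], addr["addressRegion"])  # prints "Clinton UT"
```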
Document PROs Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases
"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa",
      "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234",
      "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS",
      "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking",
      "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111",
      "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111",
      "driversLicenseState": "UT" } }
]
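Application code usually handles a multi-typed array like this by dispatching on a type tag. The sketch below assumes `paymentMethodType` serves as that tag and that each entry wraps a `paymentMethod` object; `describe` is a hypothetical helper, not part of any MarkLogic API.

```python
# Two payment methods of different types in one array; each type carries
# its own fields (values copied from the slide, shape assumed).
payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa",
        "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {
        "paymentMethodType": "Checking",
        "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111",
        "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111",
        "driversLicenseState": "UT"}},
]


def describe(entry: dict) -> str:
    """Summarize one payment method by dispatching on its type tag."""
    pm = entry["paymentMethod"]
    kind = pm["paymentMethodType"]
    if kind == "Visa":
        return f"Visa ending {pm['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"Checking ending {pm['bankAccountNumber'][-4:]}"
    return f"Unrecognized type {kind}"  # tolerate types added later


for entry in payment_methods:
    print(describe(entry))
# prints "Visa ending 1234" then "Checking ending 1111"
```

A relational design would typically need one table per payment type (as in the Visa Payment Methods and Bank Payment Methods tables) plus a link table for each.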
Document PROs Everything is Data
Everything is data in JSON
• Structure
• Keys
• Values
• Arrays
Because structure is data
• Structure can be changed by updating data; just write a query
• Structure can be queried
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
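Because keys and nesting are ordinary data, a generic walk over the document can both query and change its structure. The sketch below is plain Python, not MarkLogic's structural query API; `find_key` and `rename_key` are hypothetical helpers, and `phoneNumberHidden` is an illustrative new key name.

```python
from __future__ import annotations

from typing import Any, Iterator

doc = {
    "personId": 11,
    "person": {
        "personName": "Some very long name with many UTF-8 international characters…",
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def find_key(node: Any, key: str, path: tuple = ()) -> Iterator[tuple[tuple, Any]]:
    """Yield (path, value) for every occurrence of `key` at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield path + (k,), v
            yield from find_key(v, key, path + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key(item, key, path + (i,))


def rename_key(node: Any, old: str, new: str) -> Any:
    """Return a copy of `node` with every `old` key renamed to `new`."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


# Query structure: which phones carry the optional hidePhoneNumber key?
print(list(find_key(doc, "hidePhoneNumber")))
# prints [(('person', 'personPhones', 1, 'phone', 'hidePhoneNumber'), True)]

# Change structure by updating data: rename the key everywhere it occurs.
renamed = rename_key(doc, "hidePhoneNumber", "phoneNumberHidden")
```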
Document PROs Simple Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
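The data types listed above map directly onto ordinary programming-language types. As a sketch, Python's standard `json` module parses an illustrative document (made up for this example) into `str`, `int`, `float`, `bool`, `list`, and `dict`, including a mixed-type array and a sparsely populated object.

```python
import json

# Illustrative document exercising each JSON type from the list above.
raw = """
{
  "a very long attribute name with spaces and UTF-8: café 東京": "any UTF-8 string ✓",
  "anInteger": 42,
  "aDecimal": 5.75,
  "aBoolean": true,
  "aMixedArray": [1, "two", 3.0, false, {"nested": null}],
  "aSparseObject": {"onlyOneOfManyPossibleFields": "set"}
}
"""

doc = json.loads(raw)
for key, value in doc.items():
    # Each JSON value arrives as a native Python type.
    print(f"{key!r}: {type(value).__name__}")
```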
Document PROs Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables… represent a major advance toward the goal of data independence…
1.2.1 Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence
…Can application programs… remain invariant as indices come and go?…
1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary… The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure: graph of Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E) connected by labeled relationships such as "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "solution", and "purpose"]
[Figure: Customer Order document graph. JSON documents for Customers, Orders, Products, and Vendors, an XML document (a hospital operation record), and Documentation, linked by relationships such as "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", and "Vendor received RMA on product"]
Inside and Out
Hospital Name: John Hopkins
Operation Number: 13
Operation Type: Heart Transplant
Surgeon Name: Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
One Person JSON Doc = 13 Tables
personId 11
schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]
triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111
triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free
personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]
personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]
personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]
personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]
customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111
companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
[Figure: relational tables for the same person data]
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Websites: Website ID, URL
Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
Companies Websites: Company ID, Website ID
Companies Addresses: Company ID, Address ID
Companies: Company ID
Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
[Figure: a much larger relational ER diagram in which the same table layouts (addresses, phones, websites, payment methods, names, persons, companies) repeat many times under placeholder table names, alongside Order, Order Items, Fact, and Lookup Lists tables]
More Realistic Customer Order Model
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
[Figure: Customer Order graph linking JSON documents for Customers, Orders, Products, and Vendors, with relationships such as "Product that was ordered" and "Vendor who sold the product"]
Relational Modeling Normalize
1. Normalize
• Make each attribute single valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Figure: relationship types: one-to-one, one-to-many, many-to-many, and reference tables]
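The first normalization rule, replacing a multivalued attribute with a one-to-many child table, can be sketched in a few lines. This is a plain-Python illustration, not a database tool: the simplified field names (`phones`, `purpose`, `number`) and the `normalize` helper are hypothetical, while the phone values come from the person document in this deck.

```python
from __future__ import annotations

from itertools import count

person = {
    "personId": 11,
    "personName": "Mike Bowers",
    # Multi-valued attribute: not allowed in a normalized relational row.
    "phones": [
        {"purpose": "mobile", "number": "+1 (111) 111-1111"},
        {"purpose": "work", "number": "+1 (111) 111-1112"},
        {"purpose": "home", "number": "+1 (111) 111-1113"},
    ],
}


def normalize(p: dict) -> tuple[list[tuple], list[tuple]]:
    """Split one person into a Persons row and Person Phones rows.

    Persons:       (person_id, name)
    Person Phones: (phone_id, person_id, purpose, number); person_id is the FK
    """
    phone_ids = count(1)  # surrogate primary keys for the child table
    persons = [(p["personId"], p["personName"])]
    person_phones = [
        (next(phone_ids), p["personId"], ph["purpose"], ph["number"])
        for ph in p["phones"]
    ]
    return persons, person_phones


persons, person_phones = normalize(person)
print(persons)        # one parent row
print(person_phones)  # three child rows, each pointing back to person 11
```

Reassembling the person now requires a join on `person_id`, which is the cost the Performance slide counts once per table.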
DocGraph Modeling Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
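TDE itself is configured with templates inside MarkLogic; the sketch below only mimics the idea in plain Python, projecting the embedded `triples` array of a person document (values from this deck, shape assumed) into a set of (subject, predicate, object) tuples, plus one derived triple copied from the document body. The `extract_triples` helper and the `hasName` predicate are hypothetical.

```python
from __future__ import annotations

person_doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}},
    ],
    "person": {"personName": "Mike Bowers"},
}


def extract_triples(doc: dict) -> set[tuple]:
    """Project embedded triples, plus one derived name triple, into tuples."""
    found = {
        (t["triple"]["subject"], t["triple"]["predicate"], t["triple"]["object"])
        for t in doc.get("triples", [])
    }
    # Derived triple: copy a scalar from the document body into the index.
    found.add((doc["personId"], "hasName", doc["person"]["personName"]))
    return found


triple_index = set()  # stand-in for the database's triple index
triple_index |= extract_triples(person_doc)

children = sorted(o for s, p, o in triple_index if s == 11 and p == "hasChild")
print(children)  # prints [33, 44]
```

The document stays the single source of truth; the index is derived from it and can be rebuilt whenever the document changes.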
DocGraph Modeling: Denormalize
• TDE (Template Driven Extraction) can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } } ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555,
        "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": {
        "companyIdFk": 666,
        "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri",
        "performanceNotes": "He is always late" } } ]
}
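To make "denormalizing without denormalizing" concrete, here is a minimal plain-Python sketch. It is not MarkLogic code: `extract_triples` is a hypothetical stand-in for what a TDE template might project into the triple index, and `view_with_links` is a hypothetical stand-in for an Optic-style query that follows a triple to a linked document. The trimmed documents reuse the person and company IDs from the example above. The employer's name is never copied into the person document; it is resolved through the triples at query time.

```python
# Hypothetical sketch of "denormalizing without denormalizing".
# extract_triples() and view_with_links() are plain-Python stand-ins,
# NOT MarkLogic TDE or Optic APIs.
from typing import Any, Dict, List, Tuple

Triple = Tuple[int, str, Any]

# Trimmed copies of the person and company documents (IDs from the slides).
DOCS: Dict[int, Dict[str, Any]] = {
    11: {
        "personId": 11,
        "triples": [
            {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
            {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}},
        ],
        "person": {"personName": "Mike Bowers"},
    },
    666: {"companyId": 666, "company": {"companyName": "Goliath Dairies"}},
}


def extract_triples(docs: Dict[int, Dict[str, Any]]) -> List[Triple]:
    """Build a toy triple index from every document.

    Explicit "triples" arrays are copied as-is; in addition, each company's
    name is projected into a triple, the way a template could extract
    ordinary document data.
    """
    index: List[Triple] = []
    for doc_id, doc in docs.items():
        for item in doc.get("triples", []):
            t = item["triple"]
            index.append((t["subject"], t["predicate"], t["object"]))
        if "company" in doc:
            index.append((doc_id, "companyName", doc["company"]["companyName"]))
    return index


def view_with_links(docs: Dict[int, Dict[str, Any]], person_id: int,
                    index: List[Triple]) -> Dict[str, Any]:
    """Return a person's name plus employer names found by following triples.

    The employer name is never stored in the person document; it is
    resolved through the index at query time.
    """
    employers = [o for s, p, o in index if s == person_id and p == "hasEmployer"]
    names = [o for s, p, o in index if s in employers and p == "companyName"]
    return {"personName": docs[person_id]["person"]["personName"],
            "employerNames": names}


print(view_with_links(DOCS, 11, extract_triples(DOCS)))
# {'personName': 'Mike Bowers', 'employerNames': ['Goliath Dairies']}
```

If the company is renamed, only the company document changes; the next query picks up the new name, which is the practical benefit over copying the name into every person.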
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
Business Tables / Reference Tables / Transaction Tables
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ] },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 }
}

{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": 11,
      "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", … } } ] }
}

{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks", …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": {
          "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1…,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    … [ …
}

{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", …
          "addressLocality": "East Hanove…",
          "addressPostalCode": "07936", … } } ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": 555,
          "paymentMethodVerified": true } } ] },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}

{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
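As a small illustration of the last two bullets, the sketch below keeps trimmed versions of the order (transaction) and order-lookup documents from the example above, and checks the order's statuses against the lookup lists. It assumes the field names shown in the slides; `invalid_statuses` is a hypothetical helper for illustration, not a MarkLogic API.

```python
# Hypothetical sketch of an orthogonalized document set. Field names follow
# the slides; invalid_statuses() is an illustrative helper, not a MarkLogic API.
from typing import Any, Dict, List

# Transaction document: carries everything about the order, including
# copies of the customer and seller names.
order: Dict[str, Any] = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111,
                         "productOrderStatus": "Shipped",
                         "seller": {"sellerId": 555, "sellerName": "Nabisco"}}},
        ],
    },
}

# Lookup document: two lookup lists combined in one doc.
order_lookup: Dict[str, Any] = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped",
                           "On Order", "No Stock"],
}


def invalid_statuses(order_doc: Dict[str, Any], lookup: Dict[str, Any]) -> List[str]:
    """Return every status in the order that its lookup list does not allow."""
    o = order_doc["order"]
    bad: List[str] = []
    if o["orderStatus"] not in lookup["orderStatus"]:
        bad.append(o["orderStatus"])
    for item in o["orderedProducts"]:
        status = item["product"]["productOrderStatus"]
        if status not in lookup["productOrderStatus"]:
            bad.append(status)
    return bad


print(invalid_statuses(order, order_lookup))  # []
```

One read of the lookup document supplies both lists, instead of one reference table per list.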
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational, because doing so hides the purpose of the model
DocGraph Modeling: Generalize
• Generalizing is easy in JSON, because JSON objects lend themselves to subclassing: each subclass is added as another object in the same document
• This example shows a person subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON and, unlike in relational, it does not hide the purpose of the model
• See subclassing below
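A minimal sketch of the generalized person document as plain Python data, assuming the structure from the example above (and reading the slide's schema version as "1.0"). The helpers `declared_types` and `roles` are hypothetical; they show that the "subclasses" stay visible as named top-level objects and that a person's roles can be read straight out of the document.

```python
# Hypothetical sketch: one generalized person document subclassed by adding
# sibling objects (customer, companyRelationships), each declared in the
# schemas array. Plain Python over a trimmed doc; not a MarkLogic API.
from typing import Any, Dict, List

doc: Dict[str, Any] = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person", "schemaVersion": "1.0"}},
        {"schema": {"type": "customer", "schemaVersion": "1.0"}},
        {"schema": {"type": "companyRelationships", "schemaVersion": "1.0"}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def declared_types(d: Dict[str, Any]) -> List[str]:
    """List the 'subclasses' this document declares in its schemas array."""
    return [s["schema"]["type"] for s in d.get("schemas", [])]


def roles(d: Dict[str, Any]) -> List[str]:
    """Flatten every person-to-company role, e.g. employeeOf or accountRepFor."""
    return [r for c in d.get("companyRelationships", [])
            for r in c["company"]["personCompanyRelationship"]]


# Every declared type is present as a named top-level object.
assert all(t in doc for t in declared_types(doc))
print(declared_types(doc))  # ['person', 'customer', 'companyRelationships']
print(roles(doc))
# ['employeeOf', 'accountRepFor', 'independentContractorFor', 'deliveryPersonFor']
```

Adding a new role means appending another object or relationship value to the same document, rather than creating another subclass table.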
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Move sparse data out of a table and into one-to-one related tables
• Denormalize by copying commonly joined data into multiple tables, reducing the joins that slow performance
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You create inequality (range) indexes manually, based on property name or hierarchical path
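Why unique names matter can be shown with a toy model of a name-keyed index. The sketch below is a simplification under stated assumptions, not MarkLogic's real index structure: `index_doc` (a hypothetical helper) records every scalar under `(property name, value)` and ignores the path. With a generic name like `status`, a lookup for inactive people also matches the company relationship's status; with the slide's unique names (`personStatus`, `personCompanyStatus`) the question maps to one property.

```python
# Toy model (NOT MarkLogic's real index): every scalar is indexed under
# (property name, value), ignoring where the property sits in the hierarchy.
from collections import defaultdict
from typing import Any, DefaultDict, Optional, Set, Tuple

Index = DefaultDict[Tuple[str, Any], Set[str]]


def index_doc(node: Any, uri: str, index: Index, key: Optional[str] = None) -> None:
    """Recursively add (name, value) -> uri entries for every scalar in node."""
    if isinstance(node, dict):
        for k, v in node.items():
            index_doc(v, uri, index, k)
    elif isinstance(node, list):
        for item in node:  # array items are indexed under the array's property name
            index_doc(item, uri, index, key)
    elif key is not None:
        index[(key, node)].add(uri)


# Same facts, two naming styles: an active person with an inactive company link.
generic = {"person": {"status": "active"},
           "companyRelationships": [{"company": {"status": "inactive"}}]}
unique = {"person": {"personStatus": "active"},
          "companyRelationships": [{"company": {"personCompanyStatus": "inactive"}}]}

idx: Index = defaultdict(set)
index_doc(generic, "/generic.json", idx)
index_doc(unique, "/unique.json", idx)

# "Find inactive people": the generic name also matches the company's status,
# so the active person's document is returned by mistake.
print(sorted(idx.get(("status", "inactive"), set())))        # ['/generic.json']
# With unique names the question maps to one property and nothing matches.
print(sorted(idx.get(("personStatus", "inactive"), set())))  # []
```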
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• The five JSON documents in this example (person, order, product, company, and order lookup) equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
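The IO estimate above is simple arithmetic; this sketch just encodes the slide's own assumptions (3-to-4 index IOs plus 1-to-2 row IOs per join, 13 joins as counted on the slide, 1 IO for a document read). `join_io_range` is a hypothetical helper name, and the numbers are the slide's back-of-the-envelope figures, not measurements.

```python
# Back-of-the-envelope IO estimate using the slide's assumptions: each join
# costs 3-to-4 index IOs plus 1-to-2 row IOs; a document read costs 1 IO.
from typing import Tuple


def join_io_range(joins: int,
                  index_ios: Tuple[int, int] = (3, 4),
                  row_ios: Tuple[int, int] = (1, 2)) -> Tuple[int, int]:
    """Return (min, max) IOs needed to assemble an entity from `joins` joins."""
    per_join_min = index_ios[0] + row_ios[0]  # 3 + 1 = 4
    per_join_max = index_ios[1] + row_ios[1]  # 4 + 2 = 6
    return joins * per_join_min, joins * per_join_max


DOCUMENT_READ_IOS = 1

low, high = join_io_range(13)
print(f"relational: {low}-to-{high} IOs; document: {DOCUMENT_READ_IOS} IO")
# relational: 52-to-78 IOs; document: 1 IO
```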
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties, for example using MarkLogic's new Optic API and Template Driven Extraction (TDE).
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]
Relational equivalent:
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
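To see what the single multi-valued address costs in relational form, this sketch shreds the `personAddresses` array into rows for the three tables listed above. It is illustrative only: `shred` is a hypothetical helper, address IDs are generated here because the document has none, the Is Primary column is omitted, and only the Person ID column of Persons is produced.

```python
# Hypothetical sketch: shred the multi-valued personAddresses array into the
# slide's relational tables. Address IDs are generated here (the document has
# none), and the Is Primary column is omitted for brevity.
from typing import Any, Dict, List, Tuple

person_doc: Dict[str, Any] = {
    "personId": 11,
    "personAddresses": [
        {"address": {"addressPurpose": ["shipping", "mailing"],
                     "addressStreet": ["99 Smith Dr"],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015",
                     "addressCountry": "United States"}},
    ],
}


def shred(doc: Dict[str, Any]) -> Tuple[List[tuple], List[tuple], List[tuple]]:
    """Return (persons, persons_addresses, addresses) rows for one document."""
    person_id = doc["personId"]
    persons = [(person_id,)]  # other Persons columns omitted
    persons_addresses: List[tuple] = []
    addresses: List[tuple] = []
    for address_id, item in enumerate(doc["personAddresses"], start=1):
        a = item["address"]
        addresses.append((address_id, "\n".join(a["addressStreet"]),
                          a["addressLocality"], a["addressRegion"],
                          a["addressPostalCode"], a["addressCountry"]))
        # Each purpose becomes its own row in the many-to-many join table.
        for purpose in a["addressPurpose"]:
            persons_addresses.append((person_id, address_id, purpose))
    return persons, persons_addresses, addresses


persons, links, addrs = shred(person_doc)
print(len(persons), len(links), len(addrs))  # 1 2 1
```

One nested array in the document becomes four rows spread across three tables, which must be joined back together to reassemble the person.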
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Storing different types of records in the same array is natural in programming but very difficult in relational databases
"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
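On the application side, a multi-typed array is handled by inspecting each record's shape. The sketch below is a hypothetical example (`describe` and its masking format are illustrative, not from the slides) that branches on which fields each `paymentMethod` carries, using the two records shown above.

```python
# Hypothetical sketch: application code handling a multi-typed array by
# checking which fields each record has. Output format is illustrative only.
from typing import Any, Dict

payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111"}},
]


def describe(item: Dict[str, Any]) -> str:
    """Summarize one payment method, showing only the last four digits."""
    pm = item["paymentMethod"]
    kind = pm["paymentMethodType"]
    if "creditCardNumber" in pm:  # card-shaped record
        return f"{kind} card ending {pm['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in pm:  # bank-account-shaped record
        return f"{kind} account ending {pm['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


print([describe(p) for p in payment_methods])
# ['Visa card ending 1234', 'Checking account ending 1111']
```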
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks", … ],
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": {
          "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu… } } ],
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    "productWebsites": [ … ]
  }
}
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", … ],
          "addressLocality": "East Hanove…",
          "addressPostalCode": "07936", … } } ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": 555,
          "paymentMethodVerified": true } } ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
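The lookup document combines several code lists in one JSON document. As a minimal sketch of how an application might use such a document, the Python below checks an order's status values against it; `validate_order` and the trimmed-down order are hypothetical illustrations, not a MarkLogic API.

```python
import json

# Lookup lists combined into one JSON document, as in the sample lookup document.
LOOKUP_DOC = json.loads("""
{
  "orderLookupId": 111111111,
  "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
  "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"]
}
""")


def validate_order(order, lookups):
    """Hypothetical helper: list status values that are not in the lookup lists."""
    problems = []
    if order.get("orderStatus") not in lookups["orderStatus"]:
        problems.append(f"unknown orderStatus: {order.get('orderStatus')!r}")
    for item in order.get("orderedProducts", []):
        status = item["product"].get("productOrderStatus")
        if status is not None and status not in lookups["productOrderStatus"]:
            problems.append(f"unknown productOrderStatus: {status!r}")
    return problems


# A trimmed-down, hypothetical order; the second product status is deliberately invalid.
order = {
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
        {"product": {"productIdFk": 2222, "productOrderStatus": "Lost"}},
    ],
}
print(validate_order(order, LOOKUP_DOC))  # ["unknown productOrderStatus: 'Lost'"]
```

Keeping all code lists in one document means a single read gives the application every allowed value it needs for validation.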
Document PROs Everything is Data
Everything is data in JSON:
- Structure
- Keys
- Values
- Arrays
Because structure is data:
- Structure can be changed by updating data: just write a query
- Structure can be queried
  - For presence of keys
  - For nested relationships
  - For any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
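Because keys, nesting, and values are all data, ordinary code can query structure and change it. The following is a minimal Python sketch using only the standard library, not MarkLogic's query APIs; the helper `paths_with_key` and the renamed property `personHeightInInches` are hypothetical.

```python
import json

doc = json.loads("""
{
  "personId": 11,
  "person": {
    "personName": "Some very long name",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
      {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                 "hidePhoneNumber": true}}
    ]
  }
}
""")


def paths_with_key(node, key, path=()):
    """Hypothetical helper: yield the path of every object that contains `key`."""
    if isinstance(node, dict):
        if key in node:
            yield path
        for k, v in node.items():
            yield from paths_with_key(v, key, path + (k,))
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from paths_with_key(v, key, path + (i,))


# Query for the presence of a key, wherever it is nested.
hidden = list(paths_with_key(doc, "hidePhoneNumber"))
print(hidden)  # [('person', 'personPhones', 1, 'phone')]

# Query a nested relationship: every phone number of the person.
numbers = [p["phone"]["phoneNumber"] for p in doc["person"]["personPhones"]]

# Change the structure with a data update: replace one property with another.
person = doc["person"]
person["personHeightInInches"] = person.pop("personHeightInFeet") * 12
```

No schema migration is involved in the last step: the property name is just another piece of data in the document.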
Document PROs Simple Easy Data Types
- Attribute names have unlimited length and can contain any characters
- Strings have unlimited size and can contain any internationalized UTF-8 characters
- Numbers contain integers or decimals
- Booleans are true or false
- Arrays can have values of any type and can mix types
- Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
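Each JSON type listed above maps directly onto a Python type when parsed. A small sketch with Python's standard `json` module; the property names and values are made up for illustration. Note that the JSON text format itself has a single number type; Python's parser happens to return `int` for integers and `float` for decimals.

```python
import json

# Hypothetical document exercising each JSON data type listed above.
text = """
{
  "an attribute name with spaces, symbols & UTF-8: Größe/サイズ": "Zoë Ångström",
  "count": 3,
  "price": 2.5,
  "verified": true,
  "mixed": [1, "two", 3.0, false, null, {"nested": []}],
  "sparse": {"onlyOneOfManyOptionalFields": "present"}
}
"""
doc = json.loads(text)

# Map each top-level property to the Python type it parsed into.
types = {name: type(value).__name__ for name, value in doc.items()}

# Arrays may mix types freely.
mixed_types = [type(v).__name__ for v in doc["mixed"]]

# Non-ASCII names and values survive a round trip unchanged.
assert json.loads(json.dumps(doc, ensure_ascii=False)) == doc
```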
Document PROs Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Communications of the ACM, Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form

1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of…

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables… represent a major advance toward the goal of data independence…

1.2.1 Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.

1.2.2 Indexing Dependence
… Can application programs… remain invariant as indices come and go?…

1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure: a knowledge graph connecting Topic, Person, Location, Publication, Reference, Organization, and Event nodes through relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of"; topics are linked to one another as problem, solution, and purpose]
[Figure: a customer order graph linking JSON documents (Customers, Orders, Products, Vendors), an XML document (a hospital operation record listing drugs, manufacturers, and doses), and Documentation, with labels such as "Product that was ordered", "Vendor who sold the product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", "Product manual", "Customer liked product", and "Vendor received RMA on product"]
Inside and Out
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense
- How does NoSQL support variety and variability
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling Normalize
- DocGraph Modeling Normalize
- DocGraph Modeling Denormalize
- Relational Modeling Orthogonalize
- Relational Modeling Generalize
- DocGraph Modeling Generalize
- Relational Modeling Tune
- DocGraph Modeling Tune
- Document PROs Simple Easy Data Types
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
--- | --- | --- | ---
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
[Figure: Person ERD] Tables and their columns:
- Addresses: Address ID, Street, Locality, Region, Postal Code, Country
- Person Phones: Phone ID, Person ID, Phone Purpose, Phone Number
- Person Emails: Email ID, Person ID, Email Purpose, Email Address, Is Primary
- Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
- Websites: Website ID, URL
- Persons Websites: Person ID, Website ID, Website Purpose, Is Primary
- Companies Websites: Company ID, Website ID
- Companies Addresses: Company ID, Address ID
- Companies: Company ID
- Bank Payment Methods: Email ID, Bank Payment ID, Status, Account Type, Routing Number, Account Number, Account Holder, Account Verified, Drivers License Num, Drivers License State
- Person Names: Person ID, First Name, Middle Name, Last Name, Maiden Name, Legal Name, Sorted Name, Nickname, Informal Letter Name, Formal Letter Name
- Visa Payment Methods: Email ID, Visa ID, Card Number, Expiration Date, Name on Card, Card Address ID, Card Status
- Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Primary Email, Customer Status, Customer Join Date, Customer Reviewer Rank
- Persons Companies: Person ID, Company ID, Relationship Type, Relationship Status, Relationship Effectiveness, Relationship Start Date, Relationship End Date
- Persons Bank Payment Methods: Bank Payment ID, Person ID, Is Primary
- Persons Visa Payment Methods: Visa ID, Person ID, Is Primary
[Figure: a much larger relational ERD that repeats the person-related tables several times under placeholder table names, alongside Order, Order Items, Fact, and Lookup Lists tables]
More Realistic Customer Order Model
Same Realistic Customer Order Model
- A JSON document is a business entity
- Meaningful graph relationships connect business entities
[Figure: the customer order graph of JSON Customers, Orders, Products, and Vendors documents, linked by relationships such as "Product that was ordered" and "Vendor who sold the product"]
Relational Modeling Normalize
1. Normalize
- Make each attribute single-valued
  - Create one column per attribute
  - Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
- Group attributes into tables, ensuring each table has one coherent context
- Assign one primary key to each table
- Eliminate duplicate attributes across tables
[Figure: one-to-one, one-to-many, and many-to-many relationships, and reference tables]
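A minimal sketch of the first normalization step using Python's built-in `sqlite3`: a multivalued attribute (a person's phones) is flattened into a one-to-many child table, so reading the person back requires a join. The table and column names are hypothetical, loosely following the Person Phones table in the Person ERD, and each phone is simplified to a single purpose.

```python
import sqlite3

# Hypothetical person with a multivalued attribute: several phones.
person = {
    "personId": 11,
    "personName": "Mike Bowers",
    "phones": [
        {"purpose": "mobile", "number": "+1 (111) 111-1111"},
        {"purpose": "work", "number": "+1 (111) 111-1112"},
        {"purpose": "home", "number": "+1 (111) 111-1113"},
    ],
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One coherent context and one primary key per table.
    CREATE TABLE persons (
        person_id   INTEGER PRIMARY KEY,
        person_name TEXT NOT NULL
    );
    -- The multivalued attribute becomes a one-to-many child table.
    CREATE TABLE person_phones (
        phone_id      INTEGER PRIMARY KEY,
        person_id     INTEGER NOT NULL REFERENCES persons(person_id),
        phone_purpose TEXT NOT NULL,
        phone_number  TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO persons VALUES (?, ?)",
             (person["personId"], person["personName"]))
conn.executemany(
    "INSERT INTO person_phones (person_id, phone_purpose, phone_number) VALUES (?, ?, ?)",
    [(person["personId"], p["purpose"], p["number"]) for p in person["phones"]],
)

# Reassembling the person now requires a join.
rows = conn.execute("""
    SELECT p.person_name, ph.phone_purpose, ph.phone_number
    FROM persons AS p
    JOIN person_phones AS ph ON ph.person_id = p.person_id
    ORDER BY ph.phone_id
""").fetchall()
```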
DocGraph Modeling Normalize
- Create one business entity or transaction per JSON document
- JSON properties can be multivalued (i.e., arrays), which means we can embed structures
- Each embedded structure should be normalized
- Template Driven Extraction (TDE) can automatically populate the triple index to denormalize data, and the Optic API can query it
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } } ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": {
        "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ]
}
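TDE itself is configured with templates inside MarkLogic; the Python below only imitates the idea, to show how a document with embedded arrays can still be projected into normalized rows at index time. `extract_rows`, its `context` path, and the column functions are hypothetical stand-ins, not the TDE template format or API.

```python
# Hypothetical stand-in for a TDE-style template: emit one row per array item
# found at a context path. This is NOT the MarkLogic TDE template format or API.
def extract_rows(doc, context, columns):
    node = doc
    for key in context:
        if not isinstance(node, dict) or key not in node:
            return []  # the "template" does not apply to this document
        node = node[key]
    return [{name: fn(doc, item) for name, fn in columns.items()} for item in node]


person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113"}},
        ],
    },
}

# Each column is computed from the whole document plus the current array item,
# so parent data (the person's name) is repeated in every row.
phone_columns = {
    "personId": lambda doc, item: doc["personId"],
    "personName": lambda doc, item: doc["person"]["personName"],
    "phonePurpose": lambda doc, item: item["phone"]["phonePurpose"][0],
    "phoneNumber": lambda doc, item: item["phone"]["phoneNumber"],
}

rows = extract_rows(person_doc, ["person", "personPhones"], phone_columns)
for row in rows:
    print(row)
```

The stored document stays whole; only the projected rows repeat the person's name, which is the sense in which extraction denormalizes for querying.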
DocGraph Modeling Denormalize
- TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
- When triples link documents, the Optic API can query triple data as if it were part of any linked document
- This is denormalizing without denormalizing
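To illustrate "denormalizing without denormalizing", the sketch below keeps each entity in its own document and resolves a triple at query time, so the linked product's name appears alongside the person without being copied into the person document. Plain Python dictionaries stand in for the triple index and the Optic API; `label` and `linked_view` are hypothetical helpers.

```python
# Hypothetical in-memory stand-ins: documents keyed by ID, and triples taken from
# the person document's "triples" array. Nothing is copied between documents.
docs = {
    11: {"personId": 11, "person": {"personName": "Mike Bowers"}},
    1111: {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    666: {"companyId": 666, "company": {"companyName": "Goliath Dairies"}},
}
triples = [
    (11, "likesProduct", 1111),
    (11, "hasEmployer", 666),
]


def label(doc):
    """Return the display name of whichever entity object the document holds."""
    for entity, key in (("person", "personName"),
                        ("product", "productName"),
                        ("company", "companyName")):
        if entity in doc:
            return doc[entity][key]
    return None


def linked_view(subject_id, predicate):
    """Join at query time: the object's data appears as if it were in the subject's doc."""
    return [
        {"subject": label(docs[s]), "predicate": p, "object": label(docs[o])}
        for s, p, o in triples
        if s == subject_id and p == predicate and o in docs
    ]


before = linked_view(11, "likesProduct")
docs[1111]["product"]["productName"] = "Oreo Thins"  # update one document only
after = linked_view(11, "likesProduct")
print(before[0]["object"], "->", after[0]["object"])
```

Because the name is resolved when the query runs, an update to the product document is visible immediately, with no copies to keep in sync.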
Relational Modeling Orthogonalize
2. Orthogonalize
- Create business tables that stand independent of all contexts
- Create transaction tables to join together business tables
- Create reference tables to standardize entity states and attribute characteristics
- This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Figure: business tables, transaction tables, and reference tables]
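A minimal `sqlite3` sketch of orthogonalized tables: `persons` and `products` are business tables, `orders` and `order_items` are transaction tables that join them, and `order_statuses` is a reference table. The schema is a hypothetical simplification of the customer order model; note that SQLite enforces `REFERENCES` constraints only when `PRAGMA foreign_keys` is on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript("""
    -- Reference table: standardizes the allowed order states.
    CREATE TABLE order_statuses (status TEXT PRIMARY KEY);
    INSERT INTO order_statuses VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');

    -- Business tables: independent of any particular context.
    CREATE TABLE persons  (person_id  INTEGER PRIMARY KEY, person_name  TEXT NOT NULL);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT NOT NULL);

    -- Transaction tables: join business tables together for one context (an order).
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES persons(person_id),
        status    TEXT NOT NULL REFERENCES order_statuses(status)
    );
    CREATE TABLE order_items (
        order_id   INTEGER NOT NULL REFERENCES orders(order_id),
        product_id INTEGER NOT NULL REFERENCES products(product_id),
        qty        INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );

    INSERT INTO persons VALUES (11, 'Mike Bowers');
    INSERT INTO products VALUES (1111, 'Oreo Thins Sandwich Cookies'),
                                (2222, 'Rice Dream Rice Drink - Vanilla 64 Fl Oz');
    INSERT INTO orders VALUES (1, 11, 'Shipped');
    INSERT INTO order_items VALUES (1, 1111, 3), (1, 2222, 1);
""")

# The same business tables can be recombined into any context; here, one order.
rows = conn.execute("""
    SELECT o.order_id, p.person_name, pr.product_name, i.qty, o.status
    FROM orders AS o
    JOIN persons AS p      ON p.person_id = o.person_id
    JOIN order_items AS i  ON i.order_id = o.order_id
    JOIN products AS pr    ON pr.product_id = i.product_id
    ORDER BY pr.product_id
""").fetchall()
```

The reference table also acts as a guard: with foreign keys on, an order with a status outside `order_statuses` is rejected.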
DocGraph Modeling Orthogonalize
- Everything about each entity should be in the entity
- A business entity should exactly match how users think of it
- A transaction should contain everything about the transaction
- Multiple lookup tables may be combined in one doc
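Because a transaction document carries everything about the transaction, an application can, for example, total an order without any joins. This Python sketch assumes the sample order's unit prices are 3.50 and 2.50 and its shipment cost is 5.87, and parses decimals as `Decimal` to avoid floating-point rounding; `order_total` is a hypothetical helper.

```python
import json
from decimal import Decimal

# Sample order, assuming the stripped decimals read 3.50, 2.50, and 5.87.
ORDER_JSON = """
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderStatus": "Shipped",
    "customer": {"customerId": 11, "customerName": "Mike Bowers"},
    "orderShipment": {"shipmentMethod": "2nd Day Air", "shipmentCost": 5.87},
    "orderedProducts": [
      {"product": {"productIdFk": 1111, "productName": "Oreo Thins Sandwich Cookies",
                   "productOrderQty": 3, "productUnitPrice": 3.50}},
      {"product": {"productIdFk": 2222, "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
                   "productOrderQty": 1, "productUnitPrice": 2.50}}
    ]
  }
}
"""
# Parse decimals as Decimal so money arithmetic is exact.
order_doc = json.loads(ORDER_JSON, parse_float=Decimal)


def order_total(doc):
    """Hypothetical helper: items plus shipping, computed from the order document alone."""
    order = doc["order"]
    items = sum(
        (p["product"]["productUnitPrice"] * p["product"]["productOrderQty"]
         for p in order["orderedProducts"]),
        Decimal("0"),
    )
    return items + order["orderShipment"]["shipmentCost"]


print(order_total(order_doc))  # 18.87
```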
Relational Modeling Generalize
3. Generalize
- Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
- For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
- Do not overgeneralize in relational, because it hides the purpose of the model
DocGraph Modeling Generalize
- Generalizing is easy in JSON because a JSON object can carry optional nested objects, so a general entity such as a person can embed the specialized objects it needs
- This example shows a person subclassed as an employee, a customer, and a customer rep
- Generalization works very well in JSON and, unlike in relational models, it does not hide the purpose of the model
- The person document's customer and companyRelationships objects are its subclasses
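In a document, a subclass is simply an optional nested object. This Python sketch derives a person's roles from which subclass objects its document contains; `roles` is a hypothetical helper, and placing `personCompanyRelationship` inside each `company` object is an assumption about the slide's layout.

```python
import json

person_doc = json.loads("""
{
  "personId": 11,
  "person": {"personName": "Mike Bowers"},
  "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
  "companyRelationships": [
    {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                 "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
    {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                 "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}}
  ]
}
""")


def roles(doc):
    """Hypothetical helper: the base 'person' role plus one role per subclass object."""
    found = {"person"} if "person" in doc else set()
    if "customer" in doc:
        found.add("customer")
    for rel in doc.get("companyRelationships", []):
        found.update(rel["company"].get("personCompanyRelationship", []))
    return found


print(sorted(roles(person_doc)))
```

A person who is not a customer simply has no `customer` object; no empty subclass rows or nullable columns are needed.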
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
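A minimal Python sketch of the denormalize step, assuming plain lists of dicts stand in for tables. The `persons` and `orders` tables, their columns, and the helper functions are hypothetical, loosely modeled on the person and order examples in this deck. It contrasts a read that joins two tables with a read against a table that already carries a copied `customer_name` column.

```python
# Toy sketch: denormalizing a commonly joined column to avoid a join.
# Plain lists of dicts stand in for tables; all names are hypothetical.

persons = [
    {"person_id": 11, "person_name": "Mike Bowers"},
]

orders = [
    {"order_id": 1, "order_number": "111-11-111", "customer_id": 11},
]


def order_with_name_normalized(order_id):
    """Normalized read: fetch the order, then join to persons on the key."""
    order = next(o for o in orders if o["order_id"] == order_id)
    person = next(p for p in persons if p["person_id"] == order["customer_id"])
    return {**order, "customer_name": person["person_name"]}


def denormalize_orders():
    """Copy person_name into every order row so reads no longer need the join."""
    names = {p["person_id"]: p["person_name"] for p in persons}
    return [{**o, "customer_name": names[o["customer_id"]]} for o in orders]


denormalized_orders = denormalize_orders()


def order_with_name_denormalized(order_id):
    """Denormalized read: one lookup, no join."""
    return next(o for o in denormalized_orders if o["order_id"] == order_id)


print(order_with_name_normalized(1))
print(order_with_name_denormalized(1))
```

The denormalized read skips the join, but the copied `customer_name` must now be refreshed whenever the person's name changes; that maintenance cost is the usual price of denormalizing.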
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
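To see why unique property names matter when indexes are keyed by name, here is a toy Python model, assuming an equality index keyed only on (property name, value) regardless of nesting. It is a simplification for illustration, not MarkLogic's index implementation, and the "Jordan" person and product documents are hypothetical. With a generic `name` property, a lookup meant for people also matches a product; with `personName` and `productName` it does not.

```python
# Toy model of a name-keyed equality index over JSON-like documents.
# Assumption: keys are (property name, value) only, ignoring hierarchy.
from collections import defaultdict


def index_value(index, doc_id, name, value):
    """Index one property value under its name, wherever it appears."""
    if isinstance(value, dict):
        index_object(index, doc_id, value)
    elif isinstance(value, list):
        # Each array item is indexed under the enclosing property's name.
        for item in value:
            index_value(index, doc_id, name, item)
    else:
        index[(name, value)].add(doc_id)


def index_object(index, doc_id, obj):
    for name, value in obj.items():
        index_value(index, doc_id, name, value)


def build_index(docs):
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        index_object(index, doc_id, doc)
    return index


def lookup(index, name, value):
    """Return the IDs of documents containing name == value anywhere."""
    return index.get((name, value), set())


generic_docs = {
    "person-1": {"person": {"name": "Jordan", "status": "active"}},
    "product-1": {"product": {"name": "Jordan", "status": "active"}},
}
unique_docs = {
    "person-1": {"person": {"personName": "Jordan", "personStatus": "active"}},
    "product-1": {"product": {"productName": "Jordan", "productStatus": "active"}},
}

generic_index = build_index(generic_docs)
unique_index = build_index(unique_docs)

print(sorted(lookup(generic_index, "name", "Jordan")))       # ['person-1', 'product-1']
print(sorted(lookup(unique_index, "personName", "Jordan")))  # ['person-1']
```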
Agenda
1. Database Modeling Paradigms
2. From Relational to Document/Graph
3. Document Advantages
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
        "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ] },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 }
}

{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
        "productIdFk": 1111,
        "productName": "Oreo Thins Sandwich Cookies",
        "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
        "productWeightOz": 11,
        "productOrderStatus": "Shipped",
        "productShippedDate": "2001-01-01+0101",
        "seller": { "sellerId": 555, "sellerName": "Nabisco" },
        "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
        "productIdFk": 2222,
        "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
        "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New" } } ] }
}

{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
        "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
        "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": { "productMeasurePurpose": "productPackage", "productWeight": 10.1,
        "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": { "productMeasurePurpose": "shippingPackage", "productWeight": 1,
        "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    …

{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", …
      { "phone": { "phonePurpose": "support", …
      { "phone": { "phonePurpose": "shipping", …
    "companyAddresses": [
      { "address": {
        "addressPurpose": [ "shipping", …
        "addressLocality": "East Hanove…
        "addressPostalCode": "07936", …
    "companyEmails": [
      { "email": { "emailPurpose": "sales", …
    "companyWebsites": [
      { "website": { "websitePurpose": "home", …
    "companyPaymentMethods": [
      { "paymentMethod": {
        "paymentMethodType": "Che…
        "paymentMethodAccountNumber": 555…
        "paymentMethodVerified": true } } ],
    "supplier": { "supplierJoinDate": "2005-05…
    "seller": { "sellerJoinDate": "2005-05…

{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
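A small Python sketch of how orthogonalized documents fit together: the order (a transaction) refers to the person and product (business entities) by ID, and its status is checked against the lookup list. The documents are trimmed copies of the slide examples; the `store` layout and `order_summary` helper are hypothetical, and the prices assume decimal points that the transcript dropped (3.50 rather than 350).

```python
# Sketch: orthogonalized documents (entities, a transaction, a lookup list)
# linked by ID properties. Store layout and helper are hypothetical.

store = {
    "person": {11: {"personId": 11, "person": {"personName": "Mike Bowers"}}},
    "product": {
        1111: {"productId": 1111,
               "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    },
    "order": {
        1: {"orderId": 1,
            "order": {
                "orderStatus": "Shipped",
                "customer": {"customerId": 11},
                "orderedProducts": [
                    {"product": {"productIdFk": 1111,
                                 "productOrderQty": 3,
                                 "productUnitPrice": 3.50}},
                ],
            }},
    },
    "lookup": {
        111111111: {"orderLookupId": 111111111,
                    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"]},
    },
}

ORDER_LOOKUP_ID = 111111111


def order_summary(order_id):
    """Resolve an order's references to other documents and validate its status."""
    order = store["order"][order_id]["order"]
    allowed = store["lookup"][ORDER_LOOKUP_ID]["orderStatus"]
    if order["orderStatus"] not in allowed:
        raise ValueError(f"unknown order status: {order['orderStatus']}")
    customer = store["person"][order["customer"]["customerId"]]["person"]
    lines = []
    for item in order["orderedProducts"]:
        line = item["product"]
        product = store["product"][line["productIdFk"]]["product"]
        total = line["productOrderQty"] * line["productUnitPrice"]
        lines.append((product["productName"], line["productOrderQty"], total))
    return {"customer": customer["personName"],
            "status": order["orderStatus"],
            "lines": lines}


print(order_summary(1))
```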
Document PROs: Performance
One JSON document requires one I/O
• Relational requires dozens of I/Os for the same data
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row
• 4-to-6 I/Os x 13 joins = 52-to-78 I/Os
• MarkLogic indexes are in RAM, so getting a doc is 1 I/O
• MongoDB indexes are B-tree indexes and may require 4 I/Os
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables
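The slide's arithmetic, written out as a small Python sketch. The only inputs are the slide's own per-join figures (3-to-4 index I/Os plus 1-to-2 row I/Os per join, 13 joins); the function name is hypothetical, and real costs depend heavily on caching and the query planner.

```python
# Back-of-the-envelope I/O estimate using the Performance slide's figures.

def join_io_range(joins, index_ios=(3, 4), row_ios=(1, 2)):
    """Return (min, max) I/Os for a query that performs `joins` joins.

    Each join costs index_ios (min, max) plus row_ios (min, max) I/Os.
    """
    per_join_min = index_ios[0] + row_ios[0]   # 3 + 1 = 4
    per_join_max = index_ios[1] + row_ios[1]   # 4 + 2 = 6
    return joins * per_join_min, joins * per_join_max


relational_ios = join_io_range(13)   # 13 joined tables for one person
document_ios = 1                     # one document read, per the slide

print(relational_ios)  # (52, 78)
print(document_ios)
```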
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document

"personAddresses": [
  { "address": {
    "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
    "addressLocality": "Clinton", "addressRegion": "UT",
    "addressPostalCode": "84015", "addressCountry": "United States" } }
]

The same data takes three relational tables:
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
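The following Python sketch shreds the `personAddresses` example into the three tables named on the slide (Persons, Persons Addresses, Addresses). The snake_case column names, the per-document surrogate `address_id`, and the `shred` function are assumptions for illustration; the Is Primary column is omitted because the address in this example carries no primary flag.

```python
# Sketch: shredding one JSON person document into Persons,
# Persons Addresses, and Addresses rows. Column names are assumptions.

person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }},
        ],
    },
}


def shred(doc):
    """Return (persons, persons_addresses, addresses) rows for one document."""
    person_id = doc["personId"]
    person = doc["person"]
    persons = [{"person_id": person_id, "name": person["personName"]}]
    persons_addresses, addresses = [], []
    # IDs are only unique within this document; a real loader needs global IDs.
    for address_id, entry in enumerate(person["personAddresses"], start=1):
        a = entry["address"]
        addresses.append({
            "address_id": address_id,
            "street": "\n".join(a["addressStreet"]),  # street lines flattened
            "locality": a["addressLocality"],
            "region": a["addressRegion"],
            "postal_code": a["addressPostalCode"],
            "country": a["addressCountry"],
        })
        # The multi-valued purpose becomes one junction row per purpose.
        for purpose in a["addressPurpose"]:
            persons_addresses.append({"person_id": person_id,
                                      "address_id": address_id,
                                      "address_purpose": purpose})
    return persons, persons_addresses, addresses


persons, persons_addresses, addresses = shred(person_doc)
for row in persons + persons_addresses + addresses:
    print(row)
```

One nested array in the document becomes rows in three tables plus two join keys, which is the shredding the slide contrasts with a single JSON document.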
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases

"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
    "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
    "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
    "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
    "nameOnBankAccount": "Michael Bowers",
    "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
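A Python sketch of working with the multi-typed `personPaymentMethods` array, assuming the field values from the slide (stored here as strings). `describe` dispatches on which keys a record has, and `sparse_table` shows what forcing both record types into a single relational table would look like: a union of all columns, with the unused ones left null. Both helpers are hypothetical.

```python
# Sketch: a multi-typed array, and the sparse table a single relational
# table would need to hold it. Helpers are hypothetical.

person_payment_methods = [
    {"paymentMethod": {
        "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234",
        "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS",
        "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {
        "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]


def describe(method):
    """Dispatch on which keys the record carries, as application code might."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending {method['bankAccountNumber'][-4:]}"
    return kind


def sparse_table(records):
    """Force all records into one table: union of keys as columns, None for gaps."""
    columns = sorted({key for record in records for key in record})
    rows = [[record.get(column) for column in columns] for record in records]
    return columns, rows


methods = [entry["paymentMethod"] for entry in person_payment_methods]
columns, rows = sparse_table(methods)
null_cells = sum(value is None for row in rows for value in row)

for method in methods:
    print(describe(method))
print(len(columns), "columns,", null_cells, "null cells")  # 11 columns, 9 null cells
```

The usual relational alternatives are this sparse table or one table per payment type plus a parent table; the JSON array needs neither.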
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query
• Structure can be queried
  - For presence of keys
  - For nested relationships
  - For any combination of nesting, keys, and values

{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
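A Python sketch of querying and changing structure as data, using the example document from the slide. `paths_with_key` finds every object that contains a given key (a structural query for key presence and nesting), and `rename_key` changes structure by rewriting data. Both helpers are hypothetical illustrations, not MarkLogic APIs.

```python
# Sketch: structure is data, so it can be queried and rewritten like data.
# The helpers are hypothetical, not MarkLogic APIs.

doc = {
    "personId": 11,
    "person": {
        "personName": "Some very long name with many UTF-8 international characters…",
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"],
                       "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"],
                       "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def paths_with_key(node, key, path=()):
    """Yield the path (keys and array indexes) of every object containing `key`."""
    if isinstance(node, dict):
        if key in node:
            yield path
        for name, value in node.items():
            yield from paths_with_key(value, key, path + (name,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from paths_with_key(item, key, path + (i,))


def rename_key(node, old, new):
    """Return a copy of `node` with every `old` key renamed to `new`."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new)
                for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


hidden = list(paths_with_key(doc, "hidePhoneNumber"))
print(hidden)  # [('person', 'personPhones', 1, 'phone')]

renamed = rename_key(doc, "hidePhoneNumber", "phoneNumberIsHidden")
print(list(paths_with_key(renamed, "phoneNumberIsHidden")))
```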
Document PROs: Simple, Easy Data Types
  - Attribute names have unlimited length and can contain any characters
  - Strings have unlimited size and can contain any internationalized UTF-8 characters
  - Numbers contain integers or decimals
  - Booleans are true or false
  - Arrays can have values of any type and can mix types
  - Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
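A quick check of these types with Python's standard `json` module, which maps JSON strings, numbers, booleans, null, arrays, and objects to `str`, `int` or `float`, `bool`, `None`, `list`, and `dict`. The sample document is hypothetical. Note that "unlimited" describes the JSON grammar; individual parsers and databases may still impose practical limits.

```python
import json

# Hypothetical document exercising each JSON type.
text = """{
  "a key with spaces and ünïcödé": "マイク",
  "personHeightInFeet": 5.75,
  "customerReviewerRank": 11111,
  "hidePhoneNumber": true,
  "personMaidenName": null,
  "mixedArray": [1, "two", false, null, {"nested": [1.5]}],
  "sparseObject": {}
}"""

doc = json.loads(text)

# Python type that each JSON value was parsed into.
type_names = {key: type(value).__name__ for key, value in doc.items()}
mixed_types = [type(item).__name__ for item in doc["mixedArray"]]

print(type_names)
print(mixed_types)  # ['int', 'str', 'bool', 'NoneType', 'dict']
```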
Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of…

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables… represent a major advance toward the goal of data independence…

1.2.1. Ordering Dependence. …Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence. …Can application programs… remain invariant as indices come and go?…

1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary… The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure: a knowledge graph of typed nodes (T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event) joined by labeled edges such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of", plus problem, solution, and purpose links between topics.]

[Figure, "Inside and Out": Customers, Orders, Products, and Vendors stored as JSON documents and a hospital operation record stored as XML (hospital, operation number and type, surgeon, and a table of drugs with manufacturer and dose), linked to documentation such as shipping instructions, customer invoices, annual reports, customer order documents, and product manuals through relationships like "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product".]
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document/Graph?
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling: Normalize
- DocGraph Modeling: Normalize
- DocGraph Modeling: Denormalize
- Relational Modeling: Orthogonalize
- Relational Modeling: Generalize
- DocGraph Modeling: Generalize
- Relational Modeling: Tune
- DocGraph Modeling: Tune
- Document PROs: Simple, Easy Data Types
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
Same Realistic Customer Order Model
• A JSON document is a business entity
• Meaningful graph relationships connect business entities
[Figure: Customer Order model in which Customers, Orders, Products, and Vendors are JSON documents connected by graph relationships such as "Product that was ordered" and "Vendor who sold the product".]
Relational Modeling: Normalize
1. Normalize
• Make each attribute single-valued
  - Create one column per attribute
  - Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Figure: one-to-one, one-to-many, and many-to-many relationships, and reference tables]
DocGraph Modeling Normalize personId 11 schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ] triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111 triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
                  "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
                  "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                           "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                           "nameOnBankAccount": "Michael Bowers",
                           "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                   "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "accountRepSupportedProducts": [
                     { "product": { "productIdFk": 1111, "productName": "Oreos" } }
                   ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                   "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
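Because phones, addresses, emails, and payment methods are embedded as normalized arrays, application code can read them straight from the one document without joins. A minimal Python sketch, assuming the person document has been parsed into a dict (abridged here); the helper name values_for_purpose is hypothetical.

```python
# Abridged copy of the person document, as parsed JSON.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (111) 111-1113"}},
        ],
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com"}},
            {"email": {"emailPurpose": ["personal"], "emailAddress": "mike@somewhere.com"}},
        ],
    },
}

def values_for_purpose(items, wrapper, purpose_key, value_key, purpose):
    """Return every value in an embedded array whose purpose list contains `purpose`."""
    return [item[wrapper][value_key]
            for item in items
            if purpose in item[wrapper][purpose_key]]

p = person_doc["person"]
primary_phone = values_for_purpose(p["personPhones"], "phone", "phonePurpose", "phoneNumber", "primary")
work_email = values_for_purpose(p["personEmails"], "email", "emailPurpose", "emailAddress", "work")
print(primary_phone, work_email)
```

The same helper works for every embedded array because each one follows the same wrapper-plus-purpose pattern.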
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing: no data is copied into other documents
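The idea of linking documents with triples and joining them at query time can be sketched in plain Python. This is not MarkLogic's TDE or Optic API; it is a hypothetical in-memory stand-in. extract_triples collects the embedded triple objects, and employers_of joins each hasEmployer triple to a company document, so the company name appears next to the person without being copied into the person document. The company document shape is an assumption.

```python
# Abridged person document plus a hypothetical company document store keyed by id.
person_doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}},
    ],
    "person": {"personName": "Mike Bowers"},
}
companies = {666: {"companyId": 666, "company": {"companyName": "Goliath Dairies"}}}

def extract_triples(doc):
    """Yield (subject, predicate, object) tuples from a document's embedded triples."""
    for item in doc.get("triples", []):
        t = item["triple"]
        yield (t["subject"], t["predicate"], t["object"])

def employers_of(doc, company_docs):
    """Join hasEmployer triples to company documents at query time."""
    return [
        (doc["person"]["personName"], company_docs[o]["company"]["companyName"])
        for s, p, o in extract_triples(doc)
        if p == "hasEmployer" and o in company_docs
    ]

print(employers_of(person_doc, companies))
```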
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Diagram: Business Tables, Transaction Tables, Reference Tables]
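Orthogonalized relational tables can be sketched with Python's built-in sqlite3 module. Here persons and products are business tables, orders and order_lines are transaction tables that join them, and order_statuses is a reference table. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Business tables: independent of any context.
CREATE TABLE persons  (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
-- Reference table: standardizes the allowed order states.
CREATE TABLE order_statuses (status TEXT PRIMARY KEY);
-- Transaction tables: join business tables together in one context.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     person_id INTEGER REFERENCES persons(person_id),
                     status TEXT REFERENCES order_statuses(status));
CREATE TABLE order_lines (order_id INTEGER REFERENCES orders(order_id),
                          product_id INTEGER REFERENCES products(product_id),
                          qty INTEGER,
                          PRIMARY KEY (order_id, product_id));

INSERT INTO persons VALUES (11, 'Mike Bowers');
INSERT INTO products VALUES (1111, 'Oreo Thins Sandwich Cookies'),
                            (2222, 'Rice Dream Rice Drink - Vanilla 64 Fl Oz');
INSERT INTO order_statuses VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');
INSERT INTO orders VALUES (1, 11, 'Shipped');
INSERT INTO order_lines VALUES (1, 1111, 3), (1, 2222, 1);
""")

# The transaction tables combine the business tables into an "order" context.
rows = conn.execute("""
    SELECT pe.name, pr.name, ol.qty, o.status
    FROM orders o
    JOIN persons pe     ON pe.person_id = o.person_id
    JOIN order_lines ol ON ol.order_id = o.order_id
    JOIN products pr    ON pr.product_id = ol.product_id
    ORDER BY pr.product_id
""").fetchall()
for row in rows:
    print(row)
```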
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mike@somework.com ]
personWebsites [
website websitePurpose [ work ] websiteUrl http://www.mike.com ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T09:00:00Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 5.87 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 3.50 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2.50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
productAvailabilityDate 2001-01-01
productListPrice 4.50
productDiscountPrice 3.50
productStandardCost 2.50
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 2.50 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 10.1
productWidth 4.5 productHeight 1.7 productLength 6.7
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
productWebsites [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
DocGraph Modeling: Orthogonalize
• Everything about an entity should be in that entity's document
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one document
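Combining several lookup lists in one document might look like the following Python sketch. The lookups dict mirrors the orderStatus and productOrderStatus lists in the lookup document; the validate_order helper and the simplified order shape are hypothetical.

```python
# One lookup document holding several reference lists.
lookups = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

def validate_order(order, lookups):
    """Return a list of problems; an empty list means every status is a known value."""
    problems = []
    if order["orderStatus"] not in lookups["orderStatus"]:
        problems.append(f"unknown orderStatus {order['orderStatus']!r}")
    for product in order["orderedProducts"]:
        status = product["productOrderStatus"]
        if status not in lookups["productOrderStatus"]:
            problems.append(f"unknown productOrderStatus {status!r} for {product['productIdFk']}")
    return problems

order = {
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"productIdFk": 1111, "productOrderStatus": "Shipped"},
        {"productIdFk": 2222, "productOrderStatus": "Lost"},
    ],
}
print(validate_order(order, lookups))
```

A single lookup document means one read fetches every reference list an order needs to be checked against.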
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational, because it hides the purpose of the model
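Subclassing in a relational model is commonly done with one general table plus one table per subclass that shares its primary key. A minimal sqlite3 sketch with hypothetical table names: persons is the general table, and customers and account_reps are subclass tables. Person 22 is a placeholder row with no subclass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- General table shared by every kind of person.
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT);
-- Subclass tables reuse the person's primary key (one-to-one with persons).
CREATE TABLE customers (person_id INTEGER PRIMARY KEY REFERENCES persons(person_id),
                        join_date TEXT, reviewer_rank INTEGER);
CREATE TABLE account_reps (person_id INTEGER PRIMARY KEY REFERENCES persons(person_id),
                           company_id INTEGER);

INSERT INTO persons VALUES (11, 'Mike Bowers'), (22, 'Person 22');
INSERT INTO customers VALUES (11, '2001-01-01', 11111);
INSERT INTO account_reps VALUES (11, 555);
""")

# Finding a person's roles means probing every subclass table.
rows = conn.execute("""
    SELECT p.person_id,
           c.person_id IS NOT NULL AS is_customer,
           a.person_id IS NOT NULL AS is_account_rep
    FROM persons p
    LEFT JOIN customers c    ON c.person_id = p.person_id
    LEFT JOIN account_reps a ON a.person_id = p.person_id
    ORDER BY p.person_id
""").fetchall()
print(rows)
```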
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because JSON objects are naturally suited to subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See the subclass sections of the person document: person, customer, and companyRelationships
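In the JSON model each subclass is simply another top-level section of the same document, and the schemas array declares which sections the document carries. A Python sketch using an abridged copy of the person document; the helpers sections_of and company_roles are hypothetical.

```python
# Abridged person document: general "person" section plus subclass sections.
person_doc = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person"}},
        {"schema": {"type": "customer"}},
        {"schema": {"type": "companyRelationships"}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}},
    ],
}

def sections_of(doc):
    """Return the subclass sections that are declared in `schemas` and present in the doc."""
    return [s["schema"]["type"] for s in doc["schemas"] if s["schema"]["type"] in doc]

def company_roles(doc):
    """Map each related company id to the person's roles at that company."""
    return {rel["company"]["companyIdFk"]: rel["company"]["personCompanyRelationship"]
            for rel in doc.get("companyRelationships", [])}

print(sections_of(person_doc))
print(company_roles(person_doc))
```

Unlike the relational version, the roles are visible by reading a single document; no subclass tables need to be probed.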
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
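The denormalization bullet can be illustrated with a small sqlite3 sketch using hypothetical tables. Copying customer_name into orders lets a common order listing run without a join, at the cost of keeping the copy in sync when the source row changes. The order document follows the same pattern with its embedded customerName.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT);
-- customer_name is a denormalized copy of persons.name.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, person_id INTEGER, customer_name TEXT);
INSERT INTO persons VALUES (11, 'Mike Bowers');
INSERT INTO orders VALUES (1, 11, 'Mike Bowers');
""")

# Normalized read: needs a join.
joined = conn.execute("SELECT o.order_id, p.name FROM orders o "
                      "JOIN persons p ON p.person_id = o.person_id").fetchall()
# Denormalized read: same answer from a single table.
flat = conn.execute("SELECT order_id, customer_name FROM orders").fetchall()
print(joined == flat)

# The trade-off: an update must now touch both tables to keep the copy consistent.
with conn:
    conn.execute("UPDATE persons SET name = 'Michael Bowers' WHERE person_id = 11")
    conn.execute("UPDATE orders SET customer_name = 'Michael Bowers' WHERE person_id = 11")
```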
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
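Because equality indexing keys on the property name regardless of its position, it can help to check a document for names that appear at more than one path before loading it. A plain-Python sketch; the helpers property_paths and find_reused_names are hypothetical and do not call any MarkLogic API.

```python
from collections import defaultdict

def property_paths(value, path=""):
    """Yield (name, path) for every object property, ignoring array positions."""
    if isinstance(value, dict):
        for name, child in value.items():
            child_path = f"{path}/{name}"
            yield name, child_path
            yield from property_paths(child, child_path)
    elif isinstance(value, list):
        for item in value:
            yield from property_paths(item, path)

def find_reused_names(doc):
    """Return names that occur at more than one distinct path, with those paths."""
    paths = defaultdict(set)
    for name, p in property_paths(doc):
        paths[name].add(p)
    return {name: sorted(ps) for name, ps in paths.items() if len(ps) > 1}

# A hypothetical order document that reuses "name" at two different paths.
doc = {
    "order": {
        "customer": {"name": "Mike Bowers"},
        "orderedProducts": [{"product": {"name": "Oreo Thins Sandwich Cookies"}},
                            {"product": {"name": "Rice Dream Rice Drink"}}],
    }
}
print(find_reused_names(doc))
```

Prefixed names such as customerName and productName, as used throughout these examples, avoid the collision entirely.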
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• The five JSON documents shown earlier (person, order, product, company, and lookups) equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
Document PROs: Performance
One JSON document requires one I/O
• Relational requires dozens of I/Os for the same data
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row
• 4-to-6 I/Os × 13 joins = 52-to-78 I/Os
• MarkLogic indexes are in RAM, so getting a doc is 1 I/O
• MongoDB indexes are B-tree indexes and may require 4 I/Os
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
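The I/O estimate is simple arithmetic. This Python sketch reproduces it from the per-join assumptions stated on the slide (3-to-4 index I/Os plus 1-to-2 row I/Os per join, 13 joins); the function name is hypothetical.

```python
def relational_io_range(joins, index_ios=(3, 4), row_ios=(1, 2)):
    """Return (low, high) I/O estimates for retrieving an entity spread over `joins` tables."""
    per_join_low = index_ios[0] + row_ios[0]    # 3 + 1 = 4
    per_join_high = index_ios[1] + row_ios[1]   # 4 + 2 = 6
    return joins * per_join_low, joins * per_join_high

low, high = relational_io_range(13)
print(f"relational: {low}-{high} I/Os vs. document: 1 I/O")
```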
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)
Document PROs: Multi-Value Properties
• Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States" } }
]
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
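To make the contrast with the three relational tables concrete, here is a minimal Python sketch, assuming person documents are held in memory as plain dicts shaped like the personAddresses fragment above. It answers "who has a shipping address?" by looking inside each document's nested arrays, with no join table. Person 22 ("Jane Doe", Provo) and the function name `persons_with_address_purpose` are hypothetical, and this is an illustration of the idea, not how MarkLogic executes queries.

```python
# Person documents shaped like the slide's JSON; person 22 is hypothetical sample data.
people = [
    {"personId": 11, "person": {"personName": "Mike Bowers", "personAddresses": [
        {"address": {"addressPurpose": ["shipping", "mailing"],
                     "addressStreet": ["99 Smith Dr"], "addressLocality": "Clinton",
                     "addressRegion": "UT", "addressPostalCode": "84015",
                     "addressCountry": "United States"}}]}},
    {"personId": 22, "person": {"personName": "Jane Doe", "personAddresses": [
        {"address": {"addressPurpose": ["mailing"], "addressLocality": "Provo",
                     "addressRegion": "UT"}}]}},
]


def persons_with_address_purpose(docs, purpose):
    """Return the personIds that have at least one address tagged with `purpose`.

    The person-to-address and address-to-purpose relationships live inside each
    document, so the query scans nested arrays instead of joining three tables.
    """
    return [
        doc["personId"]
        for doc in docs
        if any(purpose in entry["address"].get("addressPurpose", [])
               for entry in doc["person"].get("personAddresses", []))
    ]


print(persons_with_address_purpose(people, "shipping"))  # [11]
print(persons_with_address_purpose(people, "mailing"))   # [11, 22]
```

In the relational version the same question needs Persons joined to Persons Addresses (and Addresses if any address column is returned).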
Document PROs: Multi-Type Arrays
• Any JSON data can be multi-typed
• Storing different types of records in the same array is natural in programming but very difficult in relational databases
"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
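Application code typically handles a multi-typed array by dispatching on a type tag. The Python sketch below assumes `paymentMethodType` is that tag and uses the two records above as data; the per-type required-field lists (`REQUIRED_FIELDS`) and formatters are hypothetical, inferred only from the fields each record happens to show.

```python
# Required fields per paymentMethodType: an assumption inferred from the two records above.
REQUIRED_FIELDS = {
    "Visa": {"creditCardNumber", "creditCardExpirationDate", "nameOnCreditCard"},
    "Checking": {"bankRoutingNumber", "bankAccountNumber", "nameOnBankAccount"},
}

payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS",
                       "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers",
                       "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]

# One formatter per record shape, selected by the type tag.
FORMATTERS = {
    "Visa": lambda r: f"Visa card ending in {r['creditCardNumber'][-4:]}",
    "Checking": lambda r: f"Checking account ending in {r['bankAccountNumber'][-4:]}",
}


def missing_fields(entry):
    """Return, sorted, the required fields absent from one array entry, chosen by its type."""
    record = entry["paymentMethod"]
    required = REQUIRED_FIELDS.get(record["paymentMethodType"], set())
    return sorted(required - set(record))


def describe(entry):
    """Dispatch on the type tag, because each record shape carries different fields."""
    record = entry["paymentMethod"]
    fmt = FORMATTERS.get(record["paymentMethodType"])
    return fmt(record) if fmt else f"{record['paymentMethodType']} (unknown type)"


for entry in payment_methods:
    print(describe(entry), missing_fields(entry))
# Visa card ending in 1234 []
# Checking account ending in 1111 []
```

A relational design would usually need one table per payment type (or one wide, sparse table) to hold the same array.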
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
Document PROs: Everything is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query
• Structure can be queried
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224",
                   "hidePhoneNumber": true } }
    ]
  }
}
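Because keys and nesting are data, they can be queried and rewritten like values. The Python sketch below illustrates the three structural queries listed above (presence of a key, a key nested under another, and a structure change done as a data update) against the document shown. It is a plain in-memory illustration, not a MarkLogic query; the helper names and the replacement key `phoneIsPrivate` are hypothetical.

```python
from typing import Any, Iterator

doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def walk(node: Any, path: tuple = ()) -> Iterator[tuple]:
    """Yield (path, value) for every property; array positions are not added to the path."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield path + (key,), value
            yield from walk(value, path + (key,))
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, path)


def has_key(doc: dict, key: str) -> bool:
    """Query structure: does `key` appear anywhere in the hierarchy?"""
    return any(p[-1] == key for p, _ in walk(doc))


def has_nested(doc: dict, ancestor: str, key: str) -> bool:
    """Query structure: does `key` appear somewhere beneath `ancestor`?"""
    return any(p[-1] == key and ancestor in p[:-1] for p, _ in walk(doc))


def rename_key(node: Any, old: str, new: str) -> Any:
    """Change structure by updating data: return a copy with every `old` key renamed."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


print(has_key(doc, "hidePhoneNumber"))                    # True
print(has_nested(doc, "personPhones", "hidePhoneNumber"))  # True
print(has_key(rename_key(doc, "hidePhoneNumber", "phoneIsPrivate"), "phoneIsPrivate"))  # True
```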
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. Codd, IBM Research Laboratory, San Jose, California
Communications of the ACM (Information Retrieval), Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION: This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS: …Tables… represent a major advance toward the goal of data independence…
1.2.1 Ordering Dependence: …Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence: …Can application programs… remain invariant as indices come and go?…
1.2.3 Access Path Dependence: Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure: graph of typed entities (T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event) linked by labeled relationships such as "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "purpose", "problem", and "solution".]
[Figure "Inside and Out": a customer order linking JSON documents (Customers, Orders, Products, Vendors) with documentation such as an XML operation report, through relationships such as "product that was ordered", "vendor who sold the product", "customer liked product", "vendor received RMA on product", "shipping instructions, etc.", "customer invoices, annual reports, etc.", and "customer order documents, product manual".]
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling: Normalize
- DocGraph Modeling: Normalize
- DocGraph Modeling: Denormalize
- Relational Modeling: Orthogonalize
- Relational Modeling: Generalize
- DocGraph Modeling: Generalize
- Relational Modeling: Tune
- DocGraph Modeling: Tune
- Document PROs: Simple, Easy Data Types
| Hospital Name | Operation Number | Operation Type | Surgeon Name |
|---|---|---|---|
| John Hopkins | 13 | Heart Transplant | Dorothy Oz |

| Drug Name | Drug Manufacturer | Dose Size | Dose UOM |
|---|---|---|---|
| Minicillan | Drugs R Us | 200 | mg |
| Maxicillan | Canada4Less Drugs | 400 | mg |
| Minicillan | Drug USA | 150 | mg |
Same Realistic Customer Order Model
Relational Modeling: Normalize
1. Normalize
• Make each attribute single-valued
  – Create one column per attribute
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins
• Group attributes into tables, ensuring each table has one coherent context
• Assign one primary key to each table
• Eliminate duplicate attributes across tables
[Diagram: One-to-one, One-to-many, and Many-to-many relationships; Reference Tables]
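The normalization steps above can be sketched with Python's built-in `sqlite3` module: one person document is shredded into the Persons, Addresses, and Persons Addresses tables shown earlier, and reassembling it takes two joins. This is a simplified sketch under stated assumptions: the snake_case table and column names are hypothetical, the slide's "Is Primary" column and most person columns are omitted, and the multi-valued street array is collapsed into one column.

```python
import sqlite3

# One person document, reduced to the fields the three tables need.
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {"addressPurpose": ["shipping", "mailing"],
                         "addressStreet": ["99 Smith Dr"], "addressLocality": "Clinton",
                         "addressRegion": "UT", "addressPostalCode": "84015",
                         "addressCountry": "United States"}},
        ],
    },
}

SCHEMA = """
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE addresses (
    address_id INTEGER PRIMARY KEY,  -- surrogate key assigned by SQLite
    street TEXT, locality TEXT, region TEXT, postal_code TEXT, country TEXT);
-- Join table: one row per (person, address, purpose), so multi-valued purposes become rows.
CREATE TABLE persons_addresses (
    person_id INTEGER NOT NULL REFERENCES persons,
    address_id INTEGER NOT NULL REFERENCES addresses,
    address_purpose TEXT NOT NULL,
    PRIMARY KEY (person_id, address_id, address_purpose));
"""


def normalize(conn, doc):
    """Shred one person document into single-valued rows across three tables."""
    person = doc["person"]
    conn.execute("INSERT INTO persons VALUES (?, ?)", (doc["personId"], person["personName"]))
    for entry in person["personAddresses"]:
        addr = entry["address"]
        cur = conn.execute(
            "INSERT INTO addresses (street, locality, region, postal_code, country) "
            "VALUES (?, ?, ?, ?, ?)",
            # The multi-valued street array is collapsed to one column for simplicity.
            ("; ".join(addr["addressStreet"]), addr["addressLocality"], addr["addressRegion"],
             addr["addressPostalCode"], addr["addressCountry"]))
        for purpose in addr["addressPurpose"]:
            conn.execute("INSERT INTO persons_addresses VALUES (?, ?, ?)",
                         (doc["personId"], cur.lastrowid, purpose))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
normalize(conn, person_doc)

# Reassembling the person's view of the data now takes two joins.
rows = conn.execute("""
    SELECT p.name, pa.address_purpose, a.locality
    FROM persons AS p
    JOIN persons_addresses AS pa ON pa.person_id = p.person_id
    JOIN addresses AS a ON a.address_id = pa.address_id
    ORDER BY pa.address_purpose""").fetchall()
print(rows)  # [('Mike Bowers', 'mailing', 'Clinton'), ('Mike Bowers', 'shipping', 'Clinton')]
```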
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
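The TDE bullet can be pictured as a projection from a document into (subject, predicate, object) tuples. The Python sketch below is a stand-in for that idea, not TDE template syntax or MarkLogic's triple index: it copies the document's explicit `triples` entries and also projects a chosen plain property (`personName`) as one more triple, using a trimmed copy of the person document above. The function names and the `projected` parameter are hypothetical.

```python
# Trimmed copy of the person document above: only the parts this sketch reads.
person_doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666,
                    "tripleStatus": "active"}},
    ],
    "person": {"personName": "Mike Bowers", "personStatus": "active"},
}


def extract_triples(doc, projected=("personName",)):
    """Project a document into (subject, predicate, object) tuples.

    Explicit entries in "triples" are copied (metadata such as tripleStatus is
    dropped here), and each property named in `projected` becomes one more
    triple, roughly as an extraction template could expose plain document data.
    """
    subject = doc["personId"]
    triples = [(t["triple"]["subject"], t["triple"]["predicate"], t["triple"]["object"])
               for t in doc.get("triples", [])]
    person = doc.get("person", {})
    triples += [(subject, name, person[name]) for name in projected if name in person]
    return triples


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


index = extract_triples(person_doc)
print(objects(index, 11, "hasChild"))    # [33, 44]
print(objects(index, 11, "personName"))  # ['Mike Bowers']
```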
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
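"Denormalizing without denormalizing" amounts to resolving links at query time instead of copying values at write time. The Python sketch below illustrates that idea with a hypothetical in-memory document store keyed by id and a hypothetical triple list; it is not the Optic API. The names come from the sample data (company 666 is Goliath Dairies, product 1111 is Oreo Thins), while the rename to "Goliath Dairies Inc." is invented to show that a change in one document is visible through every link.

```python
# Hypothetical document store keyed by id; names come from the slide's sample data.
documents = {
    11: {"personId": 11, "person": {"personName": "Mike Bowers"}},
    666: {"companyId": 666, "company": {"companyName": "Goliath Dairies"}},
    1111: {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}},
}
triples = [(11, "hasEmployer", 666), (11, "likesProduct", 1111)]


def linked_values(subject_id, predicate, path):
    """Follow `predicate` edges from `subject_id` and read `path` from each linked document.

    The value is looked up at query time, so it is stored once (in the linked
    document) yet reads as if it were part of the subject's document.
    """
    results = []
    for s, p, o in triples:
        if s == subject_id and p == predicate:
            value = documents[o]
            for key in path:
                value = value[key]
            results.append(value)
    return results


print(linked_values(11, "hasEmployer", ("company", "companyName")))  # ['Goliath Dairies']

# Renaming the company touches one document; every linked view sees the change.
documents[666]["company"]["companyName"] = "Goliath Dairies Inc."
print(linked_values(11, "hasEmployer", ("company", "companyName")))  # ['Goliath Dairies Inc.']
```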
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Diagram: Business Tables, Transaction Tables, Reference Tables]
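The three table roles can be sketched with Python's `sqlite3` module. In this minimal sketch, assumed for illustration only, `persons` and `products` are business tables, `order_lines` is a transaction table that joins them, and `product_order_status` is a reference table holding the productOrderStatus values from the order lookup document. All table and column names, and the "Lost" status used to show a rejected value, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
-- Reference table: standardizes the allowed product order states.
CREATE TABLE product_order_status (status TEXT PRIMARY KEY);
INSERT INTO product_order_status VALUES
    ('None'), ('Allocated'), ('Invoiced'), ('Shipped'), ('On Order'), ('No Stock');

-- Business tables: each stands on its own, independent of any context.
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Transaction table: joins business tables into one context (a line on an order).
CREATE TABLE order_lines (
    order_number TEXT NOT NULL,
    person_id INTEGER NOT NULL REFERENCES persons,
    product_id INTEGER NOT NULL REFERENCES products,
    qty INTEGER NOT NULL,
    status TEXT NOT NULL REFERENCES product_order_status,
    PRIMARY KEY (order_number, product_id));

INSERT INTO persons VALUES (11, 'Mike Bowers');
INSERT INTO products VALUES (1111, 'Oreo Thins Sandwich Cookies'),
                            (2222, 'Rice Dream Rice Drink - Vanilla 64 Fl Oz');
INSERT INTO order_lines VALUES ('111-11-111', 11, 1111, 3, 'Shipped'),
                               ('111-11-111', 11, 2222, 1, 'Shipped');
""")

rows = conn.execute("""
    SELECT p.name, pr.name, ol.qty, ol.status
    FROM order_lines AS ol
    JOIN persons AS p ON p.person_id = ol.person_id
    JOIN products AS pr ON pr.product_id = ol.product_id
    ORDER BY pr.product_id""").fetchall()

# The reference table rejects a status that is not on the standard list.
try:
    conn.execute("INSERT INTO order_lines VALUES ('222-22-222', 11, 1111, 1, 'Lost')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```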
DocGraph Modeling: Orthogonalize
• Everything about an entity should be in that entity's document
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one document
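Combining lookup tables in one document means a single read supplies every list of allowed values. The Python sketch below uses the order lookup document from the customer order model (orderStatus and productOrderStatus) to validate an order transaction. The order's contents are trimmed from the sample order, and the "Lost" status on product 2222 is a hypothetical bad value added to show a failure; `invalid_statuses` is likewise a hypothetical helper.

```python
# The combined lookup document from the customer order model, as a Python dict.
order_lookups = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

# An order transaction; the second line's "Lost" status is a hypothetical bad value.
order = {
    "orderNumber": "111-11-111",
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
        {"product": {"productIdFk": 2222, "productOrderStatus": "Lost"}},
    ],
}


def invalid_statuses(order, lookups):
    """Check every status in an order against one lookup document; return the bad ones."""
    problems = []
    if order.get("orderStatus") not in lookups["orderStatus"]:
        problems.append(("orderStatus", order.get("orderStatus")))
    for line in order.get("orderedProducts", []):
        product = line["product"]
        if product.get("productOrderStatus") not in lookups["productOrderStatus"]:
            problems.append((product["productIdFk"], product.get("productOrderStatus")))
    return problems


print(invalid_statuses(order, order_lookups))  # [(2222, 'Lost')]
```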
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational, because it hides the purpose of the model
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because JSON objects are naturally suited to subclassing
• The person document shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See the subclassing in the person document: the customer and companyRelationships properties
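Because each subclass is just another property next to the base `person` object, a program can read a person's roles directly from the document. The Python sketch below assumes a trimmed copy of the person document and treats the presence of `customer` and each entry in `personCompanyRelationship` as a role. The helper names `roles` and `people_with_role`, the "role:company" labels, and person 22 (a person with no subclasses) are hypothetical.

```python
# Trimmed person document: the base "person" plus subclass properties.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def roles(doc):
    """List the 'subclasses' a person document carries, read straight from its properties."""
    found = ["customer"] if "customer" in doc else []
    for rel in doc.get("companyRelationships", []):
        company = rel["company"]
        found += [f"{r}:{company['companyName']}"
                  for r in company.get("personCompanyRelationship", [])]
    return found


def people_with_role(docs, role):
    """Return personIds whose documents carry `role` (company roles match by prefix)."""
    return [d["personId"] for d in docs if any(r.split(":")[0] == role for r in roles(d))]


print(roles(person_doc))
# ['customer', 'employeeOf:Nabisco', 'accountRepFor:Nabisco',
#  'independentContractorFor:Goliath Dairies', 'deliveryPersonFor:Goliath Dairies']
```

A plain person with no subclass properties simply has no roles; no nullable subtype columns or extra tables are needed.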
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
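The denormalization trade-off in the last bullet can be sketched with Python's `sqlite3` module: copying the customer's name into the orders table lets the common listing query skip a join, but every name change must now update each copy. The table and column names are hypothetical, and the rename to "Michael Bowers" is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Tuned: customer_name is copied from persons so order listings need no join.
CREATE TABLE orders (
    order_number TEXT PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES persons,
    customer_name TEXT NOT NULL);
INSERT INTO persons VALUES (11, 'Mike Bowers');
INSERT INTO orders VALUES ('111-11-111', 11, 'Mike Bowers');
""")

# Read path: the frequent query touches one table.
listing = conn.execute("SELECT order_number, customer_name FROM orders").fetchall()
print(listing)  # [('111-11-111', 'Mike Bowers')]

# Write path: the copy must be kept in sync whenever the source changes.
conn.execute("UPDATE persons SET name = 'Michael Bowers' WHERE person_id = 11")
conn.execute("""
    UPDATE orders SET customer_name =
        (SELECT name FROM persons WHERE persons.person_id = orders.person_id)""")
print(conn.execute("SELECT customer_name FROM orders").fetchone()[0])  # Michael Bowers
```

Whether an order should instead keep the name as it was at order time is a separate design decision; the sketch only shows the cost of keeping copies current.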
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic.
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy.
• You manually create inequality (range) indexes based on property name or hierarchical path.
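To see why unique property names matter when an index is keyed by name alone, here is a toy equality index in Python. It is not MarkLogic's implementation; `build_name_index` is a hypothetical helper that maps `(property name, value)` to document IDs while ignoring where in the hierarchy the property sits, which is the behavior the bullet above describes.

```python
from collections import defaultdict

SCALARS = (str, int, float, bool, type(None))

def build_name_index(docs):
    """Toy equality index: (property name, scalar value) -> set of doc ids.
    The position of the property in the hierarchy is deliberately ignored."""
    index = defaultdict(set)

    def walk(node, doc_id):
        if isinstance(node, dict):
            for key, value in node.items():
                # Treat a single value and an array of values the same way.
                values = value if isinstance(value, list) else [value]
                for v in values:
                    if isinstance(v, SCALARS):
                        index[(key, v)].add(doc_id)
                    else:
                        walk(v, doc_id)
        elif isinstance(node, list):
            for item in node:
                walk(item, doc_id)

    for doc_id, doc in docs.items():
        walk(doc, doc_id)
    return index

docs = {
    # Generic name "status" reused at two levels of the hierarchy.
    "generic.json": {
        "person": {"status": "active"},
        "companyRelationships": [{"company": {"status": "inactive"}}],
    },
    # Unique names, as the slide recommends.
    "unique.json": {
        "person": {"personStatus": "active"},
        "companyRelationships": [{"company": {"personCompanyStatus": "inactive"}}],
    },
}
index = build_name_index(docs)

# "Find inactive persons" by name alone wrongly matches the company's status.
print(sorted(index.get(("status", "inactive"), set())))        # ['generic.json']
print(sorted(index.get(("personStatus", "inactive"), set())))  # []
```

With generic names the person in `generic.json` is a false positive even though they are active; with unique names the lookup is unambiguous.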
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 10 } },
    { "schema": { "type": "customer", "schemaVersion": 10 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 10 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                           "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                           "nameOnBankAccount": "Michael Bowers",
                           "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
                   "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
                   "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
                   "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
                   "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
                   "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } }
  ]
}
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mike@somework.com ]
personWebsites [
website websitePurpose [ work ] websiteUrl http://www.mike.com ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T09:00:00Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model.
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.
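To make the entity / transaction / lookup-list split concrete, here is a hedged Python sketch in which an order (a transaction) refers to entities by ID and its status values are checked against the order lookup list. The dict shapes are simplified from the documents above, and `invalid_statuses` is a hypothetical helper, not part of any MarkLogic API.

```python
import copy

# Lookup list values as listed in the orderLookup document above.
ORDER_LOOKUP = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

# Transaction document: refers to the person and product entities by ID.
order_doc = {
    "orderId": 1,
    "order": {
        "orderNumber": "111-11-111",
        "orderStatus": "Shipped",
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
            {"product": {"productIdFk": 2222}},  # no status shown on the slide
        ],
    },
}

def invalid_statuses(doc, lookup):
    """Return (field, value) pairs whose values are not in the lookup lists."""
    order = doc["order"]
    problems = []
    if order["orderStatus"] not in lookup["orderStatus"]:
        problems.append(("orderStatus", order["orderStatus"]))
    for item in order["orderedProducts"]:
        status = item["product"].get("productOrderStatus")
        if status is not None and status not in lookup["productOrderStatus"]:
            problems.append(("productOrderStatus", status))
    return problems

print(invalid_statuses(order_doc, ORDER_LOOKUP))  # []

bad_order = copy.deepcopy(order_doc)
bad_order["order"]["orderedProducts"][0]["product"]["productOrderStatus"] = "Lost"
print(invalid_statuses(bad_order, ORDER_LOOKUP))  # [('productOrderStatus', 'Lost')]
```

Keeping the allowed values in one lookup document means every transaction shares a single definition of each status, much as a reference table does in a relational model.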
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data.
• For example, the person document is 45K and requires 13 tables to represent it in a relational database.
• You have to join 13 tables to retrieve all of a person's data.
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row.
• 4-to-6 IOs × 13 joins = 52-to-78 IOs.
• MarkLogic indexes are in RAM, so getting a doc is 1 IO.
• MongoDB indexes are B-tree indexes and may require 4 IOs.
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables.
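The IO estimate above is plain arithmetic. This small Python sketch reproduces it so the per-join assumptions (the presenter's rule of thumb of 4-to-6 IOs per join, not a measurement) are explicit parameters that can be changed.

```python
def relational_io_range(joins, min_io_per_join=4, max_io_per_join=6):
    """Estimated IO range for a multi-join read: each join is assumed to cost
    3-to-4 IOs for index pages plus 1-to-2 IOs for the row (4-to-6 total)."""
    return joins * min_io_per_join, joins * max_io_per_join

low, high = relational_io_range(13)
document_ios = 1  # one IO to read the whole person document when indexes are in RAM

print(f"relational: {low}-{high} IOs, document: {document_ios} IO")
# relational: 52-78 IOs, document: 1 IO
```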
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.
"personAddresses": [
  { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
                 "addressLocality": "Clinton", "addressRegion": "UT",
                 "addressPostalCode": "84015", "addressCountry": "United States" } }
]
The same data in a relational model needs three tables:
Persons: Person ID | Status | Shipping Preference | Name | Online Name | Birth Date | Gender | Ethnicity | Customer Status | Customer Join Date | Customer Reviewer Rank
Persons Addresses: Person ID | Address ID | Address Purpose | Is Primary
Addresses: Address ID | Street | Locality | Region | Postal Code | Country
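A hedged Python sketch of what "shredding" the single address above into the Addresses and Persons Addresses tables looks like. The row layouts follow the column lists above; `shred_addresses` is a hypothetical helper, and treating a "primary" purpose as the Is Primary flag and joining multi-line streets with a comma are assumptions made for illustration.

```python
import itertools

person_doc = {
    "personId": 11,
    "person": {
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton",
                "addressRegion": "UT",
                "addressPostalCode": "84015",
                "addressCountry": "United States",
            }}
        ]
    },
}

def shred_addresses(doc, address_ids):
    """Flatten the multi-valued personAddresses array into Addresses rows and
    Persons Addresses link rows (one link row per address purpose)."""
    addresses, links = [], []
    for entry in doc["person"]["personAddresses"]:
        a = entry["address"]
        address_id = next(address_ids)
        addresses.append((address_id, ", ".join(a["addressStreet"]), a["addressLocality"],
                          a["addressRegion"], a["addressPostalCode"], a["addressCountry"]))
        purposes = a["addressPurpose"]
        is_primary = "primary" in purposes  # assumption: "primary" becomes a flag
        for purpose in purposes:
            if purpose != "primary":
                links.append((doc["personId"], address_id, purpose, is_primary))
    return addresses, links

addresses, links = shred_addresses(person_doc, itertools.count(1))
print(addresses)  # [(1, '99 Smith Dr', 'Clinton', 'UT', '84015', 'United States')]
print(links)      # [(11, 1, 'shipping', False), (11, 1, 'mailing', False)]
```

One embedded object with a two-valued array becomes one Addresses row plus two link rows, which is the many-to-many join the JSON document avoids.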
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases.
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers",
                       "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
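Application code handles a multi-typed array by branching on the fields each record carries. This Python sketch uses the two payment methods above; `describe` is a hypothetical helper, and branching on the presence of `creditCardNumber` or `bankAccountNumber` (rather than on `paymentMethodType`) is a design choice for illustration. A relational schema would typically need either sparse nullable columns or a separate table per payment type to hold the same array.

```python
payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS",
                       "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers",
                       "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]

def describe(method):
    """Summarize a payment method according to the fields its type carries."""
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"

summaries = [describe(entry["paymentMethod"]) for entry in payment_methods]
print(summaries)  # ['Visa card ending 1234', 'Checking account ending 1111']
```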
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query.
• Structure can be queried:
  – for presence of keys
  – for nested relationships
  – for any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
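Because keys and nesting are data, a query can ask where a key occurs and a plain data transformation can change the structure. This Python sketch works on the example document above (minus the height field); `find_paths` and `rename_key` are hypothetical helpers, and the new key name `phoneNumberHidden` is invented purely to show a structural change.

```python
doc = {
    "personId": 11,
    "person": {
        "personName": "Some very long name with many UTF-8 international characters…",
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

def find_paths(node, key, path=()):
    """Yield the path to every occurrence of `key`, at any depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield path + (k,)
            yield from find_paths(v, key, path + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_paths(item, key, path + (i,))

def rename_key(node, old, new):
    """Return a copy of `node` with every `old` key renamed to `new`."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node

# Query for presence of a key.
print(list(find_paths(doc, "hidePhoneNumber")))
# [('person', 'personPhones', 1, 'phone', 'hidePhoneNumber')]

# Query a nested relationship: hidden home phones.
hidden_home = [e["phone"]["phoneNumber"] for e in doc["person"]["personPhones"]
               if "home" in e["phone"]["phonePurpose"] and e["phone"].get("hidePhoneNumber")]

# Change structure by updating data (hypothetical new key name).
renamed = rename_key(doc, "hidePhoneNumber", "phoneNumberHidden")
```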
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters.
– Strings have unlimited size and can contain any internationalized UTF-8 characters.
– Numbers contain integers or decimals.
– Booleans are true or false.
– Arrays can have values of any type and can mix types.
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
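A quick way to see these types is to parse a document with a standard JSON parser. This sketch uses Python's built-in `json` module on a made-up sample document. Note that JSON itself has a single number type, while Python maps integers and decimals to `int` and `float`; the JSON specification (RFC 8259) sets no size limits but allows implementations to impose their own.

```python
import json

text = """{
  "a name with spaces & symbols: café": "naïve UTF-8 text 日本語",
  "count": 3,
  "price": 3.5,
  "verified": true,
  "mixed": [1, "two", false, null, {"nested": {"deeper": []}}],
  "sparse": {}
}"""

doc = json.loads(text)
types = {key: type(value).__name__ for key, value in doc.items()}
print(types)
# {'a name with spaces & symbols: café': 'str', 'count': 'int', 'price': 'float',
#  'verified': 'bool', 'mixed': 'list', 'sparse': 'dict'}
```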
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. Codd
IBM Research Laboratory, San Jose, California
Communications of the ACM, Volume 13, Number 6, June 1970 (Information Retrieval)
"Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…"
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. RELATIONAL MODEL AND NORMAL FORM
1.1. INTRODUCTION: "This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of"
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS: "…Tables… represent a major advance toward the goal of data independence…"
1.2.1. Ordering Dependence: "…Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one."
1.2.2. Indexing Dependence: "…Can application programs… remain invariant as indices come and go?…"
1.2.3. Access Path Dependence: "Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths."
[Figure: a graph of Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E), connected by relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of", with problem, solution, and purpose links between topics.]
Inside and Out
[Figure: JSON documents for Customers, Orders, Products, and Vendors, plus other documentation (an XML hospital operation record, shipping instructions, customer invoices, annual reports, customer order documents, and product manuals), connected by relationships such as "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product".]
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling: Normalize
- DocGraph Modeling: Normalize
- DocGraph Modeling: Denormalize
- Relational Modeling: Orthogonalize
- Relational Modeling: Generalize
- DocGraph Modeling: Generalize
- Relational Modeling: Tune
- DocGraph Modeling: Tune
- Document PROs: Simple, Easy Data Types
Sample XML hospital operation record:
Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
Relational Modeling: Normalize
1. Normalize
• Make each attribute single valued.
  – Create one column per attribute.
  – Flatten all data into tables by replacing multivalued attributes with one-to-many and many-to-many joins.
• Group attributes into tables, ensuring each table has one coherent context.
• Assign one primary key to each table.
• Eliminate duplicate attributes across tables.
[Figure: examples of one-to-one, one-to-many, many-to-many, and reference tables.]
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document.
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures.
• Each embedded structure should be normalized.
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it.
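In MarkLogic this projection is declared with a TDE template and queried with the Optic API. The Python below is only a stand-in for the idea, not those APIs: it projects data embedded in one person document into triples and flat rows so the data flow is visible. The functions `extract_triples` and `extract_phone_rows` are hypothetical.

```python
person_doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}},
    ],
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}},
        ],
    },
}

def extract_triples(doc):
    """Project embedded triples into (subject, predicate, object) tuples."""
    return [(t["triple"]["subject"], t["triple"]["predicate"], t["triple"]["object"])
            for t in doc.get("triples", [])]

def extract_phone_rows(doc):
    """Project each embedded phone into flat rows, one per purpose."""
    rows = []
    for entry in doc["person"]["personPhones"]:
        phone = entry["phone"]
        for purpose in phone["phonePurpose"]:
            rows.append((doc["personId"], purpose, phone["phoneNumber"]))
    return rows

triple_index = extract_triples(person_doc)
phone_rows = extract_phone_rows(person_doc)

# Query the projected triples: who are person 11's children?
children = [o for s, p, o in triple_index if s == 11 and p == "hasChild"]
print(children)  # [33, 44]
```

The document stays normalized as the source of truth; the triples and rows are derived views that an index can maintain automatically.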
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info.
• When triples link documents, the Optic API can query triple data as if it were part of any linked document.
• This is denormalizing without denormalizing.
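A hedged Python sketch of "denormalizing without denormalizing": the person and company documents stay separate, and a query-time join through a `hasEmployer` triple produces a view that looks denormalized. This mimics the idea only; it is not the Optic API. The function `person_with_employers` and the renamed company name are hypothetical.

```python
# Separate documents: nothing about the company is copied into the person.
persons = {11: {"personId": 11, "person": {"personName": "Mike Bowers"}}}
companies = {
    555: {"companyId": 555, "company": {"companyName": "Nabisco"}},
    666: {"companyId": 666, "company": {"companyName": "Goliath Dairies"}},
}
triples = [(11, "hasEmployer", 666)]

def person_with_employers(person_id):
    """Build a read-only view that looks denormalized by following
    hasEmployer triples at query time instead of copying company data."""
    person = persons[person_id]["person"]
    employers = [companies[o]["company"]["companyName"]
                 for s, p, o in triples if s == person_id and p == "hasEmployer"]
    return {"personName": person["personName"], "employerNames": employers}

view_before = person_with_employers(11)
print(view_before)  # {'personName': 'Mike Bowers', 'employerNames': ['Goliath Dairies']}

# Update the company once (hypothetical rename); no copies need fixing.
companies[666]["company"]["companyName"] = "Goliath Dairies Inc."
view_after = person_with_employers(11)
print(view_after["employerNames"])  # ['Goliath Dairies Inc.']
```

Unlike the copied column in a denormalized relational table, the view always reflects the current company document.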
Relational Modeling: 2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
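Here is a minimal relational sketch of the three kinds of tables, using Python's built-in `sqlite3`. The table and column names are illustrative assumptions loosely based on the example data, not the author's schema. The business tables (`persons`, `products`) stand alone, the transaction tables (`orders`, `order_items`) join them, and a reference table (`order_statuses`) standardizes the order state. Reassembling one order's context then takes a multi-way join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Reference table: standardizes an entity state.
    CREATE TABLE order_statuses (status TEXT PRIMARY KEY);
    -- Business tables: independent of any context.
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, person_name TEXT);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    -- Transaction tables: join business tables together.
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES persons(person_id),
        order_status TEXT REFERENCES order_statuses(status)
    );
    CREATE TABLE order_items (
        order_id INTEGER REFERENCES orders(order_id),
        product_id INTEGER REFERENCES products(product_id),
        qty INTEGER
    );
    INSERT INTO order_statuses VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');
    INSERT INTO persons VALUES (11, 'Mike Bowers');
    INSERT INTO products VALUES (1111, 'Oreo Thins Sandwich Cookies'),
                                (2222, 'Rice Dream Rice Drink - Vanilla 64 Fl Oz');
    INSERT INTO orders VALUES (1, 11, 'Shipped');
    INSERT INTO order_items VALUES (1, 1111, 3), (1, 2222, 1);
    """
)

# Rebuilding one order's context joins business, transaction, and reference tables.
rows = conn.execute(
    """
    SELECT p.person_name, pr.product_name, oi.qty, o.order_status
    FROM orders o
    JOIN persons p ON p.person_id = o.person_id
    JOIN order_items oi ON oi.order_id = o.order_id
    JOIN products pr ON pr.product_id = oi.product_id
    JOIN order_statuses s ON s.status = o.order_status
    WHERE o.order_id = 1
    ORDER BY pr.product_id
    """
).fetchall()
for row in rows:
    print(row)
```

Each table is reusable in other contexts. The cost is that even a two-line order needs four joins to read back.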
Business Tables, Reference Tables, Transaction Tables

{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                           "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                           "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 }
}

{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": ""
    },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ], "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States"
    },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+0101",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", … } }
    ]
  }
}

{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks", …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu… } }
    ],
    "productMeasures": [
      { "productMeasure": { "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": { "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } }
    ],
    …
  }
}

{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } }
    ],
    "companyAddresses": [
      { "address": { "addressPurpose": [ "shipping", …
          "addressLocality": "East Hanove…", "addressPostalCode": "07936", … } }
    ],
    "companyEmails": [ { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [ { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Che…", "paymentMethodAccountNumber": 555,
          "paymentMethodVerified": true } }
    ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}

{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one document
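The rule that a transaction should contain everything about the transaction can be sketched in Python as follows. The helpers `shipping_address` and `new_order` are hypothetical, and using `productDiscountPrice` as the unit price is an assumption. When the order is created, the customer name, shipping address, and product details are copied into it, so the order still describes what actually happened after the person or product documents change.

```python
import copy
from typing import Any

# Source entities, trimmed from the example documents.
person: dict[str, Any] = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {"addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                         "addressLocality": "Clinton", "addressRegion": "UT",
                         "addressPostalCode": "84015", "addressCountry": "United States"}},
        ],
    },
}
products: dict[int, dict[str, Any]] = {
    1111: {"productId": 1111,
           "product": {"productName": "Oreo Thins Sandwich Cookies", "productDiscountPrice": 3.50}},
}


def shipping_address(p: dict[str, Any]) -> dict[str, Any]:
    """Return the first address whose purposes include 'shipping', without the purpose list."""
    for entry in p["person"]["personAddresses"]:
        addr = entry["address"]
        if "shipping" in addr["addressPurpose"]:
            return {k: v for k, v in addr.items() if k != "addressPurpose"}
    raise ValueError("person has no shipping address")


def new_order(order_id: int, p: dict[str, Any], items: list[tuple[int, int]]) -> dict[str, Any]:
    """Build a self-contained order: customer, address, and product details are copied as of order time."""
    return {
        "orderId": order_id,
        "order": {
            "orderStatus": "New",
            "customer": {"customerId": p["personId"], "customerName": p["person"]["personName"]},
            # deepcopy so later edits to the person's address lists cannot leak into the order
            "orderShippingAddress": copy.deepcopy(shipping_address(p)),
            "orderedProducts": [
                {"product": {"productIdFk": pid,
                             "productName": products[pid]["product"]["productName"],
                             "productOrderQty": qty,
                             "productUnitPrice": products[pid]["product"]["productDiscountPrice"]}}
                for pid, qty in items
            ],
        },
    }


order = new_order(1, person, [(1111, 3)])
person["person"]["personName"] = "Michael Bowers"   # the person changes later...
print(order["order"]["customer"]["customerName"])  # ...but the order still says: Mike Bowers
```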
Relational Modeling: 3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace a customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational models, because doing so hides the purpose of the model
DocGraph Modeling: Generalize
• Generalizing is easy in JSON, because JSON objects lend themselves naturally to subclassing
• This example shows a person subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON and, unlike in relational models, it does not hide the purpose of the model
• The subclasses appear in the person document as the customer and companyRelationships properties alongside person
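One way to read subclassing in the example person document is to treat each subclass property as a role. The Python sketch below assumes that interpretation: the `roles` helper is hypothetical, and the document is trimmed to the relevant properties. It derives a person's roles from which subclass properties are present and from the `personCompanyRelationship` values. No separate subclass tables are needed.

```python
from typing import Any

person_doc: dict[str, Any] = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}},
    ],
}


def roles(doc: dict[str, Any]) -> set[str]:
    """Derive a person's 'subclasses' from the subclass properties the document carries."""
    found = {"person"} if "person" in doc else set()
    if "customer" in doc:
        found.add("customer")
    for rel in doc.get("companyRelationships", []):
        found.update(rel["company"]["personCompanyRelationship"])
    return found


print(sorted(roles(person_doc)))
# ['accountRepFor', 'customer', 'deliveryPersonFor', 'employeeOf', 'independentContractorFor', 'person']
```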
Relational Modeling: 4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property by its name, regardless of where it sits in the hierarchy
• You create inequality (range) indexes manually, based on a property name or a hierarchical path
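The reason for unique property names can be shown with a toy Python model of a name-keyed equality index. This is an illustration only, not how MarkLogic actually builds its indexes, and `name_index` is a hypothetical helper. Because the index key is just the property name, a generic `name` property mixes people and companies under one key, while `personName` and `companyName` keep them apart.

```python
from collections import defaultdict
from typing import Any, Optional


def name_index(value: Any, key: Optional[str] = None,
               index: Optional[dict[str, set[Any]]] = None) -> dict[str, set[Any]]:
    """Toy equality index: maps each property name to the scalar values found under it, at any depth."""
    if index is None:
        index = defaultdict(set)
    if isinstance(value, dict):
        for child_key, child in value.items():
            name_index(child, child_key, index)
    elif isinstance(value, list):
        for item in value:
            name_index(item, key, index)  # array items are indexed under the array's property name
    elif key is not None:
        index[key].add(value)
    return index


generic = {"person": {"name": "Mike Bowers"}, "company": {"name": "Nabisco"}}
unique = {"person": {"personName": "Mike Bowers"}, "company": {"companyName": "Nabisco"}}

g = name_index(generic)
u = name_index(unique)
print(sorted(g["name"]))        # ['Mike Bowers', 'Nabisco']: one key, two kinds of entity
print(sorted(u["personName"]))  # ['Mike Bowers']
print(sorted(u["companyName"])) # ['Nabisco']
```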
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• The five JSON documents in the orthogonalized example (person, order, product, company, and lookup lists) equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into separate business entities, transactions, and lookup lists
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs × 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a document is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To tune JSON further, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables
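The slide's back-of-envelope IO estimate can be written out as a few lines of Python arithmetic. The inputs are the slide's own rough figures, not measurements, and the variable names are illustrative.

```python
# Inputs taken from the slide; they are rough estimates, not measurements.
TABLES_JOINED = 13   # tables needed to hold one person document relationally (one join counted per table)
INDEX_IOS = (3, 4)   # low/high IOs to walk an index per join
ROW_IOS = (1, 2)     # low/high IOs to read the row itself
DOCUMENT_IOS = 1     # one IO to fetch the whole document when indexes are in RAM

ios_per_join = (INDEX_IOS[0] + ROW_IOS[0], INDEX_IOS[1] + ROW_IOS[1])
relational_ios = (TABLES_JOINED * ios_per_join[0], TABLES_JOINED * ios_per_join[1])

print(f"per join: {ios_per_join[0]}-to-{ios_per_join[1]} IOs")
print(f"relational: {relational_ios[0]}-to-{relational_ios[1]} IOs vs document: {DOCUMENT_IOS} IO")
# per join: 4-to-6 IOs
# relational: 52-to-78 IOs vs document: 1 IO
```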
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
"personAddresses": [
  { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
                 "addressLocality": "Clinton", "addressRegion": "UT",
                 "addressPostalCode": "84015", "addressCountry": "United States" } }
]
Relational equivalent:
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
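The Python sketch below shows what capturing a many-to-many relationship in one document saves: shredding the multi-valued `personAddresses` property into the Addresses and Persons Addresses tables. The helper `shred_addresses`, the generated surrogate Address IDs, and joining multi-line streets with ", " are assumptions, and the Is Primary column is left out. Each entry in the `addressPurpose` array becomes its own junction row.

```python
from typing import Any

person_doc: dict[str, Any] = {
    "personId": 11,
    "person": {
        "personAddresses": [
            {"address": {"addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                         "addressLocality": "Clinton", "addressRegion": "UT",
                         "addressPostalCode": "84015", "addressCountry": "United States"}},
        ],
    },
}


def shred_addresses(doc: dict[str, Any]) -> tuple[list[tuple], list[tuple]]:
    """Split one document's addresses into Addresses rows and Persons Addresses (junction) rows."""
    addresses: list[tuple] = []
    person_addresses: list[tuple] = []
    # Surrogate Address IDs are invented here; the JSON document never needed them.
    for address_id, entry in enumerate(doc["person"]["personAddresses"], start=1):
        a = entry["address"]
        addresses.append((address_id, ", ".join(a["addressStreet"]), a["addressLocality"],
                          a["addressRegion"], a["addressPostalCode"], a["addressCountry"]))
        # One junction row per purpose: the JSON array becomes a many-to-many relationship.
        for purpose in a["addressPurpose"]:
            person_addresses.append((doc["personId"], address_id, purpose))
    return addresses, person_addresses


addresses, person_addresses = shred_addresses(person_doc)
print(addresses)         # [(1, '99 Smith Dr', 'Clinton', 'UT', '84015', 'United States')]
print(person_addresses)  # [(11, 1, 'shipping'), (11, 1, 'mailing')]
```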
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers",
                       "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
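Application code can consume a multi-typed array like the payment methods above with very little machinery. The Python sketch below assumes that checking which fields are present is an acceptable way to tell the record types apart, and `describe` is a hypothetical helper. Each record carries only the fields its type needs. A relational table would instead need sparse nullable columns or one table per payment type.

```python
from typing import Any

payment_methods: list[dict[str, Any]] = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers",
                       "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]


def describe(method: dict[str, Any]) -> str:
    """Dispatch on each record's shape; different record types share one array."""
    pm = method["paymentMethod"]
    if "creditCardNumber" in pm:
        return f"{pm['paymentMethodType']} card ending {pm['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in pm:
        return f"{pm['paymentMethodType']} account ending {pm['bankAccountNumber'][-4:]}"
    return f"unknown payment method {pm.get('paymentMethodType')}"


for m in payment_methods:
    print(describe(m))
# Visa card ending 1234
# Checking account ending 1111
```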
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For the presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
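Because keys and nesting are ordinary data, queries can target structure directly. The sketch below is a plain Python analogy, not MarkLogic's query APIs, and the helpers `paths`, `has_key`, and `rename_key` are hypothetical. It checks whether a key is present, tests a nested path, and changes the structure by rewriting the data, using the phone properties from the example above.

```python
from typing import Any, Iterator


def paths(value: Any, prefix: tuple = ()) -> Iterator[tuple]:
    """Yield the key path to every property (array positions are not part of the path)."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield prefix + (key,)
            yield from paths(child, prefix + (key,))
    elif isinstance(value, list):
        for child in value:
            yield from paths(child, prefix)


def has_key(doc: dict[str, Any], key: str) -> bool:
    """Presence query: does the key occur anywhere in the document?"""
    return any(p[-1] == key for p in paths(doc))


def rename_key(value: Any, old: str, new: str) -> Any:
    """Change structure by updating data: rebuild the document with a property renamed."""
    if isinstance(value, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in value.items()}
    if isinstance(value, list):
        return [rename_key(v, old, new) for v in value]
    return value


doc = {"personId": 11, "person": {"personPhones": [
    {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": True}}]}}

print(has_key(doc, "hidePhoneNumber"))                                         # True
print(("person", "personPhones", "phone", "phoneNumber") in set(paths(doc)))   # True
print(has_key(rename_key(doc, "hidePhoneNumber", "phoneIsPrivate"), "phoneIsPrivate"))  # True
```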
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers can be integers or decimals
– Booleans are true or false
– Arrays can hold values of any type and can mix types
– Objects can have an unlimited number of attributes, can be nested, and can be sparsely populated
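These types map directly onto a general-purpose language. The Python sketch below uses only the standard `json` module. The international name and the `mixedArray` property are hypothetical additions for illustration. It parses a small document and shows integers, decimals, booleans, a mixed-type array, nested objects, and a sparsely populated object where only one phone has `hidePhoneNumber`.

```python
import json

text = """
{
  "personId": 11,
  "person": {
    "personName": "Zoë Ñúñez",
    "personHeightInFeet": 5.75,
    "personPhones": [
      {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
      {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true}}
    ],
    "mixedArray": [1, 2.5, "three", false, null, {"four": 4}]
  }
}
"""

doc = json.loads(text)
person = doc["person"]

print(type(doc["personId"]).__name__, type(person["personHeightInFeet"]).__name__)  # int float
print([type(v).__name__ for v in person["mixedArray"]])
# ['int', 'float', 'str', 'bool', 'NoneType', 'dict']

# Sparse population: the second phone has a property the first lacks.
phones = [p["phone"] for p in person["personPhones"]]
print([p.get("hidePhoneNumber", False) for p in phones])  # [False, True]
```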
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency. …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence. …
1.2.1 Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence
… Can application programs … remain invariant as indices come and go? …
1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.
Figure: RDF graph connecting Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E) through relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of"; topics are linked to one another as problem, solution, and purpose.
Figure (Customer Order): a document graph linking JSON Customers, Orders, Products, and Vendors documents, an XML hospital operation record (hospital, operation number, operation type, surgeon, and drug doses), and Documentation (shipping instructions, customer invoices, annual reports, customer order documents, product manuals). Link labels include "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product".
Inside and Out
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling: Normalize
- DocGraph Modeling: Normalize
- DocGraph Modeling: Denormalize
- Relational Modeling: Orthogonalize
- Relational Modeling: Generalize
- DocGraph Modeling: Generalize
- Relational Modeling: Tune
- DocGraph Modeling: Tune
- Document PROs: Simple, Easy Data Types
Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
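The two tables above hold one operation and the three drug doses given during it. A hedged Python sketch of how the same rows could be stored as one hierarchical document, with the one-to-many drug rows embedded as an array. The camelCase property names (`operationDrugs`, `doseUom`, and so on) are assumptions, not taken from the slides.

```python
import json

# Rows transcribed from the two tables; the property names below are hypothetical.
operation = {"Hospital Name": "John Hopkins", "Operation Number": 13,
             "Operation Type": "Heart Transplant", "Surgeon Name": "Dorothy Oz"}
drug_rows = [
    ("Minicillan", "Drugs R Us", 200, "mg"),
    ("Maxicillan", "Canada4Less Drugs", 400, "mg"),
    ("Minicillan", "Drug USA", 150, "mg"),
]

# One document: the one-to-many drug rows become an embedded array.
operation_doc = {
    "operation": {
        "hospitalName": operation["Hospital Name"],
        "operationNumber": operation["Operation Number"],
        "operationType": operation["Operation Type"],
        "surgeonName": operation["Surgeon Name"],
        "operationDrugs": [
            {"drugName": name, "drugManufacturer": maker, "doseSize": size, "doseUom": uom}
            for name, maker, size, uom in drug_rows
        ],
    }
}

print(json.dumps(operation_doc, indent=2))
```

Reading the whole operation, drugs included, is then a single document fetch instead of a join between two tables.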
DocGraph Modeling: Normalize
• Create one business entity or transaction per JSON document
• JSON properties can be multi-valued (i.e., arrays), which means we can embed structures
• Each embedded structure should be normalized
• TDE can automatically populate the triple index to denormalize data, and the Optic API can query it
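One document per business entity means a relational schema has to shred that same entity into many tables. A rough Python sketch of that decomposition, applied to a trimmed and simplified person document (wrapper keys such as `phone` omitted): scalars become columns, nested objects become one-to-one child tables, and arrays become one-to-many child tables. Even this fragment needs seven tables, which hints at how a full person document reaches the 13 tables cited in the deck. The `shred` function is a hypothetical illustration, not a MarkLogic or TDE feature.

```python
from collections import defaultdict
from itertools import count

# Trimmed, simplified person document (wrapper keys such as "phone" omitted).
person_doc = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personPhones": [
        {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"},
        {"phonePurpose": ["work"], "phoneNumber": "+1 (111) 111-1112"},
    ],
    "personAddresses": [
        {"addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
         "addressLocality": "Clinton"},
    ],
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
}


def shred(doc, root="person"):
    """Decompose a nested document into first-normal-form tables.

    Scalars become columns; each nested object becomes a one-to-one child table;
    each array becomes a one-to-many child table whose rows point at the parent row.
    """
    tables = defaultdict(list)
    ids = count(1)

    def visit(obj, table, parent_table=None, parent_id=None):
        row = {"id": next(ids)}
        if parent_id is not None:
            row[f"{parent_table}_id"] = parent_id
        for key, value in obj.items():
            if isinstance(value, dict):
                visit(value, key, table, row["id"])
            elif isinstance(value, list):
                for item in value:
                    if isinstance(item, dict):
                        visit(item, key, table, row["id"])
                    else:
                        tables[key].append({f"{table}_id": row["id"], "value": item})
            else:
                row[key] = value
        tables[table].append(row)

    visit(doc, root)
    return dict(tables)


tables = shred(person_doc)
for name, rows in tables.items():
    print(f"{name}: {len(rows)} row(s)")
```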
{ "personId": 11,
  "schemas": [ { "schema": { "type": "person", "schemaVersion": "1.0" } }, { "schema": { "type": "customer", "schemaVersion": "1.0" } }, { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } } ],
  "triples": [ { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } }, { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } }, { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } }, { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 }, "tripleStatus": "active", "tripleCreatedOn": "2016-10-05", "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } ],
  "person": { "personStatus": "active", "personName": "Mike Bowers", "personOnlineName": "MTB1",
    "personNameVariations": { "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas", "personMaidenName": null, "personLegalName": "Michael Thomas Bowers", "personSortedName": "Bowers, Mike", "personNickname": "Mikey", "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01", "personGender": "male", "personEthnicity": "caucasian", "personShippingPreference": "free",
    "personPhones": [ { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } }, { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } }, { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [ { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ], "addressLocality": "Clinton", "addressRegion": "UT", "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [ { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } }, { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [ { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } }, { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [ { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified", "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11", "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified", "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111", "nameOnBankAccount": "Michael Bowers", "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [ { "company": { "companyIdFk": 555, "companyName": "Nabisco", "personCompanyRelationship": [ "employeeOf", "accountRepFor" ], "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective", "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null, "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies", "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ], "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective", "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null, "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ] }
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing: the data is stored once, yet queries can treat it as part of every linked document
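A plain-Python analogy for "denormalizing without denormalizing": linked values are pulled in at query time by following triples, and nothing is copied into the stored documents. It does not use TDE or the Optic API; `DOCS`, `TRIPLES`, and `person_view` are hypothetical stand-ins for the document store, the triple index, and a query. The triples and IDs come from the person document.

```python
# Hypothetical in-memory stand-ins; an analogy, not TDE or the Optic API.
DOCS = {
    11: {"personId": 11, "person": {"personName": "Mike Bowers"}},
    666: {"companyId": 666, "company": {"companyName": "Goliath Dairies"}},
    1111: {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies"}},
}

# Triples as they appear in the person document: (subject, predicate, object).
TRIPLES = [
    (11, "hasSpouse", 22),
    (11, "hasChild", 33),
    (11, "hasChild", 44),
    (11, "likesProduct", 1111),
    (11, "hasEmployer", 666),
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def person_view(person_id):
    """Build a query-time view of a person that includes data from linked documents.

    Nothing is copied into the stored documents; linked values are looked up
    when the view is requested.
    """
    person = DOCS[person_id]["person"]
    return {
        "personName": person["personName"],
        "employerNames": [DOCS[o]["company"]["companyName"]
                          for o in objects(person_id, "hasEmployer") if o in DOCS],
        "likedProductNames": [DOCS[o]["product"]["productName"]
                              for o in objects(person_id, "likesProduct") if o in DOCS],
    }


print(person_view(11))
```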
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join business tables together
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
Figure: Business Tables, Reference Tables, Transaction Tables
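A small sketch of the three kinds of tables using Python's built-in `sqlite3` module. Table and column names are hypothetical; the values (person 11, order 1, products 1111 and 2222, the order statuses) come from the example documents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on
conn.executescript("""
-- Reference table: standardizes entity states
CREATE TABLE order_status (code TEXT PRIMARY KEY);
INSERT INTO order_status VALUES ('New'), ('Invoiced'), ('Shipped'), ('Closed');

-- Business tables: stand independent of any context
CREATE TABLE person  (person_id INTEGER PRIMARY KEY, person_name TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
INSERT INTO person  VALUES (11, 'Mike Bowers');
INSERT INTO product VALUES (1111, 'Oreo Thins Sandwich Cookies'),
                           (2222, 'Rice Dream Rice Drink - Vanilla 64 Fl Oz');

-- Transaction tables: join business tables together in one context
CREATE TABLE customer_order (
  order_id     INTEGER PRIMARY KEY,
  person_id    INTEGER REFERENCES person(person_id),
  order_status TEXT REFERENCES order_status(code)
);
CREATE TABLE order_line (
  order_id   INTEGER REFERENCES customer_order(order_id),
  product_id INTEGER REFERENCES product(product_id),
  qty        INTEGER
);
INSERT INTO customer_order VALUES (1, 11, 'Shipped');
INSERT INTO order_line VALUES (1, 1111, 3), (1, 2222, 1);
""")

# Reassemble the order context by joining the orthogonal tables.
rows = conn.execute("""
    SELECT p.person_name, o.order_status, pr.product_name, l.qty
    FROM customer_order o
    JOIN person p     ON p.person_id = o.person_id
    JOIN order_line l ON l.order_id = o.order_id
    JOIN product pr   ON pr.product_id = l.product_id
    ORDER BY pr.product_id
""").fetchall()
for row in rows:
    print(row)
```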
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New Invoiced Shipped Closed
] productOrderStatus [
None Allocated Invoiced Shipped On Order No Stock
]
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one document
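A minimal sketch, assuming the order document copies customer and product details as they were when the order was placed, so the transaction is self-contained and unaffected by later catalog edits. Prices of 3.50 and 2.50 are an assumption: the slide text shows `350` and `2 50`, apparently with the decimal points lost. `place_order`, `order_total`, and the catalog dictionary are hypothetical.

```python
from decimal import Decimal

# Hypothetical product catalog documents (business entities).
products = {
    1111: {"productId": 1111, "product": {"productName": "Oreo Thins Sandwich Cookies",
                                          "productDiscountPrice": Decimal("3.50")}},
    2222: {"productId": 2222, "product": {"productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
                                          "productDiscountPrice": Decimal("2.50")}},
}


def place_order(order_id, customer_id, customer_name, lines):
    """Build an order document that carries everything about the transaction.

    Customer and product details are copied in as they were at order time,
    so the order can be read and totaled without looking anything up.
    """
    return {
        "orderId": order_id,
        "order": {
            "customer": {"customerId": customer_id, "customerName": customer_name},
            "orderedProducts": [
                {"product": {
                    "productIdFk": pid,
                    "productName": products[pid]["product"]["productName"],
                    "productOrderQty": qty,
                    "productUnitPrice": products[pid]["product"]["productDiscountPrice"],
                }}
                for pid, qty in lines
            ],
        },
    }


def order_total(order_doc):
    """Total the order using only data stored inside the order document."""
    return sum(p["product"]["productOrderQty"] * p["product"]["productUnitPrice"]
               for p in order_doc["order"]["orderedProducts"])


order = place_order(1, 11, "Mike Bowers", [(1111, 3), (2222, 1)])
print(order_total(order))  # 13.00

# A later catalog price change does not rewrite the historical order.
products[1111]["product"]["productDiscountPrice"] = Decimal("3.99")
print(order_total(order))  # 13.00
```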
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not overgeneralize in relational, because it hides the purpose of the model
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because a JSON object can nest sub-objects, which makes subclassing natural
• This example shows a person subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• The subclasses appear in the person document as the customer and companyRelationships sections
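A short Python sketch that lists the "subclasses" carried by a generalized person document. The JSON structure is reconstructed from the slide text and trimmed (an assumption), and `roles` is a hypothetical helper.

```python
# Trimmed person document; JSON structure reconstructed from the slide text (an assumption).
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers", "personStatus": "active"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor", "deliveryPersonFor"]}},
    ],
}


def roles(doc):
    """List the generalized entity and each 'subclass' the document carries."""
    found = []
    if "person" in doc:
        found.append("person")
    if "customer" in doc:
        found.append("customer")
    for rel in doc.get("companyRelationships", []):
        company = rel["company"]
        for kind in company["personCompanyRelationship"]:
            found.append(f"{kind} {company['companyName']}")
    return found


print(roles(person_doc))
```

A person who is not a customer and has no company relationships simply lacks those sections; no empty subtype rows or extra joins are involved.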
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
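A sketch of the relational denormalization step with Python's built-in `sqlite3`: the order table carries a copy of the customer's name so a common read needs no join, at the cost of writing two tables when the name changes. Table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT);
-- customer_name is a deliberate copy of person.person_name, kept for faster reads.
CREATE TABLE customer_order (
  order_id      INTEGER PRIMARY KEY,
  person_id     INTEGER REFERENCES person(person_id),
  customer_name TEXT,
  order_status  TEXT
);
INSERT INTO person VALUES (11, 'Mike Bowers');
INSERT INTO customer_order VALUES (1, 11, 'Mike Bowers', 'Shipped');
""")

# Normalized read: a join is needed to get the customer's name.
joined = conn.execute("""
    SELECT o.order_id, p.person_name
    FROM customer_order o JOIN person p ON p.person_id = o.person_id
""").fetchall()

# Tuned read: the copied column answers the same question from one table.
tuned = conn.execute("SELECT order_id, customer_name FROM customer_order").fetchall()
print(joined, tuned)  # [(1, 'Mike Bowers')] [(1, 'Mike Bowers')]

# The cost of tuning: a name change must now be written in two places.
with conn:
    conn.execute("UPDATE person SET person_name = 'Michael Bowers' WHERE person_id = 11")
    conn.execute("UPDATE customer_order SET customer_name = 'Michael Bowers' WHERE person_id = 11")
```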
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
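A simplified Python model of why unique names matter when an equality index is keyed by property name rather than by path. This illustrates the idea only; it is not MarkLogic's index implementation, and `name_index` and the sample documents are hypothetical.

```python
from collections import defaultdict


def name_index(docs):
    """Map (property name, value) -> ids of documents containing that pair anywhere.

    A simplified model of a name-based equality index: the position of the
    property in the hierarchy is deliberately not recorded.
    """
    index = defaultdict(set)

    def walk(node, doc_id, name=None):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, doc_id, key)
        elif isinstance(node, list):
            for item in node:
                walk(item, doc_id, name)  # array items inherit the property name
        elif name is not None:
            index[(name, node)].add(doc_id)

    for doc_id, doc in docs.items():
        walk(doc, doc_id)
    return index


# Generic property names: "status" means different things in different entities.
generic = name_index({
    "person-11": {"person": {"name": "Mike Bowers", "status": "active"}},
    "company-555": {"company": {"name": "Nabisco", "status": "active"}},
})

# Unique property names, as in the slides' documents.
unique = name_index({
    "person-11": {"person": {"personName": "Mike Bowers", "personStatus": "active"}},
    "company-555": {"company": {"companyName": "Nabisco", "companyStatus": "active"}},
})

print(sorted(generic[("status", "active")]))        # ['company-555', 'person-11']
print(sorted(unique[("companyStatus", "active")]))  # ['company-555']
```

With generic names, "active companies" cannot be answered from the name index alone; with unique names, the property name itself carries the context.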
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• The five JSON documents (person, order, product, company, and lookup lists) equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 101,
          "productWidth": 45, "productHeight": 17, "productLength": 67 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } …
    "productWebsites": [ …
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales" …
      { "phone": { "phonePurpose": "support" …
      { "phone": { "phonePurpose": "shipping" …
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping" …
          "addressLocality": "East Hanove…
          "addressPostalCode": "07936" …
    "companyEmails": [
      { "email": { "emailPurpose": "sales" …
    "companyWebsites": [
      { "website": { "websitePurpose": "home" …
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…
          "paymentMethodAccountNumber": 555 …
          "paymentMethodVerified": true …
  "supplier": { "supplierJoinDate": "2005-05…
  "seller": { "sellerJoinDate": "2005-05…
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
Document PROs: Performance
One JSON document requires one I/O
• Relational requires dozens of I/Os for the same data
• For example, the person document is 45K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row
• 4-to-6 I/Os x 13 joins = 52-to-78 I/Os
• MarkLogic indexes are in RAM, so getting a doc is 1 I/O
• MongoDB indexes are B-tree indexes and may require 4 I/Os
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just like we do in relational tables
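The I/O arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope model that uses the slide's own assumptions: one join per table, 3-to-4 index I/Os plus 1-to-2 row I/Os per join. The B-tree depth used for the non-RAM case is an illustrative assumption, not a measured MongoDB figure.

```python
# Back-of-the-envelope I/O model using the per-join costs stated on the slide.

def relational_io_range(tables: int,
                        index_ios: tuple = (3, 4),
                        row_ios: tuple = (1, 2)) -> tuple:
    """(min, max) I/Os to reassemble an entity shredded across `tables` tables,
    assuming one join per table."""
    low = (index_ios[0] + row_ios[0]) * tables
    high = (index_ios[1] + row_ios[1]) * tables
    return low, high


def document_io(index_in_ram: bool, btree_levels: int = 3) -> int:
    """I/Os to fetch one whole document: only the document read when the index
    is memory-resident, otherwise one I/O per B-tree level plus the read.
    `btree_levels` is an illustrative assumption."""
    return 1 if index_in_ram else btree_levels + 1


print("relational, 13 tables:", relational_io_range(13))  # (52, 78)
print("document, index in RAM:", document_io(True))       # 1
print("document, B-tree index:", document_io(False))      # 4
```

Changing `tables` shows how the relational cost grows linearly with the number of tables an entity is shredded into. The document fetch stays constant.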
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE)
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
"personAddresses": [
  { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]
Relational equivalent (three tables):
Persons: Person ID | Status | Shipping Preference | Name | Online Name | Birth Date | Gender | Ethnicity | Customer Status | Customer Join Date | Customer Reviewer Rank
Persons Addresses: Person ID | Address ID | Address Purpose | Is Primary
Addresses: Address ID | Street | Locality | Region | Postal Code | Country
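To see why one multi-valued document becomes three tables, the sketch below shreds the `personAddresses` array into Persons, Persons Addresses, and Addresses rows. The row field names and the generated surrogate `addressId` are hypothetical. The Is Primary column is omitted because the sample address carries no primary flag.

```python
from itertools import count

# Sample document copied from the slide (only the fields the sketch needs).
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton", "addressRegion": "UT",
                "addressPostalCode": "84015", "addressCountry": "United States",
            }},
        ],
    },
}


def shred(doc):
    """Split one person document into Persons, Persons Addresses and Addresses rows."""
    persons, persons_addresses, addresses = [], [], []
    next_address_id = count(1)  # surrogate keys that only the relational model needs
    pid = doc["personId"]
    persons.append({"personId": pid, "name": doc["person"]["personName"]})
    for item in doc["person"]["personAddresses"]:
        a = item["address"]
        aid = next(next_address_id)
        addresses.append({
            "addressId": aid,
            "street": "\n".join(a["addressStreet"]),
            "locality": a["addressLocality"],
            "region": a["addressRegion"],
            "postalCode": a["addressPostalCode"],
            "country": a["addressCountry"],
        })
        # The multi-valued addressPurpose array becomes one junction row per value.
        for purpose in a["addressPurpose"]:
            persons_addresses.append({"personId": pid, "addressId": aid, "purpose": purpose})
    return persons, persons_addresses, addresses


persons, links, addresses = shred(person_doc)
print(len(persons), len(links), len(addresses))  # 1 2 1
```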
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
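In application code, a multi-typed array is handled by branching on a type field. The sketch below does this with `paymentMethodType`. The `CARD_TYPES` set and the `describe` helper are hypothetical names used for illustration.

```python
# Sketch: one array holding two kinds of payment method, each with its own fields.
payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111"}},
]

CARD_TYPES = {"Visa"}  # hypothetical; extend with other card brands as needed


def describe(method: dict) -> str:
    """Dispatch on the record type and read only the fields that type has."""
    pm = method["paymentMethod"]
    if pm["paymentMethodType"] in CARD_TYPES:
        return f"card ending {pm['creditCardNumber'][-4:]}"
    if pm["paymentMethodType"] == "Checking":
        return f"checking account ending {pm['bankAccountNumber'][-4:]}"
    return "unknown payment method"


print([describe(m) for m in payment_methods])
# ['card ending 1234', 'checking account ending 1111']
```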
Document PROs: Everything Is Data
Everything is data in JSON
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query
• Structure can be queried
  – For presence of keys
  – For nested relationships
  – For any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
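Because keys and nesting are ordinary data, a query can test for them directly. This plain-Python sketch is not a MarkLogic query API. It runs one query for each of the three kinds listed above, then changes the structure with a simple data update. The `walk` helper is hypothetical.

```python
# Sample document from the slide (long name field omitted).
doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def walk(node, path=()):
    """Yield (path, key, value) for every key at every depth of the document."""
    if isinstance(node, dict):
        for k, v in node.items():
            yield path, k, v
            yield from walk(v, path + (k,))
    elif isinstance(node, list):
        for v in node:
            yield from walk(v, path)


# Presence of a key anywhere in the document.
has_hidden = any(k == "hidePhoneNumber" for _, k, _ in walk(doc))

# Nested relationship: every phoneNumber that sits somewhere under personPhones.
all_numbers = [v for path, k, v in walk(doc) if k == "phoneNumber" and "personPhones" in path]

# Key/value combination: numbers whose phone object is not flagged as hidden.
visible = [p["phone"]["phoneNumber"] for p in doc["person"]["personPhones"]
           if not p["phone"].get("hidePhoneNumber", False)]

# Changing structure is just a data update: give every phone the flag.
for p in doc["person"]["personPhones"]:
    p["phone"].setdefault("hidePhoneNumber", False)

print(has_hidden, visible)  # True ['+1 (222) 222-2222']
```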
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
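For a quick feel for these types, the example below parses a small document with Python's standard `json` module. It shows a mixed-type array, a UTF-8 string, and a sparse object. The sample values are made up for illustration.

```python
import json

# A sparse object with a mixed-type array and a non-ASCII string.
text = '{"tags": ["cookie", 3, 4.5, true, null, {"nested": []}], "name": "Crème brûlée"}'
sample = json.loads(text)

# JSON numbers arrive as int or float; true/null become True/None.
print([type(v).__name__ for v in sample["tags"]])
# ['str', 'int', 'float', 'bool', 'NoneType', 'dict']

# Sparse population: an attribute that was never written is simply absent.
print("price" in sample)  # False
```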
Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency. …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1 INTRODUCTION
This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of …

1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS
… Tables … represent a major advance toward the goal of data independence. …

1.2.1 Ordering Dependence
… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.

1.2.2 Indexing Dependence
… Can application programs … remain invariant as indices come and go? …

1.2.3 Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.
[Figure: a knowledge graph of Topic (T), Person (P), Location (L), Publication (P), Reference (R), Organization (O), and Event (E) nodes. Edge labels include "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "solution", and "purpose".]
[Figure: "Inside and Out", a document graph connecting Customers (JSON), Orders (JSON), Products (JSON), Vendors (JSON), and Documentation (XML). Relationship labels: Customer Order; Product that was ordered; Vendor who sold the product; Shipping instructions, etc.; Customer invoices, annual reports, etc.; Customer order documents; Product manual; Customer liked product; Vendor received RMA on product. The XML documentation example contains:]

Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
DocGraph Modeling: Denormalize
• TDE can automatically populate the triple index with any data from any document, such as a person's name and contact info
• When triples link documents, the Optic API can query triple data as if it were part of any linked document
• This is denormalizing without denormalizing
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } } ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ]
}
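The idea of "denormalizing without denormalizing" can be sketched in plain Python. Each document contributes triples to a shared index, and a cross-document question is answered through that index instead of by copying names between documents. This is not MarkLogic's TDE template format or Optic API. The `extract_triples` function, the in-memory `triple_index` list, and the `hasName` predicate are hypothetical stand-ins, and the sample documents are simplified from the slides.

```python
# Simplified documents; ids, predicates and names come from the slides.
docs = {
    11: {"person": {"personName": "Mike Bowers"},
         "triples": [
             {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666}},
             {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
         ]},
    666: {"company": {"companyName": "Goliath Dairies"}},
    1111: {"product": {"productName": "Oreo Thins Sandwich Cookies"}},
}


def extract_triples(doc_id, doc):
    """Stand-in for a TDE-style template: emit the document's embedded triples
    plus one (id, 'hasName', name) triple projected from the entity body."""
    for t in doc.get("triples", []):
        yield (t["triple"]["subject"], t["triple"]["predicate"], t["triple"]["object"])
    for entity in ("person", "company", "product"):
        if entity in doc:
            yield (doc_id, "hasName", doc[entity][entity + "Name"])


# The shared "triple index": every document contributes; nothing is copied between docs.
triple_index = [t for doc_id, doc in docs.items() for t in extract_triples(doc_id, doc)]


def linked_names(subject, predicate):
    """Follow `predicate` from `subject`, then read each target's name via the index."""
    targets = [o for s, p, o in triple_index if s == subject and p == predicate]
    return [o for t in targets for s, p, o in triple_index if s == t and p == "hasName"]


print(linked_names(11, "hasEmployer"))   # ['Goliath Dairies']
print(linked_names(11, "likesProduct"))  # ['Oreo Thins Sandwich Cookies']
```

If the company document is renamed, the next extraction picks up the new name. No person document has to be rewritten, which is the payoff the slide describes.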
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Figure: Business Tables, Transaction Tables, Reference Tables]
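The three kinds of tables can be sketched with Python's built-in `sqlite3` module. The schema below is hypothetical and reuses sample values from the slides. `person` and `product` are business tables, `customer_order` and `order_line` are transaction tables, and `order_status` is a reference table enforced by a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
-- Reference table: standardizes an entity state.
CREATE TABLE order_status (status TEXT PRIMARY KEY);
-- Business tables: stand independent of any context.
CREATE TABLE person  (person_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Transaction tables: join business tables together.
CREATE TABLE customer_order (
    order_id  INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    status    TEXT    NOT NULL REFERENCES order_status(status));
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL REFERENCES customer_order(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    qty        INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id));
""")
conn.executemany("INSERT INTO order_status VALUES (?)",
                 [("New",), ("Invoiced",), ("Shipped",), ("Closed",)])
conn.execute("INSERT INTO person VALUES (11, 'Mike Bowers')")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(1111, "Oreo Thins Sandwich Cookies"),
                  (2222, "Rice Dream Rice Drink - Vanilla 64 Fl Oz")])
conn.execute("INSERT INTO customer_order VALUES (1, 11, 'Shipped')")
conn.executemany("INSERT INTO order_line VALUES (1, ?, ?)", [(1111, 3), (2222, 1)])

# Reassembling one order takes three joins across the orthogonal tables.
rows = conn.execute("""
    SELECT p.name, pr.name, ol.qty
    FROM customer_order o
    JOIN person p      ON p.person_id   = o.person_id
    JOIN order_line ol ON ol.order_id   = o.order_id
    JOIN product pr    ON pr.product_id = ol.product_id
    ORDER BY pr.product_id""").fetchall()
print(rows)

# The reference table rejects a status that is not in the standardized list.
try:
    conn.execute("INSERT INTO customer_order VALUES (2, 11, 'Lost')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```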
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one doc
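The last two guidelines can be sketched together. Several lookup lists live in one lookup document, and a self-contained order document is checked against it. The `invalid_statuses` helper is hypothetical. The second product's status, "Backordered", is a deliberately invalid made-up value used to show a failure, not a value from the slides.

```python
# One lookup document carrying several lookup lists (values from the slides).
order_lookup = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

# A transaction document that carries what it needs, including a customer snapshot.
order = {
    "orderId": 1,
    "order": {
        "orderNumber": "111-11-111",
        "orderStatus": "Shipped",
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderQty": 3,
                         "productOrderStatus": "Shipped"}},
            # Hypothetical invalid status, added to demonstrate validation.
            {"product": {"productIdFk": 2222, "productOrderQty": 1,
                         "productOrderStatus": "Backordered"}},
        ],
    },
}


def invalid_statuses(order_doc, lookup):
    """Return the status values in the order that are not in the lookup lists."""
    o = order_doc["order"]
    bad = [] if o["orderStatus"] in lookup["orderStatus"] else [o["orderStatus"]]
    bad += [p["product"]["productOrderStatus"] for p in o["orderedProducts"]
            if p["product"]["productOrderStatus"] not in lookup["productOrderStatus"]]
    return bad


print(invalid_statuses(order, order_lookup))  # ['Backordered']
```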
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational, because it hides the purpose of the model
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because a JSON object naturally supports subclassing
• This example shows a person being subclassed as an employee, customer, and customer rep
• Generalization works very well in JSON and, unlike relational, it does not hide the purpose of the model
• See the subclassing in the person document above: the customer object and the companyRelationships array
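A small sketch of reading subclasses out of one generalized person document follows. Each role is a nested object or a relationship entry rather than a separate table. The `roles` helper and its "relationship:company" output format are hypothetical. The document is trimmed from the person example.

```python
# Trimmed person document: the customer object and companyRelationships are the subclasses.
person_doc = {
    "personId": 11,
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def roles(doc):
    """List the subclasses a person document carries, in document order."""
    found = ["customer"] if "customer" in doc else []
    for rel in doc.get("companyRelationships", []):
        c = rel["company"]
        found += [f"{r}:{c['companyName']}" for r in c["personCompanyRelationship"]]
    return found


print(roles(person_doc))
```

Adding a new role means adding a key or an array entry to the document. No new table or join is needed, and the key name itself states the purpose.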
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
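The reason for unique property names shows up in a toy index that, like the behavior described above, keys values by property name only and ignores the hierarchy. This is a simplified model for illustration, not MarkLogic's actual index implementation. The `name_index` helper and both sample documents are hypothetical.

```python
from collections import defaultdict


def name_index(doc, index=None):
    """Toy equality index: map each property name to the scalar values seen under
    that name, wherever the property sits in the hierarchy."""
    index = defaultdict(set) if index is None else index
    if isinstance(doc, dict):
        for k, v in doc.items():
            if isinstance(v, (str, int, float, bool)):
                index[k].add(v)
            name_index(v, index)
    elif isinstance(doc, list):
        for v in doc:
            name_index(v, index)
    return index


# Generic names collide: "name" under person and under employer share one index key.
generic = {"person": {"name": "Mike Bowers", "employer": {"name": "Nabisco"}}}
# Unique names keep each property in its own index key.
unique = {"person": {"personName": "Mike Bowers",
                     "employer": {"companyName": "Nabisco"}}}

print(sorted(name_index(generic)["name"]))      # ['Mike Bowers', 'Nabisco']
print(sorted(name_index(unique)["personName"])) # ['Mike Bowers']
```

With the generic name, an equality lookup on "name" for "Nabisco" cannot tell a person from a company without a path constraint. The unique name answers that question from the index alone.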
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active",
      "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06",
      "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike",
      "personLastName": "Bowers",
      "middleName": "Thomas",
      "personMaidenName": null,
      "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike",
      "personNickname": "Mikey",
      "personInformalLetterName": "Mike",
      "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ],
          "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton",
          "addressRegion": "UT",
          "addressPostalCode": "84015",
          "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa",
          "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234",
          "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS",
          "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": {
          "paymentMethodType": "Checking",
          "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111",
          "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555,
        "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "accountRepSupportedProducts": [
          { "product": { "productIdFk": 1111, "productName": "Oreos" } }
        ] } },
    { "company": {
        "companyIdFk": 666,
        "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active",
        "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01",
        "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri",
        "performanceNotes": "He is always late" } }
  ]
}
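The triples array turns the person document into a node in a graph. Each entry names a subject, a predicate and an object. The sketch below reads those statements in plain Python and follows one predicate. It assumes two things about the layout:

* each item wraps its statement in a `triple` object, as the person document does;
* metadata such as `tripleStatus` sits beside that object.

This is an in-memory illustration, not MarkLogic's triple index or SPARQL. `iter_triples` and `objects_of` are hypothetical names.

```python
from typing import Any, Dict, Iterator, List, NamedTuple


class Triple(NamedTuple):
    subject: Any
    predicate: str
    obj: Any


def iter_triples(doc: Dict[str, Any]) -> Iterator[Triple]:
    """Yield each embedded triple; sibling metadata such as tripleStatus is ignored."""
    for item in doc.get("triples", []):
        t = item.get("triple")
        if t is not None:
            yield Triple(t["subject"], t["predicate"], t["object"])


def objects_of(doc: Dict[str, Any], predicate: str) -> List[Any]:
    """Follow one predicate outward from this document's subject."""
    return [t.obj for t in iter_triples(doc) if t.predicate == predicate]


# A trimmed copy of the triples array from the person document.
person = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666},
         "tripleStatus": "active"},
    ],
}
print(objects_of(person, "hasChild"))     # [33, 44]
print(objects_of(person, "hasEmployer"))  # [666]
```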
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mike@somework.com ]
personWebsites [
website websitePurpose [ work ] websiteUrl http://www.mike.com ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T09:00:00Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 5.87 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 3.50 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+01:01
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2.50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
productAvailabilityDate 2001-01-01
productListPrice 4.50
productDiscountPrice 3.50
productStandardCost 2.50
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 2.50 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 10.1
productWidth 4.5 productHeight 1.7 productLength 6.7
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents are equivalent to more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into distinct business entities, transactions, and lookup lists
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs × 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a document is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables
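The IO estimate above is a product of per-join cost and join count. A short sketch reproduces the slide's arithmetic, using its rough figures as defaults: 3-to-4 index IOs plus 1-to-2 row IOs per join. These are the deck's estimates, not measurements, and `join_io_range` is a hypothetical helper.

```python
from typing import NamedTuple, Tuple


class IoRange(NamedTuple):
    low: int
    high: int


def join_io_range(joins: int,
                  index_ios: Tuple[int, int] = (3, 4),
                  row_ios: Tuple[int, int] = (1, 2)) -> IoRange:
    """Estimate total IOs for a multi-table read from per-join index and row costs."""
    per_join_low = index_ios[0] + row_ios[0]    # 3 + 1 = 4
    per_join_high = index_ios[1] + row_ios[1]   # 4 + 2 = 6
    return IoRange(joins * per_join_low, joins * per_join_high)


print(join_io_range(13))  # IoRange(low=52, high=78)
```

By the same estimate, reading the person as one document costs 1 IO when the indexes are memory-resident.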
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties; MarkLogic does so with its new Optic API and Template Driven Extraction (TDE).
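Template Driven Extraction projects values from documents into rows, which the Optic API can then query like a table. The sketch below imitates that projection in plain Python for the multi-typed payment-method array:

* a hypothetical column template maps each column to a key path;
* an item that lacks a field yields None for that column.

It illustrates the idea only. It is not TDE's template syntax or the Optic API.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical column template: column name -> key path inside each array item.
# It mimics the idea of Template Driven Extraction, not TDE's actual syntax.
PAYMENT_TEMPLATE: Dict[str, Tuple[str, ...]] = {
    "type": ("paymentMethod", "paymentMethodType"),
    "status": ("paymentMethod", "paymentMethodStatus"),
    "cardNumber": ("paymentMethod", "creditCardNumber"),
    "routingNumber": ("paymentMethod", "bankRoutingNumber"),
}


def lookup(node: Any, path: Tuple[str, ...]) -> Optional[Any]:
    """Follow a key path, returning None as soon as a step is missing."""
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def project_rows(doc: Dict[str, Any], array_key: str,
                 template: Dict[str, Tuple[str, ...]]) -> List[Dict[str, Any]]:
    """Emit one row per item of person[array_key]; absent fields become None."""
    items = lookup(doc, ("person", array_key)) or []
    return [{"personId": doc.get("personId"),
             **{col: lookup(item, path) for col, path in template.items()}}
            for item in items]


doc = {"personId": 11, "person": {"personPaymentMethods": [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111"}},
]}}
rows = project_rows(doc, "personPaymentMethods", PAYMENT_TEMPLATE)
# A "query" over the rows: every verified method, whatever its type.
print([r["type"] for r in rows if r["status"] == "Verified"])  # ['Visa', 'Checking']
```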
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States" } }
]
Equivalent relational tables:
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
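The sketch below shows what the single multi-valued `personAddresses` array replaces. It shreds one person document into the three relational tables listed above. It is illustrative Python under these assumptions:

* the Persons row is abbreviated;
* address IDs are hypothetical surrogate keys generated on the fly;
* a `primary` entry in `addressPurpose` is treated as the Is Primary flag rather than as a purpose.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def shred_addresses(doc: Dict[str, Any]) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split one person document into Persons, Persons Addresses, and Addresses rows."""
    person = doc["person"]
    person_id = doc["personId"]
    persons = [{"personId": person_id, "name": person.get("personName")}]  # abbreviated
    persons_addresses: List[Row] = []
    addresses: List[Row] = []
    # Address IDs are hypothetical surrogate keys; the JSON document needs none.
    for address_id, item in enumerate(person.get("personAddresses", []), start=1):
        a = item["address"]
        addresses.append({
            "addressId": address_id,
            "street": "\n".join(a.get("addressStreet", [])),  # street lines in one column
            "locality": a.get("addressLocality"),
            "region": a.get("addressRegion"),
            "postalCode": a.get("addressPostalCode"),
            "country": a.get("addressCountry"),
        })
        purposes = a.get("addressPurpose", [])
        is_primary = "primary" in purposes
        # One join-table row per purpose: the multi-valued array becomes N rows.
        for purpose in purposes:
            if purpose != "primary":
                persons_addresses.append({"personId": person_id, "addressId": address_id,
                                          "addressPurpose": purpose, "isPrimary": is_primary})
    return persons, persons_addresses, addresses


doc = {"personId": 11, "person": {"personName": "Mike Bowers", "personAddresses": [
    {"address": {"addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                 "addressLocality": "Clinton", "addressRegion": "UT",
                 "addressPostalCode": "84015", "addressCountry": "United States"}}]}}
persons, links, addresses = shred_addresses(doc)
print(len(persons), len(links), len(addresses))  # 1 2 1
```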
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Storing different types of records in the same array is natural in programming but very difficult in relational databases
"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa",
      "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234",
      "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS",
      "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking",
      "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111",
      "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111",
      "driversLicenseState": "UT" } }
]
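Code that reads a multi-typed array usually branches on what each item contains. This is a minimal sketch, assuming the payment-method shapes shown above; `describe_payment_method` is a hypothetical helper. It chooses a branch by the type-specific field that is present (card number versus bank account number), rather than trusting `paymentMethodType` alone.

```python
from typing import Any, Dict, List


def describe_payment_method(item: Dict[str, Any]) -> str:
    """Summarize one payment method, branching on the type-specific field it carries."""
    pm = item.get("paymentMethod", {})
    kind = pm.get("paymentMethodType", "Unknown")
    card = pm.get("creditCardNumber")
    account = pm.get("bankAccountNumber")
    if isinstance(card, str):
        return f"{kind} card ending {card[-4:]}"
    if isinstance(account, str):
        return f"{kind} account ending {account[-4:]}"
    return f"{kind} (unrecognized payment method)"


methods: List[Dict[str, Any]] = [
    {"paymentMethod": {"paymentMethodType": "Visa",
                       "creditCardNumber": "1234123412341234"}},
    {"paymentMethod": {"paymentMethodType": "Checking",
                       "bankAccountNumber": "11111111"}},
]
print([describe_payment_method(m) for m in methods])
# ['Visa card ending 1234', 'Checking account ending 1111']
```

In a relational schema, the same array would typically need either one sparse table with nullable columns for every type, or one table per type plus a union at read time.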
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query
• Structure can be queried:
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
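Keys and nesting are ordinary data, so a program can test for structure and rewrite it with the same tools it uses for values. This is a plain-Python sketch over parsed JSON:

* `contains_key` searches every level for a property name;
* `rename_key` returns a restructured copy.

Both are hypothetical helpers, and the new name `phoneNumberHidden` is invented for the example. In a database, the same change would be made by a query that updates the stored documents.

```python
from typing import Any


def contains_key(node: Any, key: str) -> bool:
    """True if `key` appears as a property name anywhere in the JSON value."""
    if isinstance(node, dict):
        return key in node or any(contains_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(contains_key(item, key) for item in node)
    return False


def rename_key(node: Any, old: str, new: str) -> Any:
    """Return a copy of the JSON value with every `old` property renamed to `new`."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


doc = {"personId": 11, "person": {"personHeightInFeet": 5.75, "personPhones": [
    {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
    {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
               "hidePhoneNumber": True}}]}}
print(contains_key(doc, "hidePhoneNumber"))  # True
changed = rename_key(doc, "hidePhoneNumber", "phoneNumberHidden")
print(contains_key(changed, "hidePhoneNumber"), contains_key(changed, "phoneNumberHidden"))  # False True
```

If the new name already exists in the same object, one value silently overwrites the other. A real migration should check for that first.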
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

…Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency. …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. RELATIONAL MODEL AND NORMAL FORM
1.1. INTRODUCTION This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model. … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS … Tables … represent a major advance toward the goal of data independence. …
1.2.1. Ordering Dependence … Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence … Can application programs … remain invariant as indices come and go? …
1.2.3. Access Path Dependence Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths.
[Figure: Meaningful data modeling, inside and out. One graph links topics, people, locations, publications, references, organizations, and events through relationships such as "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "solution", and "purpose". A second graph links JSON documents for customers, orders, products, and vendors with XML documentation through relationships such as "customer order", "product that was ordered", "vendor who sold the product", "shipping instructions", "customer invoices, annual reports", "product manual", "customer liked product", and "vendor received RMA on product".]
XML documentation example (hospital operation record):
Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
Relational Modeling: Orthogonalize
2. Orthogonalize
• Create business tables that stand independent of all contexts
• Create transaction tables to join together business tables
• Create reference tables to standardize entity states and attribute characteristics
• This maximizes data reuse by allowing tables to be combined with other tables to create any context
[Figure: business tables, transaction tables, and reference tables]
DocGraph Modeling: Orthogonalize
• Everything about each entity should be in the entity
• A business entity should exactly match how users think of it
• A transaction should contain everything about the transaction
• Multiple lookup tables may be combined in one document
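When several lookup lists live in one document, every transaction can be checked against a single source. The sketch below validates an order's status values against the lookup document's `orderStatus` and `productOrderStatus` lists. It assumes that `orderedProducts` sits inside the `order` object, and `invalid_statuses` is a hypothetical helper.

```python
from typing import Any, Dict, List

# The combined lookup document, with its two lists taken from the slide.
LOOKUPS: Dict[str, Any] = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}


def invalid_statuses(order_doc: Dict[str, Any], lookups: Dict[str, Any]) -> List[str]:
    """List status values in an order that are missing from the lookup document."""
    problems: List[str] = []
    order = order_doc.get("order", {})
    if order.get("orderStatus") not in lookups["orderStatus"]:
        problems.append(f"orderStatus={order.get('orderStatus')!r}")
    for item in order.get("orderedProducts", []):
        status = item.get("product", {}).get("productOrderStatus")
        if status is not None and status not in lookups["productOrderStatus"]:
            problems.append(f"productOrderStatus={status!r}")
    return problems


order = {"orderId": 1, "order": {"orderStatus": "Shipped", "orderedProducts": [
    {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
    {"product": {"productIdFk": 2222}}]}}
print(invalid_statuses(order, LOOKUPS))  # []
```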
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose, so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table, and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational, because it hides the purpose of the model
DocGraph Modeling: Generalize
• Generalizing is easy in JSON, because a JSON object is naturally suited to subclassing
• The example person document shows a person subclassed as an employee, a customer, and a customer rep
• Generalization works very well in JSON and, unlike in relational, it does not hide the purpose of the model
• See the subclassing in the person document: the person, customer, and companyRelationships objects, each declared in its schemas array
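In the generalized person document, the `schemas` array declares which subclasses the document carries. Each subclass is a top-level object of the same name. The sketch below does three things:

* reads those declarations;
* checks that each declared subclass is present;
* collects the roles listed in `companyRelationships`.

The helper names are hypothetical, and the document is trimmed for brevity.

```python
from typing import Any, Dict, List, Set


def declared_types(doc: Dict[str, Any]) -> List[str]:
    """Types the document claims to be, read from its schemas array."""
    return [item["schema"]["type"] for item in doc.get("schemas", [])]


def missing_subclasses(doc: Dict[str, Any]) -> List[str]:
    """Declared types that have no top-level object of the same name."""
    return [t for t in declared_types(doc) if t not in doc]


def company_roles(doc: Dict[str, Any]) -> Set[str]:
    """Every role this person plays across companyRelationships."""
    roles: Set[str] = set()
    for item in doc.get("companyRelationships", []):
        roles.update(item.get("company", {}).get("personCompanyRelationship", []))
    return roles


doc = {
    "personId": 11,
    "schemas": [{"schema": {"type": "person", "schemaVersion": 1.0}},
                {"schema": {"type": "customer", "schemaVersion": 1.0}},
                {"schema": {"type": "companyRelationships", "schemaVersion": 1.0}}],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}}],
}
print(declared_types(doc))         # ['person', 'customer', 'companyRelationships']
print(missing_subclasses(doc))     # []
print(sorted(company_roles(doc)))  # ['accountRepFor', 'deliveryPersonFor', 'employeeOf', 'independentContractorFor']
```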
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables, reducing the joins that slow performance
DocGraph Modeling: Tune
- These examples are tuned for MarkLogic.
- In MarkLogic, each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy.
- You manually create inequality (range) indexes based on property name or hierarchical path.
personId 11
schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]
triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111
triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free
personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]
personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]
personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]
personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]
customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111
companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
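The unique-name advice can be checked mechanically before documents are loaded. The sketch below is plain Python on parsed JSON, not a MarkLogic API: it walks a document and reports every property name that occurs at more than one hierarchical path, which is the case where an index keyed only on the name cannot tell the occurrences apart. The helpers `property_paths` and `ambiguous_names` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def property_paths(node, path="", out=None):
    """Map each property name to the set of paths where it occurs.

    Array items share their parent's path, so repeated records in one
    array (e.g. several phones) do not count as different locations.
    """
    if out is None:
        out = defaultdict(set)
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}/{key}"
            out[key].add(child_path)
            property_paths(value, child_path, out)
    elif isinstance(node, list):
        for item in node:
            property_paths(item, path, out)
    return out


def ambiguous_names(doc):
    """Return, sorted, the property names that appear at more than one path."""
    return sorted(name for name, paths in property_paths(doc).items() if len(paths) > 1)


# A generic name collides across sections; prefixed names do not.
generic = {"person": {"status": "active"}, "customer": {"status": "active"}}
prefixed = {"person": {"personStatus": "active"}, "customer": {"customerStatus": "active"}}
print(ambiguous_names(generic))   # ['status']
print(ambiguous_names(prefixed))  # []
```

This is why the example documents use prefixed names such as `personStatus` and `customerStatus` rather than a shared `status`.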
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111orderStatus [NewInvoicedShippedClosed
]productOrderStatus [
None AllocatedInvoiced Shipped On Order No Stock
]
Document PROs: Simplicity
Each Business Entity Is a JSON Document
- These five JSON documents equal more than 30 tables in a relational model.
- JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.
Document PROs: Performance
One JSON document requires one IO.
- Relational requires dozens of IOs for the same data.
- For example, the person document is 45K and requires 13 tables to represent it in a relational database.
- You have to join 13 tables to retrieve all of a person's data.
- Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row.
- 4-to-6 IOs x 13 joins = 52-to-78 IOs.
- MarkLogic indexes are in RAM, so getting a doc is 1 IO.
- MongoDB indexes are B-tree indexes and may require 4 IOs.
- To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables.
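The IO estimate is simple multiplication; the sketch below reproduces it so the inputs can be varied. It follows the slide's simplification of charging each of the 13 tables one join's worth of IOs, and the per-join figures are the slide's estimates, not measurements. The function name `relational_io_range` is hypothetical.

```python
def relational_io_range(tables, ios_per_join=(4, 6)):
    """Return (low, high) IO estimates for reassembling one entity.

    Each of `tables` tables is charged one join's worth of IOs, as in
    the slide's arithmetic.
    """
    low, high = ios_per_join
    return tables * low, tables * high


# Per-join cost from the slide: 3-to-4 index IOs plus 1-to-2 row IOs.
index_ios = (3, 4)
row_ios = (1, 2)
per_join = (index_ios[0] + row_ios[0], index_ios[1] + row_ios[1])  # (4, 6)

print(relational_io_range(13, per_join))  # (52, 78)
```

Against that range, the slide's figure for reading the same data as one document from MarkLogic is 1 IO.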
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties; in MarkLogic, this is done with the new Optic API and Template Driven Extraction (TDE).
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued.
- This allows many-to-one and many-to-many relationships to be captured in one JSON document.
"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States" } }
]
Equivalent relational tables:
Addresses: Address ID | Street | Locality | Region | Postal Code | Country
Persons Addresses: Person ID | Address ID | Address Purpose | Is Primary
Persons: Person ID | Status | Shipping Preference | Name | Online Name | Birth Date | Gender | Ethnicity | Customer Status | Customer Join Date | Customer Reviewer Rank
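To make the contrast concrete, here is a Python sketch of what it takes to shred the single JSON address above into rows for the Addresses and Persons Addresses tables. The function `shred_person_addresses` is hypothetical, and its mapping rules are assumptions: address IDs are assigned sequentially, each purpose other than "primary" becomes its own link row, and the value "primary" in `addressPurpose` sets the Is Primary flag. The Persons row is omitted for brevity.

```python
def shred_person_addresses(person_id, person_addresses):
    """Split a personAddresses array into Addresses and Persons Addresses rows."""
    address_rows, link_rows = [], []
    for address_id, entry in enumerate(person_addresses, start=1):
        addr = entry["address"]
        address_rows.append({
            "addressId": address_id,
            "street": ", ".join(addr["addressStreet"]),  # multi-line street
            "locality": addr["addressLocality"],
            "region": addr["addressRegion"],
            "postalCode": addr["addressPostalCode"],
            "country": addr["addressCountry"],
        })
        purposes = addr["addressPurpose"]
        is_primary = "primary" in purposes  # assumed flag convention
        for purpose in purposes:
            if purpose != "primary":
                link_rows.append({
                    "personId": person_id,
                    "addressId": address_id,
                    "addressPurpose": purpose,
                    "isPrimary": is_primary,
                })
    return address_rows, link_rows


person_addresses = [{"address": {
    "addressPurpose": ["shipping", "mailing"],
    "addressStreet": ["99 Smith Dr"],
    "addressLocality": "Clinton",
    "addressRegion": "UT",
    "addressPostalCode": "84015",
    "addressCountry": "United States",
}}]
addresses, links = shred_person_addresses(11, person_addresses)
print(len(addresses), len(links))  # 1 2
```

One JSON object becomes rows in two tables, and reading it back requires joining them again on `addressId`.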
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed.
- Putting different types of records in the same array is natural in programming but very difficult in relational databases.
"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
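Application code can consume such a mixed array directly by branching on the fields each record carries. The sketch below is illustrative Python with a hypothetical `describe_payment_method` helper; it assumes, as in the example above, that card records carry `creditCardNumber` and bank records carry `bankAccountNumber`.

```python
def describe_payment_method(method):
    """Summarize one paymentMethod record, whatever its type.

    The record's shape is inferred from the keys it carries, so card and
    bank records can share one array without a common column set.
    """
    kind = method["paymentMethodType"]
    if "creditCardNumber" in method:
        return f"{kind} card ending {method['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in method:
        return f"{kind} account ending {method['bankAccountNumber'][-4:]}"
    return f"{kind} (unrecognized shape)"


person_payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa",
                       "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11"}},
    {"paymentMethod": {"paymentMethodType": "Checking",
                       "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111",
                       "bankAccountNumber": "11111111"}},
]
for entry in person_payment_methods:
    print(describe_payment_method(entry["paymentMethod"]))
# Visa card ending 1234
# Checking account ending 1111
```

A relational design would typically need either one sparse table with nullable card and bank columns or a separate table per payment type.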
Document PROs: Everything Is Data
Everything is data in JSON:
- Structure
- Keys
- Values
- Arrays
Because structure is data:
- Structure can be changed by updating data; just write a query.
- Structure can be queried:
  - For presence of keys
  - For nested relationships
  - For any combination of nesting, keys, and values
{ "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224",
                   "hidePhoneNumber": true } }
    ]
  }
}
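Because keys are part of the data, a program can test for them and rewrite them like any other value. This is a Python sketch on a parsed copy of the example above, not MarkLogic's query API; `has_key`, `rename_key`, and the new key name `phoneNumberHidden` are hypothetical.

```python
def has_key(node, key):
    """True if `key` occurs as a property name anywhere in the structure."""
    if isinstance(node, dict):
        return key in node or any(has_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(has_key(item, key) for item in node)
    return False


def rename_key(node, old, new):
    """Return a copy of the structure with every `old` property renamed to `new`."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


doc = {"personId": 11, "person": {"personPhones": [
    {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
    {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
               "hidePhoneNumber": True}},
]}}

print(has_key(doc, "hidePhoneNumber"))  # True
renamed = rename_key(doc, "hidePhoneNumber", "phoneNumberHidden")
print(has_key(renamed, "hidePhoneNumber"), has_key(renamed, "phoneNumberHidden"))  # False True
```

The sparse `hidePhoneNumber` property exists only on the phone that needs it, yet it can still be found and restructured without a schema migration.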
Document PROs: Simple, Easy Data Types
- Attribute names have unlimited length and can contain any characters.
- Strings have unlimited size and can contain any internationalized UTF-8 characters.
- Numbers contain integers or decimals.
- Booleans are true or false.
- Arrays can have values of any type and can mix types.
- Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
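How these types land in a programming language is easy to see with Python's standard `json` module: integers and decimals come back as `int` and `float`, `true`/`false` as `bool`, `null` as `None`, and a single array can hold values of every type. The property names below (`mixed`, `sparse`, and the key containing spaces) are made up purely to exercise each type.

```python
import json

# Made-up properties, one per JSON type described above.
text = """{
  "any key, even with spaces & ümlauts": "José Müller",
  "personHeightInFeet": 5.75,
  "customerReviewerRank": 11111,
  "hidePhoneNumber": true,
  "mixed": ["text", 42, 4.5, false, null, {"nested": [1, 2]}],
  "sparse": {}
}"""

doc = json.loads(text)
for key, value in doc.items():
    print(f"{key!r}: {type(value).__name__}")
print([type(v).__name__ for v in doc["mixed"]])
# ['str', 'int', 'float', 'bool', 'NoneType', 'dict']
```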
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. Codd
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
…Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables… represent a major advance toward the goal of data independence…
1.2.1. Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence
…Can application programs… remain invariant as indices come and go?…
1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure: a semantic graph of typed entities (T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event) joined by labeled relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of", with topics linked to one another as "problem", "solution", and "purpose".]
[Figure: a document graph connecting JSON Customers, Orders, Products, and Vendors documents with an XML document (a hospital operation record with a drug table) and Documentation. Relationship labels include "Customer Order", "Product that was ordered", "Vendor who sold the product", "Customer liked product", "Vendor received RMA on product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", and "Product manual".]
Inside and Out
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling: Normalize
- DocGraph Modeling: Normalize
- DocGraph Modeling: Denormalize
- Relational Modeling: Orthogonalize
- Relational Modeling: Generalize
- DocGraph Modeling: Generalize
- Relational Modeling: Tune
- DocGraph Modeling: Tune
- Document PROs: Simple, Easy Data Types
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
DocGraph Modeling: Orthogonalize
- Everything about each entity should be in the entity.
- A business entity should exactly match how users think of it.
- A transaction should contain everything about the transaction.
- Multiple lookup tables may be combined in one doc.
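A sketch of the "a transaction should contain everything about the transaction" rule, in Python on plain dicts. The function `build_order` is hypothetical; it copies the customer's name and shipping address into a new order in the shape of the order document, which is the same kind of denormalization described under Performance. Copying at order time also keeps the order's history stable when the person document later changes.

```python
import copy


def build_order(order_id, order_number, person_doc, items):
    """Build an order that carries everything about the transaction.

    The customer name and shipping address are copied (denormalized)
    from the person document at order time.
    """
    person = person_doc["person"]
    shipping = next(
        (e["address"] for e in person["personAddresses"]
         if "shipping" in e["address"]["addressPurpose"]),
        None,
    )
    if shipping is None:
        raise ValueError("person has no shipping address")
    # Deep copy so later edits to the person cannot leak into the order.
    address = {k: v for k, v in copy.deepcopy(shipping).items() if k != "addressPurpose"}
    return {
        "orderId": order_id,
        "order": {
            "orderNumber": order_number,
            "customer": {"customerId": person_doc["personId"],
                         "customerName": person["personName"]},
            "orderShippingAddress": address,
            "orderedProducts": [{"product": dict(item)} for item in items],
        },
    }


person_doc = {"personId": 11, "person": {
    "personName": "Mike Bowers",
    "personAddresses": [{"address": {
        "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States"}}]}}
order = build_order(1, "111-11-111", person_doc,
                    [{"productIdFk": 1111, "productName": "Oreo Thins Sandwich Cookies",
                      "productOrderQty": 3}])

# A later change to the person does not rewrite the order.
person_doc["person"]["personAddresses"][0]["address"]["addressStreet"][0] = "1 New St"
print(order["order"]["orderShippingAddress"]["addressStreet"])  # ['99 Smith Dr']
```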
Relational Modeling: Generalize
3. Generalize
- Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change.
- For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
- Do not over-generalize in relational, because it hides the purpose of the model.
DocGraph Modeling: Generalize
- Generalizing is easy in JSON because JSON objects are designed for subclassing.
- This example shows a person being subclassed as an employee, customer, and customer rep.
- Generalization works very well in JSON and, unlike in relational, it does not hide the purpose of the model.
- See subclassing below.
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } } ],
  "person": {
    "personStatus": "active", "personName": "Mike Bowers", "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01", "personGender": "male",
    "personEthnicity": "caucasian", "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT", "addressPostalCode": "84015",
          "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers", "driversLicenseNumber": "11111111",
          "driversLicenseState": "UT" } } ] },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ]
}
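The subclassing idea can be sketched in Python. This is a minimal sketch, assuming the layout of the person document: each role is a top-level section whose name is declared in `schemas`, and per-company roles live in `personCompanyRelationship`. The helpers `role_sections` and `company_roles` are hypothetical illustrations, not part of any MarkLogic API.

```python
from typing import Any

# Trimmed copy of the person document: only the fields the helpers read.
person_doc: dict[str, Any] = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person"}},
        {"schema": {"type": "customer"}},
        {"schema": {"type": "companyRelationships"}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555,
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666,
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def role_sections(doc: dict[str, Any]) -> list[str]:
    """Return each declared schema type that has a matching top-level section."""
    declared = [entry["schema"]["type"] for entry in doc.get("schemas", [])]
    return [name for name in declared if name in doc]


def company_roles(doc: dict[str, Any]) -> dict[int, list[str]]:
    """Map each related company id to the roles this person plays for it."""
    return {
        rel["company"]["companyIdFk"]: rel["company"]["personCompanyRelationship"]
        for rel in doc.get("companyRelationships", [])
    }


print(role_sections(person_doc))       # ['person', 'customer', 'companyRelationships']
print(company_roles(person_doc)[666])  # ['independentContractorFor', 'deliveryPersonFor']
```

In this layout, adding a role means adding one section and one schema entry to the same document; no other document has to change.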
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application
• Optimize sparse data out of a table and put it into one-to-one related tables
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy
• You manually create inequality (range) indexes based on property name or hierarchical path
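A toy model of a name-keyed equality index shows why unique property names matter. This is a hypothetical sketch, not how MarkLogic stores or exposes its indexes: it only mimics the behavior described in the bullets by indexing every scalar value under its property name and ignoring the path.

```python
from collections import defaultdict
from typing import Any, Optional


def name_index(doc: Any) -> dict[str, set]:
    """Index every scalar value under the name of the property that holds it.

    The path to the property is ignored, so two properties that share a
    name end up in the same bucket.
    """
    index: dict[str, set] = defaultdict(set)

    def walk(node: Any, name: Optional[str]) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, key)
        elif isinstance(node, list):
            for item in node:  # array members inherit the array's property name
                walk(item, name)
        elif name is not None:
            index[name].add(node)

    walk(doc, None)
    return dict(index)


# A generic name collides across entities; unique names keep them apart.
generic = {"person": {"name": "Mike Bowers"},
           "companyRelationships": [{"company": {"name": "Nabisco"}}]}
unique = {"person": {"personName": "Mike Bowers"},
          "companyRelationships": [{"company": {"companyName": "Nabisco"}}]}

print(sorted(name_index(generic)["name"]))  # ['Mike Bowers', 'Nabisco']
print(name_index(unique)["personName"])     # {'Mike Bowers'}
```

With the generic name, an equality lookup on `name` cannot tell a person from a company without extra path information; with `personName` and `companyName`, the property name alone is enough.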
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ] },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New" …
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" …
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…
    "productMeasures": [
      { "productMeasure": { "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": { "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    "productWebsites": [ …
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", …
      { "phone": { "phonePurpose": "support", …
      { "phone": { "phonePurpose": "shipping", …
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", …
          "addressLocality": "East Hanove…
          "addressPostalCode": "07936", …
    "companyEmails": [
      { "email": { "emailPurpose": "sales", …
    "companyWebsites": [
      { "website": { "websitePurpose": "home", …
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…
          "paymentMethodAccountNumber": 555, …
          "paymentMethodVerified": true, …
  "supplier": { "supplierJoinDate": "2005-05…
  "seller": { "sellerJoinDate": "2005-05…
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents are equivalent to more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into separate business entities, transactions, and lookup lists
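A minimal sketch of orthogonalized documents that point at each other by id, assuming an in-memory dictionary as a stand-in for the database. The URIs (`person/11`, `order/1`, …) and the `resolve_order` helper are hypothetical; the field names follow the five documents above, trimmed to the ones the helper reads.

```python
from typing import Any

# Hypothetical in-memory "database": one document per entity, transaction, or lookup list.
store: dict[str, dict[str, Any]] = {
    "person/11": {"personId": 11, "person": {"personName": "Mike Bowers"}},
    "product/1111": {"productId": 1111,
                     "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    "company/555": {"companyId": 555, "company": {"companyName": "Nabisco"}},
    "lookup/111111111": {"orderLookupId": 111111111,
                         "orderStatus": ["New", "Invoiced", "Shipped", "Closed"]},
    "order/1": {"orderId": 1,
                "order": {"orderStatus": "Shipped",
                          "customer": {"customerId": 11},
                          "orderedProducts": [
                              {"product": {"productIdFk": 1111, "productOrderQty": 3,
                                           "seller": {"sellerId": 555}}}]}},
}


def resolve_order(order_id: int) -> dict[str, Any]:
    """Follow an order's references to its customer, products, and sellers."""
    order = store[f"order/{order_id}"]["order"]
    # The lookup-list document constrains which status values are valid.
    allowed = store["lookup/111111111"]["orderStatus"]
    if order["orderStatus"] not in allowed:
        raise ValueError(f"unknown order status {order['orderStatus']!r}")
    customer = store[f"person/{order['customer']['customerId']}"]["person"]
    lines = []
    for entry in order["orderedProducts"]:
        item = entry["product"]
        product = store[f"product/{item['productIdFk']}"]["product"]
        seller = store[f"company/{item['seller']['sellerId']}"]["company"]
        lines.append((product["productName"], item["productOrderQty"],
                      seller["companyName"]))
    return {"customer": customer["personName"], "lines": lines}


print(resolve_order(1))
# {'customer': 'Mike Bowers', 'lines': [('Oreo Thins Sandwich Cookies', 3, 'Nabisco')]}
```

Each lookup is a single-document read by id; the relational equivalent would reach the same data through joins across many tables.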
Document PROs: Performance
One JSON document requires one IO
• Relational requires dozens of IOs for the same data
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database
• You have to join 13 tables to retrieve all of a person's data
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row
• 4-to-6 IOs x 13 joins = 52-to-78 IOs
• MarkLogic indexes are in RAM, so getting a doc is 1 IO
• MongoDB indexes are B-tree indexes and may require 4 IOs
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables
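The IO arithmetic above can be reproduced in a few lines. The per-join ranges are the slide's estimates, not measurements, and `relational_io_range` is an illustrative name; like the slide, it counts one join per table.

```python
def relational_io_range(joins: int,
                        index_ios: tuple[int, int] = (3, 4),
                        row_ios: tuple[int, int] = (1, 2)) -> tuple[int, int]:
    """Return the (best, worst) IO count for `joins` joins.

    Each join costs its index IOs plus its row IOs, using the slide's ranges.
    """
    best = joins * (index_ios[0] + row_ios[0])
    worst = joins * (index_ios[1] + row_ios[1])
    return best, worst


DOCUMENT_READ_IOS = 1  # the slide's figure for one MarkLogic document read

best, worst = relational_io_range(13)
print(f"relational: {best}-{worst} IOs, document: {DOCUMENT_READ_IOS} IO")
# relational: 52-78 IOs, document: 1 IO
```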
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties
JSON databases can query multi-valued and multi-typed properties
using MarkLogic's new Optic API
and Template Driven Extraction (TDE)
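As a database-neutral illustration of what querying multi-valued and multi-typed properties means, here is a plain-Python sketch over in-memory documents shaped like the person document. It does not use or imitate the Optic API or TDE; the function names and the second person (id 22) are hypothetical.

```python
from typing import Any

docs: list[dict[str, Any]] = [
    {"personId": 11, "person": {
        "personAddresses": [{"address": {"addressPurpose": ["shipping", "mailing"]}}],
        "personPaymentMethods": [
            {"paymentMethod": {"paymentMethodType": "Visa",
                               "creditCardNumber": "1234123412341234"}},
            {"paymentMethod": {"paymentMethodType": "Checking",
                               "bankRoutingNumber": "11111111"}}]}},
    {"personId": 22, "person": {  # hypothetical second person
        "personAddresses": [{"address": {"addressPurpose": ["mailing"]}}],
        "personPaymentMethods": [
            {"paymentMethod": {"paymentMethodType": "Visa",
                               "creditCardNumber": "1111222233334444"}}]}},
]


def with_address_purpose(docs: list[dict[str, Any]], purpose: str) -> list[int]:
    """Multi-valued match: a person matches if ANY address lists `purpose`."""
    return [d["personId"] for d in docs
            if any(purpose in a["address"].get("addressPurpose", [])
                   for a in d["person"].get("personAddresses", []))]


def with_payment_field(docs: list[dict[str, Any]], field: str) -> list[int]:
    """Multi-typed match: a person matches if ANY payment method has `field`."""
    return [d["personId"] for d in docs
            if any(field in p["paymentMethod"]
                   for p in d["person"].get("personPaymentMethods", []))]


print(with_address_purpose(docs, "mailing"))          # [11, 22]
print(with_address_purpose(docs, "shipping"))         # [11]
print(with_payment_field(docs, "bankRoutingNumber"))  # [11]
```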
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]
The same data in a relational model needs three tables:
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
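To make the comparison concrete, the sketch below shreds one embedded `personAddresses` array into rows for the three relational tables (Persons, Persons Addresses, Addresses). It is a minimal sketch under stated assumptions: the address ids are generated surrogate keys, the Persons row keeps only two of its columns, and "Is Primary" is derived from a hypothetical `primary` purpose value.

```python
from itertools import count
from typing import Any


def shred_person(doc: dict[str, Any]) -> dict[str, list[tuple]]:
    """Split one person document into rows for three relational tables."""
    address_ids = count(1)  # hypothetical surrogate keys for the Addresses table
    person_id = doc["personId"]
    person = doc["person"]
    tables: dict[str, list[tuple]] = {
        "Persons": [(person_id, person.get("personName"))],  # other columns omitted
        "PersonsAddresses": [],
        "Addresses": [],
    }
    for entry in person.get("personAddresses", []):
        a = entry["address"]
        address_id = next(address_ids)
        tables["Addresses"].append((
            address_id, ", ".join(a["addressStreet"]), a["addressLocality"],
            a["addressRegion"], a["addressPostalCode"], a["addressCountry"]))
        purposes = a.get("addressPurpose", [])
        is_primary = "primary" in purposes  # assumption: "primary" marks Is Primary
        for purpose in purposes:
            if purpose != "primary":
                # The multi-valued array fans out into one join-table row per value.
                tables["PersonsAddresses"].append(
                    (person_id, address_id, purpose, is_primary))
    return tables


doc = {"personId": 11, "person": {"personName": "Mike Bowers", "personAddresses": [
    {"address": {"addressPurpose": ["shipping", "mailing"],
                 "addressStreet": ["99 Smith Dr"], "addressLocality": "Clinton",
                 "addressRegion": "UT", "addressPostalCode": "84015",
                 "addressCountry": "United States"}}]}}

for table, rows in shred_person(doc).items():
    print(table, rows)
```

One array with one address becomes four rows in three tables, and reading the person back requires joining them again.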
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
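In application code, a multi-typed array is usually handled by dispatching on a discriminator field. This minimal sketch assumes `paymentMethodType` plays that role, as it does in the array above; `describe_payment` is a hypothetical helper.

```python
from typing import Any


def describe_payment(method: dict[str, Any]) -> str:
    """Summarize one payment method; which fields exist depends on its type."""
    kind = method["paymentMethodType"]
    if kind == "Checking":
        return f"Checking account ending {method['bankAccountNumber'][-4:]}"
    if "creditCardNumber" in method:
        return f"{kind} card ending {method['creditCardNumber'][-4:]}"
    return f"{kind} (unrecognized layout)"


payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111"}},
]

print([describe_payment(p["paymentMethod"]) for p in payment_methods])
# ['Visa card ending 1234', 'Checking account ending 1111']
```

In a relational schema the same two records would typically need either one sparse table with many nullable columns or a separate table per payment type.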
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ] }
}
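Because keys are data, a few generic functions can query and reshape structure. This is a minimal plain-Python sketch, not a MarkLogic API: `paths_with_key` finds every object containing a given key at any depth, and `rename_key` changes structure by rewriting data. The path syntax and the new key name `phoneIsPrivate` are hypothetical.

```python
from typing import Any, Iterator


def paths_with_key(node: Any, key: str, path: str = "$") -> Iterator[str]:
    """Yield the path of every object that contains `key`, at any depth."""
    if isinstance(node, dict):
        if key in node:
            yield path
        for k, v in node.items():
            yield from paths_with_key(v, key, f"{path}.{k}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from paths_with_key(item, key, f"{path}[{i}]")


def rename_key(node: Any, old: str, new: str) -> Any:
    """Return a copy of `node` with every `old` key renamed to `new`."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new)
                for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


doc = {"personId": 11, "person": {"personPhones": [
    {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
    {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
               "hidePhoneNumber": True}}]}}

print(list(paths_with_key(doc, "hidePhoneNumber")))
# ['$.person.personPhones[1].phone']
renamed = rename_key(doc, "hidePhoneNumber", "phoneIsPrivate")
print(list(paths_with_key(renamed, "phoneIsPrivate")))
# ['$.person.personPhones[1].phone']
```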
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
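The sketch below parses a small document that exercises each type in the list, using Python's standard `json` module. Note that JSON itself has a single number type; Python's decoder returns an `int` for literals without a fraction or exponent and a `float` otherwise. The JSON grammar sets no length limits, though RFC 8259 allows implementations to impose their own. The sample keys and values are made up for illustration.

```python
import json

# A made-up document exercising each JSON type from the list above.
text = """{
  "an attribute name with spaces, punctuation & ü": "UTF-8 text: naïve café",
  "count": 3,
  "price": 3.5,
  "active": true,
  "mixed": [1, "two", false, null, {"nested": {"deeper": []}}],
  "sparse": {"onlyOneAttribute": "set"}
}"""

doc = json.loads(text)
print({key: type(value).__name__ for key, value in doc.items()})
print([type(item).__name__ for item in doc["mixed"]])
# ['int', 'str', 'bool', 'NoneType', 'dict']
```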
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory… to… formatted data. …The problems… are those of data independence… and… data inconsistency. …The relational view… appears to be superior in several respects to the graph or network model… …Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables… represent a major advance toward the goal of data independence…
1.2.1. Ordering Dependence. …Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence. …Can application programs… remain invariant as indices come and go? …
1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data. …Programs become dependent on the continued existence of the… paths.
[Figure: knowledge graph whose nodes are Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E), connected by labeled relationships such as "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "purpose", and "solution".]
[Figure: document graph linking JSON Customers, Orders, Products, and Vendors documents with Documentation (customer order documents, product manuals, shipping instructions, customer invoices, annual reports) and an XML operation record. Relationship labels include "Customer Order", "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product". The XML record contains:]
Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
Inside and Out
Relational Modeling: Generalize
3. Generalize
• Make tables more general in purpose so they can be reused in multiple contexts and are resilient to change
• For example, replace the customer table with a person table and subclass person into customer, account rep, delivery agent, etc.
• Do not over-generalize in relational, because it hides the purpose of the model
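A minimal sqlite3 sketch of this relational pattern: a generalized `person` table with one-to-one subclass tables that share its key. The table and column names are hypothetical. Each role needs its own table, and reading a person's roles back needs one join per role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,
        person_name TEXT NOT NULL
    );
    -- Each subclass table shares the person's key (one-to-one with person).
    CREATE TABLE customer (
        person_id       INTEGER PRIMARY KEY REFERENCES person(person_id),
        customer_status TEXT
    );
    CREATE TABLE account_rep (
        person_id  INTEGER PRIMARY KEY REFERENCES person(person_id),
        company_id INTEGER
    );
    INSERT INTO person VALUES (11, 'Mike Bowers');
    INSERT INTO customer VALUES (11, 'active');
    INSERT INTO account_rep VALUES (11, 555);
""")

# One person with two roles: each role is a separate LEFT JOIN back to person.
rows = conn.execute("""
    SELECT p.person_name, c.customer_status, r.company_id
    FROM person p
    LEFT JOIN customer c    ON c.person_id = p.person_id
    LEFT JOIN account_rep r ON r.person_id = p.person_id
""").fetchall()
print(rows)  # [('Mike Bowers', 'active', 555)]
```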
DocGraph Modeling Generalize personId 11 schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ] triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111 triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
bull Generalizing is easy in JSON because JSON is an object that is designed for subclassing
bull This example shows a person being subclassed as an employee customer and customer rep
bull Generalization works very well in JSON and unlike relational it does not hide the purpose of the model
bull See subclassing below
personId 11
schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]
triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111
triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free
personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]
personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]
personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]
personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]
customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111
companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
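The triples array above stores graph edges inside the person document: each triple links subject 11 to another entity ID through a named predicate, and a triple can carry its own metadata (status, creation date, effectivity dates). As a rough illustration only, the Python sketch below treats those triples as plain dictionaries and follows one predicate. The dictionary shape and the helper names `objects_of` and `active_edges` are assumptions; this is not MarkLogic's triple index or SPARQL API.

```python
# Hypothetical in-memory copy of the "triples" array from the person document.
TRIPLES = [
    {"subject": 11, "predicate": "hasSpouse", "object": 22},
    {"subject": 11, "predicate": "hasChild", "object": 33},
    {"subject": 11, "predicate": "hasChild", "object": 44},
    {"subject": 11, "predicate": "likesProduct", "object": 1111},
    {"subject": 11, "predicate": "hasEmployer", "object": 666,
     "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
     "tripleEffectivityStartDate": "1966-06-06",
     "tripleEffectivityEndDate": None},
]


def objects_of(triples, subject, predicate):
    """Return the IDs linked to `subject` by `predicate`, in document order."""
    return [t["object"] for t in triples
            if t["subject"] == subject and t["predicate"] == predicate]


def active_edges(triples):
    """Keep triples whose optional tripleStatus is missing or 'active'."""
    return [t for t in triples if t.get("tripleStatus", "active") == "active"]


if __name__ == "__main__":
    print(objects_of(TRIPLES, 11, "hasChild"))     # [33, 44]
    print(objects_of(TRIPLES, 11, "hasEmployer"))  # [666]
```

Because the edges live in the same document as the entity, reading the person also reads its relationships; a real deployment would query them through the database's triple support instead of a list scan.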
Relational Modeling: Tune
4. Tune
• Tune the model to meet the performance requirements of the application.
• Move sparse data out of a table and into one-to-one related tables.
• Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins, which slow performance.
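Denormalizing, as described above, trades a join at read time for extra work at write time. The Python sketch below models two tables as a dictionary and a list (the table and function names are hypothetical, chosen for illustration) to show both halves of that trade.

```python
# Hypothetical normalized tables, keyed and shaped for illustration only.
customers = {11: {"customerId": 11, "customerName": "Mike Bowers"}}
orders = [{"orderId": 1, "customerId": 11}]


def read_order_with_join(order):
    """Normalized read: look up the customer row every time the order is read."""
    customer = customers[order["customerId"]]
    return {**order, "customerName": customer["customerName"]}


def denormalize_customer_names():
    """Copy the commonly joined customerName into every order row."""
    for order in orders:
        order["customerName"] = customers[order["customerId"]]["customerName"]


def rename_customer(customer_id, new_name):
    """The price of the copy: every order row holding it must be updated too."""
    customers[customer_id]["customerName"] = new_name
    for order in orders:
        if order["customerId"] == customer_id:
            order["customerName"] = new_name


if __name__ == "__main__":
    denormalize_customer_names()
    print(orders[0]["customerName"])  # Mike Bowers, read without a join
```

Reads get cheaper because the copy is already in the order row; writes get more expensive because a change to the source value has to fan out to every copy.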
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic.
• In MarkLogic, each property should have a unique name, because MarkLogic automatically indexes every property by its name for equality queries, regardless of where it appears in the hierarchy.
• You create inequality (range) indexes manually, based on a property name or a hierarchical path.
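To see why unique names matter when an index is keyed by property name alone, the toy Python index below ignores where a property sits in the hierarchy. It is a hypothetical model for illustration, not MarkLogic's universal index; the URIs and the generic `status` property are made up.

```python
from collections import defaultdict


def build_name_index(docs):
    """Toy index: (property name, value) -> set of document URIs.

    The property's position in the hierarchy is deliberately ignored,
    mimicking name-based equality indexing.
    """
    index = defaultdict(set)

    def walk(node, uri, key=None):
        if isinstance(node, dict):
            for child_key, child in node.items():
                walk(child, uri, child_key)
        elif isinstance(node, list):
            for item in node:          # array items are indexed under the array's key
                walk(item, uri, key)
        elif key is not None:
            index[(key, node)].add(uri)

    for uri, doc in docs.items():
        walk(doc, uri)
    return index


# The same facts modeled with a shared name and with unique names.
generic = {"/p11.json": {"person": {"status": "active"},
                         "customer": {"status": "closed"}}}
unique = {"/p11.json": {"person": {"personStatus": "active"},
                        "customer": {"customerStatus": "closed"}}}

g = build_name_index(generic)
u = build_name_index(unique)

# With a shared name, "status = closed" matches the document, but the index
# cannot tell whether the person or the customer is closed.
print(g[("status", "closed")])            # {'/p11.json'}
# With unique names, the question is unambiguous.
print(u.get(("personStatus", "closed")))  # None
```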
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mike@somework.com ]
personWebsites [
website websitePurpose [ work ] websiteUrl http://www.mike.com ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T09:00:00Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 5.87 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 3.50 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2.50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL http://mysalessite.com/cdn/details/oreo-111
productAvailabilityDate 2001-01-01
productListPrice 4.50
productDiscountPrice 3.50
productStandardCost 2.50
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 2.50 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 10.1
productWidth 4.5 productHeight 1.7 productLength 6.7
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents are equivalent to more than 30 tables in a relational model.
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.
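One rough way to see how a few documents can stand in for many tables is to count the child tables a naive normalization would create. The Python heuristic below is an assumption for illustration: it counts one table per distinct array path, folds single nested objects into their parent table, and ignores lookup and junction tables, so it will not necessarily reproduce the table counts quoted in this talk.

```python
def array_paths(doc):
    """Collect the path of every array in a JSON-like document.

    Assumed mapping (a simplification): each array needs its own child table,
    while a nested single object folds into its parent's table as one-to-one data.
    """
    paths = set()

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, path + (key,))
        elif isinstance(node, list):
            paths.add(path)
            for item in node:         # all items of one array share one table
                walk(item, path)

    walk(doc, ())
    return paths


def estimate_tables(doc):
    """Root entity table plus one child table per distinct array path."""
    return 1 + len(array_paths(doc))


person = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}},
            {"phone": {"phonePurpose": ["work"],
                       "phoneNumber": "+1 (111) 111-1112"}},
        ],
        "personAddresses": [
            {"address": {"addressPurpose": ["shipping", "mailing"],
                         "addressStreet": ["99 Smith Dr"],
                         "addressLocality": "Clinton"}},
        ],
    },
    "customer": {"customerStatus": "active"},
}

print(estimate_tables(person))  # 6
```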
Document PROs: Performance
One JSON document requires one I/O
• Relational requires dozens of I/Os for the same data.
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database.
• You have to join 13 tables to retrieve all of a person's data.
• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row.
• 4-to-6 I/Os x 13 joins = 52-to-78 I/Os.
• MarkLogic indexes are in RAM, so getting a document is 1 I/O.
• MongoDB indexes are B-tree indexes and may require 4 I/Os.
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables.
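The arithmetic above can be reproduced in a few lines. The Python sketch below simply encodes the slide's own estimates (3-to-4 index I/Os plus 1-to-2 row I/Os per join, 13 joins); it is a back-of-the-envelope model, not a measurement, and the function names are illustrative.

```python
def ios_per_join(index_ios=(3, 4), row_ios=(1, 2)):
    """Low/high I/O cost of one join: index lookups plus the row fetch."""
    return index_ios[0] + row_ios[0], index_ios[1] + row_ios[1]


def relational_read_ios(joins, per_join=None):
    """Low/high I/O cost of reading one entity spread over `joins` joins."""
    low, high = per_join if per_join is not None else ios_per_join()
    return joins * low, joins * high


DOCUMENT_READ_IOS = 1  # one document, one I/O, per the slide's MarkLogic claim

print(ios_per_join())           # (4, 6)
print(relational_read_ios(13))  # (52, 78)
```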
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties, for example using MarkLogic's new Optic API and Template Driven Extraction.
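Template Driven Extraction projects rows out of documents so that the Optic API can query them like tables. The Python sketch below only imitates that idea and does not use TDE's template format or the Optic API: it emits one row per value of the multi-valued addressPurpose property, so a filter on a single purpose matches naturally. The document shape and the function name `address_rows` are assumptions.

```python
def address_rows(doc):
    """Yield (personId, addressPurpose, addressLocality) for each purpose value."""
    for entry in doc["person"]["personAddresses"]:
        address = entry["address"]
        for purpose in address["addressPurpose"]:
            yield (doc["personId"], purpose, address["addressLocality"])


docs = [
    {"personId": 11,
     "person": {"personAddresses": [
         {"address": {"addressPurpose": ["shipping", "mailing"],
                      "addressLocality": "Clinton"}}]}},
]

rows = [row for doc in docs for row in address_rows(doc)]
shipping = [row for row in rows if row[1] == "shipping"]
print(rows)      # [(11, 'shipping', 'Clinton'), (11, 'mailing', 'Clinton')]
print(shipping)  # [(11, 'shipping', 'Clinton')]
```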
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.
"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States"
  } }
]
Addresses: Address ID | Street | Locality | Region | Postal Code | Country
Persons Addresses: Person ID | Address ID | Address Purpose | Is Primary
Persons: Person ID | Status | Shipping Preference | Name | Online Name | Birth Date | Gender | Ethnicity | Customer Status | Customer Join Date | Customer Reviewer Rank
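The three relational tables listed above are what the single personAddresses array has to be shredded into. The Python sketch below performs that shredding for illustration and produces only the Addresses and Persons Addresses rows (the Persons row holds the person's scalar fields). The generated address IDs, joining multi-line streets with ", ", and treating a "primary" purpose value as the Is Primary flag are all assumptions.

```python
def shred_addresses(person_id, person_addresses, first_address_id=1):
    """Split a JSON address array into Addresses and Persons Addresses rows."""
    addresses, person_address_links = [], []
    for offset, entry in enumerate(person_addresses):
        address = entry["address"]
        address_id = first_address_id + offset
        addresses.append({
            "addressId": address_id,
            "street": ", ".join(address["addressStreet"]),
            "locality": address["addressLocality"],
            "region": address["addressRegion"],
            "postalCode": address["addressPostalCode"],
            "country": address["addressCountry"],
        })
        purposes = address["addressPurpose"]
        is_primary = "primary" in purposes
        for purpose in purposes:
            if purpose != "primary":   # one junction row per real purpose
                person_address_links.append({
                    "personId": person_id, "addressId": address_id,
                    "addressPurpose": purpose, "isPrimary": is_primary,
                })
    return addresses, person_address_links


person_addresses = [{"address": {
    "addressPurpose": ["shipping", "mailing"],
    "addressStreet": ["99 Smith Dr"],
    "addressLocality": "Clinton", "addressRegion": "UT",
    "addressPostalCode": "84015", "addressCountry": "United States"}}]

addresses, links = shred_addresses(11, person_addresses)
print(len(addresses), [link["addressPurpose"] for link in links])
# 1 ['shipping', 'mailing']
```

One JSON array element becomes one Addresses row plus one Persons Addresses row per purpose, which is the many-to-many relationship the document holds directly.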
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Storing different types of records in the same array is natural in programming but very difficult in relational databases.
"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa",
      "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234",
      "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS",
      "creditCardValidationAddress": "mailing"
  } },
  { "paymentMethod": {
      "paymentMethodType": "Checking",
      "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111",
      "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111",
      "driversLicenseState": "UT"
  } }
]
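Because each entry in personPaymentMethods carries its own paymentMethodType, application code can branch on that value instead of relying on separate tables or sparse columns. The Python sketch below does that; the required-field lists per type and the `describe` helper are assumptions for illustration, not part of the model.

```python
REQUIRED_FIELDS = {
    "Visa": {"creditCardNumber", "creditCardExpirationDate", "nameOnCreditCard"},
    "Checking": {"bankRoutingNumber", "bankAccountNumber", "nameOnBankAccount"},
}


def describe(payment_method):
    """Check the type-specific fields, then summarize one payment method."""
    kind = payment_method["paymentMethodType"]
    if kind not in REQUIRED_FIELDS:
        raise ValueError(f"unknown payment method type {kind!r}")
    missing = REQUIRED_FIELDS[kind] - set(payment_method)
    if missing:
        raise ValueError(f"{kind} payment method is missing {sorted(missing)}")
    if kind == "Checking":
        return f"Checking account ending {payment_method['bankAccountNumber'][-4:]}"
    return f"{kind} card ending {payment_method['creditCardNumber'][-4:]}"


person_payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111",
                       "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers"}},
]

for entry in person_payment_methods:
    print(describe(entry["paymentMethod"]))
# Visa card ending 1234
# Checking account ending 1111
```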
Document PROs: Everything is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query.
• Structure can be queried:
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224",
                   "hidePhoneNumber": true } }
    ]
  }
}
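Because keys are data, a query can test for a key's presence and a simple rewrite can change the structure. The Python sketch below illustrates both on a copy of the phone example above; `has_key`, `rename_key`, and the new key name `phoneNumberHidden` are hypothetical, and a database would do this with its own query and update APIs.

```python
def has_key(node, key):
    """True if `key` appears at any depth of a JSON-like structure."""
    if isinstance(node, dict):
        return key in node or any(has_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(has_key(item, key) for item in node)
    return False


def rename_key(node, old, new):
    """Return a copy with every `old` key renamed to `new` (assumes no clash)."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new)
                for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


doc = {"personId": 11, "person": {"personPhones": [
    {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
    {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
               "hidePhoneNumber": True}},
]}}

# Query a nested relationship: which phone numbers are hidden?
hidden = [p["phone"]["phoneNumber"] for p in doc["person"]["personPhones"]
          if p["phone"].get("hidePhoneNumber")]
print(hidden)  # ['+1 (222) 222-2224']

# Change the structure by rewriting data.
renamed = rename_key(doc, "hidePhoneNumber", "phoneNumberHidden")
print(has_key(doc, "hidePhoneNumber"), has_key(renamed, "hidePhoneNumber"))  # True False
```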
Document PROs: Simple, Easy Data Types
- Attribute names have unlimited length and can contain any characters.
- Strings have unlimited size and can contain any internationalized UTF-8 characters.
- Numbers contain integers or decimals.
- Booleans are true or false.
- Arrays can have values of any type and can mix types.
- Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
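As a quick check of how these types surface in a general-purpose language, the Python sketch below parses a small JSON document with the standard `json` module. The sample keys and values are made up. Note that JSON itself has a single number type; it is Python's parser that returns `int` or `float` depending on whether the literal has a fraction.

```python
import json

text = """
{
  "personName": "Renée Ødegård 李",
  "personHeightInFeet": 5.75,
  "personId": 11,
  "hidePhoneNumber": true,
  "mixed": [1, 2.5, "three", false, null, {"nested": []}, [4]]
}
"""

doc = json.loads(text)
type_names = [type(v).__name__ for v in doc["mixed"]]

print(type(doc["personHeightInFeet"]).__name__)  # float
print(type(doc["personId"]).__name__)            # int
print(type_names)
# ['int', 'float', 'str', 'bool', 'NoneType', 'dict', 'list']
```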
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables… represent a major advance toward the goal of data independence…

1.2.1. Ordering Dependence. …Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence. …Can application programs… remain invariant as indices come and go?…
1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
Figure: a knowledge graph of typed entities (T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event) connected by meaningful relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of", with topics linked to one another by "problem", "solution", and "purpose".
Figure: a document graph linking JSON Customers, Orders, Products, and Vendors with XML Documentation and an XML hospital operation record. Relationships include "Customer Order", "Product that was ordered", "Vendor who sold the product", "Customer liked product", and "Vendor received RMA on product"; documentation includes shipping instructions, customer invoices, annual reports, customer order documents, and a product manual.
Inside and Out
Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
DocGraph Modeling: Generalize
• Generalizing is easy in JSON because a JSON object is naturally suited to subclassing.
• This example shows a person being subclassed as an employee, a customer, and a customer rep.
• Generalization works very well in JSON and, unlike relational generalization, it does not hide the purpose of the model.
• See the subclassing in the person document example: its schemas array declares the person, customer, and companyRelationships types.
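In the person document, the schemas array names the "subclasses" the document carries (person, customer, companyRelationships), and each subclass adds its own object. The Python sketch below reads those declarations back. It assumes each schemas entry has the shape {"schema": {"type": ..., "schemaVersion": ...}}, and the helper names `subtypes`, `is_a`, and `company_roles` are hypothetical.

```python
person_doc = {
    "personId": 11,
    "schemas": [
        {"schema": {"type": "person", "schemaVersion": 1.0}},
        {"schema": {"type": "customer", "schemaVersion": 1.0}},
        {"schema": {"type": "companyRelationships", "schemaVersion": 1.0}},
    ],
    "person": {"personName": "Mike Bowers"},
    "customer": {"customerStatus": "active"},
    "companyRelationships": [
        {"company": {"companyIdFk": 555, "companyName": "Nabisco",
                     "personCompanyRelationship": ["employeeOf", "accountRepFor"]}},
        {"company": {"companyIdFk": 666, "companyName": "Goliath Dairies",
                     "personCompanyRelationship": ["independentContractorFor",
                                                   "deliveryPersonFor"]}},
    ],
}


def subtypes(doc):
    """Types declared in the document's schemas array, in order."""
    return [entry["schema"]["type"] for entry in doc.get("schemas", [])]


def is_a(doc, type_name):
    """A document 'is a' type if it declares it and carries that type's object."""
    return type_name in subtypes(doc) and type_name in doc


def company_roles(doc):
    """Every role the person plays for any company, sorted."""
    return sorted({role
                   for entry in doc.get("companyRelationships", [])
                   for role in entry["company"]["personCompanyRelationship"]})


print(subtypes(person_doc))          # ['person', 'customer', 'companyRelationships']
print(is_a(person_doc, "customer"))  # True
print(company_roles(person_doc))
# ['accountRepFor', 'deliveryPersonFor', 'employeeOf', 'independentContractorFor']
```

A plain person document would simply omit the customer and companyRelationships objects and their schema entries; no table has to change shape to accommodate either case.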
personId 11
schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]
triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111
triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free
personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]
personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]
personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]
personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]
customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111
companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
Relational Modeling Tune4 Tunebull Tune the model to
meet the performance requirements of the application
bull Optimize sparse data out of a table and put it into one-to-one related tables
bull Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins which slow performance
39
DocGraph Modeling Tune personId 11 schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ] triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111 triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
• These examples are tuned for MarkLogic.
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy.
• You manually create inequality (range) indexes based on property name or hierarchical path.
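A toy index shows why unique names matter. The Python sketch below is a simplified model, not MarkLogic's actual indexing: the hypothetical `index_by_name` files every scalar under its property name alone, while `index_by_path` files it under its full path. With a generic `name` property, a lookup for "Nabisco" cannot tell a person's name from a company's. With `personName` and `companyName`, or with a path index, it can. The sample documents are illustrative.

```python
from collections import defaultdict

def index_by_name(doc) -> dict:
    """Map property name -> set of scalar values found under that name at any depth."""
    index = defaultdict(set)

    def walk(node, name):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, key)
        elif isinstance(node, list):
            for item in node:
                walk(item, name)  # array items are indexed under the enclosing property
        elif name is not None:
            index[name].add(node)

    walk(doc, None)
    return dict(index)

def index_by_path(doc) -> dict:
    """Map slash-separated property path -> set of scalar values at that path."""
    index = defaultdict(set)

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}/{key}")
        elif isinstance(node, list):
            for item in node:
                walk(item, path)
        elif path:
            index[path].add(node)

    walk(doc, "")
    return dict(index)

# Hypothetical documents: one with a reused property name, one with unique names.
generic = {"person": {"name": "Mike Bowers"}, "company": {"name": "Nabisco"}}
unique = {"person": {"personName": "Mike Bowers"}, "company": {"companyName": "Nabisco"}}

print(index_by_name(generic)["name"])  # both values collide in one entry
print(index_by_name(unique))
print(index_by_path(generic))
```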
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": "1.0" } },
    { "schema": { "type": "customer", "schemaVersion": "1.0" } },
    { "schema": { "type": "companyRelationships", "schemaVersion": "1.0" } } ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666,
        "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
        "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null } } ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers" },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } } ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } } ] },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": { "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ] } },
    { "company": { "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late" } } ]
}
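The `triples` array appears to follow the `"triple": { "subject", "predicate", "object" }` shape that MarkLogic recognizes as triples embedded in JSON. As an illustration only, the Python sketch below walks a trimmed copy of the document, collects every `{"triple": {...}}` object, and answers "who are person 11's children?". In MarkLogic you would query the triple index (for example with SPARQL) instead. Because RDF requires IRIs for predicates, a production document would likely use full IRIs rather than the slide's short values. The `embedded_triples` helper is hypothetical.

```python
from typing import Any, Iterator, Tuple

def embedded_triples(node: Any) -> Iterator[Tuple[Any, Any, Any]]:
    """Yield (subject, predicate, object) for every {"triple": {...}} object at any depth."""
    if isinstance(node, dict):
        triple = node.get("triple")
        if isinstance(triple, dict) and all(k in triple for k in ("subject", "predicate", "object")):
            yield triple["subject"], triple["predicate"], triple["object"]
        for value in node.values():
            yield from embedded_triples(value)
    elif isinstance(node, list):
        for item in node:
            yield from embedded_triples(item)

# Trimmed copy of the tuned person document (values from the slide).
doc = {
    "personId": 11,
    "triples": [
        {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
        {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
        {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
        {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666,
                    "tripleStatus": "active", "tripleEffectivityEndDate": None}},
    ],
}

children = [o for s, p, o in embedded_triples(doc) if s == 11 and p == "hasChild"]
print(children)
```

Extra properties such as `tripleStatus` ride along inside each triple object without disturbing the subject/predicate/object extraction.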
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ] },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111 }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": … },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New",
          … } } ] } }
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" … ],
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu… } } ],
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    …
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", … ],
          "addressLocality": "East Hanove…",
          "addressPostalCode": "07936", … } } ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": 555,
          "paymentMethodVerified": true } } ] },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
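These documents refer to one another through foreign-key properties such as `productIdFk`, and they also copy a few fields (for example, `productName`) into the order so that common reads need no join. The Python sketch below is an illustration with a hypothetical helper name and trimmed sample data. It resolves each ordered product against a lookup of product documents keyed by `productId`. A database would typically perform this join with its query APIs rather than in application code.

```python
def resolve_ordered_products(order_doc: dict, products_by_id: dict) -> list:
    """Pair each ordered line item with the product document its productIdFk points to.

    A missing target resolves to None, the document analog of an outer join.
    """
    pairs = []
    for line in order_doc["order"]["orderedProducts"]:
        item = line["product"]
        pairs.append((item, products_by_id.get(item["productIdFk"])))
    return pairs

# Trimmed sample documents; only product 1111 is "loaded".
product_docs = [
    {"productId": 1111, "product": {"productCode": "oreo-111", "productName": "Oreo Thins Sandwich Cookies"}},
]
products_by_id = {doc["productId"]: doc for doc in product_docs}

order_doc = {
    "orderId": 1,
    "order": {"orderedProducts": [
        {"product": {"productIdFk": 1111, "productName": "Oreo Thins Sandwich Cookies", "productOrderQty": 3}},
        {"product": {"productIdFk": 2222, "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz", "productOrderQty": 1}},
    ]},
}

for item, product in resolve_ordered_products(order_doc, products_by_id):
    code = product["product"]["productCode"] if product else "(not loaded)"
    print(item["productName"], "->", code)
```

Because the order already carries a denormalized `productName`, the display line works even when the product document is absent. Only fields that live solely in the product document, such as `productCode`, need the join.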
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model.
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.
Document PROs: Performance
One JSON document requires one I/O.
• Relational requires dozens of I/Os for the same data.
• For example, the person document is 4.5K and requires 13 tables to represent it in a relational database.
• You have to join 13 tables to retrieve all of a person's data.
• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row.
• 4-to-6 I/Os x 13 joins = 52-to-78 I/Os.
• MarkLogic indexes are in RAM, so getting a doc is 1 I/O.
• MongoDB indexes are B-tree indexes and may require 4 I/Os.
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do with relational tables.
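The slide's I/O estimate is plain multiplication: one index-plus-row lookup per table, times the number of tables. Here is a small Python sketch of that arithmetic. The per-lookup costs are the slide's figures, not measurements, and `relational_read_ios` is a hypothetical name.

```python
def relational_read_ios(tables: int, min_ios_per_lookup: int, max_ios_per_lookup: int) -> tuple:
    """I/O range to assemble one entity shredded across `tables` tables (one lookup per table)."""
    return tables * min_ios_per_lookup, tables * max_ios_per_lookup

# Slide figures: 13 tables; 3-4 index I/Os + 1-2 row I/Os = 4-6 I/Os per lookup.
low, high = relational_read_ios(13, 3 + 1, 4 + 2)
document_ios = 1  # the whole person document is read as one unit

print(f"relational: {low}-{high} I/Os; document: {document_ios} I/O")
```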
Document PROs: Multi-Valued and Multi-Typed Properties
• JSON documents can contain multi-valued and multi-typed properties.
• JSON databases can query multi-valued and multi-typed properties, for example using MarkLogic's new Optic API and Template Driven Extraction (TDE).
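A practical consequence of multi-valued properties is that a query for one value should match a document when that value appears anywhere in the array. MarkLogic's JSON property value queries behave similarly for array-valued properties. The Python sketch below only imitates that match-any behavior in memory, and `has_value` is a hypothetical helper.

```python
from typing import Any

def has_value(node: Any, name: str, wanted: Any) -> bool:
    """True if a property called `name`, at any depth, equals `wanted` or is an array containing it."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == name and (value == wanted or (isinstance(value, list) and wanted in value)):
                return True
            if has_value(value, name, wanted):
                return True
    elif isinstance(node, list):
        return any(has_value(item, name, wanted) for item in node)
    return False

# Trimmed person document with one multi-valued property (addressPurpose).
person_doc = {
    "personId": 11,
    "person": {"personAddresses": [
        {"address": {"addressPurpose": ["shipping", "mailing"], "addressLocality": "Clinton"}},
    ]},
}

print(has_value(person_doc, "addressPurpose", "mailing"))
print(has_value(person_doc, "addressPurpose", "billing"))
```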
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued.
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.
"personAddresses": [
  { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]
Relational equivalent (three tables):
Persons: Person ID | Status | Shipping Preference | Name | Online Name | Birth Date | Gender | Ethnicity | Customer Status | Customer Join Date | Customer Reviewer Rank
Persons Addresses: Person ID | Address ID | Address Purpose | Is Primary
Addresses: Address ID | Street | Locality | Region | Postal Code | Country
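To see how one `personAddresses` array fans out into three tables, here is a hedged Python sketch that shreds a trimmed person document into rows. It assumes one junction row per address purpose, joins multi-line streets with a comma, generates address IDs sequentially, and omits the Is Primary column. None of these choices comes from the slide, and `shred_person_addresses` is a hypothetical name.

```python
def shred_person_addresses(doc: dict, first_address_id: int = 1) -> tuple:
    """Return (persons, persons_addresses, addresses) row lists for one person document."""
    person_id = doc["personId"]
    person = doc["person"]
    persons = [{"Person ID": person_id, "Name": person["personName"]}]
    links, addresses = [], []
    address_id = first_address_id  # assumption: IDs are generated sequentially
    for entry in person.get("personAddresses", []):
        a = entry["address"]
        addresses.append({
            "Address ID": address_id,
            "Street": ", ".join(a["addressStreet"]),  # array of street lines -> one column
            "Locality": a["addressLocality"],
            "Region": a["addressRegion"],
            "Postal Code": a["addressPostalCode"],
            "Country": a["addressCountry"],
        })
        # A multi-valued purpose becomes one junction row per value.
        for purpose in a["addressPurpose"]:
            links.append({"Person ID": person_id, "Address ID": address_id, "Address Purpose": purpose})
        address_id += 1
    return persons, links, addresses

doc = {"personId": 11, "person": {"personName": "Mike Bowers", "personAddresses": [
    {"address": {"addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
                 "addressLocality": "Clinton", "addressRegion": "UT",
                 "addressPostalCode": "84015", "addressCountry": "United States"}}]}}

persons, links, addresses = shred_person_addresses(doc)
print(len(persons), len(links), len(addresses))
```

One nested object becomes four rows across three tables, and reading it back requires joining them again.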
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed.
• Putting different types of records in the same array is natural in programming but very difficult in relational databases.
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
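In application code, a mixed array like this is usually handled by branching on a type discriminator. The Python sketch below switches on `paymentMethodType`. It assumes every non-Checking method is a card, which holds for this two-item example but is only an assumption, and `describe_payment_method` is a hypothetical name.

```python
def describe_payment_method(entry: dict) -> str:
    """Summarize one element of personPaymentMethods, whatever its record type."""
    pm = entry["paymentMethod"]
    kind = pm["paymentMethodType"]
    if kind == "Checking":
        return f"Checking account ending {pm['bankAccountNumber'][-4:]}"
    # Assumption: any other type in this example is a credit card (e.g., Visa).
    return f"{kind} card ending {pm['creditCardNumber'][-4:]}"

payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111"}},
]

for entry in payment_methods:
    print(describe_payment_method(entry))
```

A relational design would typically need either one sparse table with columns for every payment type or one table per type plus a union to read them together.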
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query.
• Structure can be queried:
  • for the presence of keys
  • for nested relationships
  • for any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ] } }
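Because keys are data, "which phones are hidden?" and "rename this key everywhere" are ordinary operations over the data. The Python sketch below illustrates both in memory on the slide's example. The helper names and the replacement key `phoneNumberHidden` are hypothetical, and in a database you would express these as a query and an update.

```python
from typing import Any, Iterator, Tuple

def paths_with_key(node: Any, key: str, path: Tuple = ()) -> Iterator[Tuple]:
    """Yield the path (keys and array indexes) of every object that contains `key`."""
    if isinstance(node, dict):
        if key in node:
            yield path
        for k, v in node.items():
            yield from paths_with_key(v, key, path + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from paths_with_key(item, key, path + (i,))

def rename_key(node: Any, old: str, new: str) -> Any:
    """Return a copy of `node` with every property named `old` renamed to `new`, at any depth."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node

doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

print(list(paths_with_key(doc, "hidePhoneNumber")))
renamed = rename_key(doc, "hidePhoneNumber", "phoneNumberHidden")  # hypothetical new key name
print(list(paths_with_key(renamed, "phoneNumberHidden")))
```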
Document PROs: Simple, Easy Data Types
– Attribute names have no length limit in the JSON format and can contain any Unicode characters.
– Strings have no size limit in the JSON format and can contain any internationalized UTF-8 characters.
– Numbers contain integers or decimals.
– Booleans are true or false.
– Arrays can have values of any type and can mix types.
– Objects can have any number of attributes, can be nested, and can be sparsely populated.
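These types map directly onto native types in most languages. A quick check with Python's standard `json` module shows integers decoding to `int` and other numbers to `float`, arrays mixing types, and keys containing spaces and non-ASCII characters. The sample document is illustrative. The JSON specification (RFC 8259) does allow individual implementations to limit string length and number range.

```python
import json

# Illustrative document exercising each JSON type.
text = """
{
  "Ünïcödé key with spaces": "日本語 text",
  "customerReviewerRank": 11111,
  "personHeightInFeet": 5.75,
  "hidePhoneNumber": true,
  "mixed": [1, "two", false, {"nested": {"sparse": null}}]
}
"""

doc = json.loads(text)
for key, value in doc.items():
    print(f"{key!r}: {type(value).__name__}")
print([type(v).__name__ for v in doc["mixed"]])
```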
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Communications of the ACM (Information Retrieval), Volume 13, Number 6, June 1970
"Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency …"
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1. INTRODUCTION: "This paper is concerned with the application of elementary relation theory … to … formatted data. … The problems … are those of data independence … and … data inconsistency. … The relational view … appears to be superior in several respects to the graph or network model … Relational view … forms a sound basis for treating derivability, redundancy, and consistency … [and] a clearer evaluation … of …"
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS: "… Tables … represent a major advance toward the goal of data independence …"
1.2.1. Ordering Dependence: "… Programs which take advantage of the stored ordering of a file are likely to fail … if … it becomes necessary to replace that ordering by a different one."
1.2.2. Indexing Dependence: "… Can application programs … remain invariant as indices come and go? …"
1.2.3. Access Path Dependence: "Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. … These programs fail when a change in structure becomes necessary. The … program … is required to exploit … paths to the data. … Programs become dependent on the continued existence of the … paths."
[Figure: "Inside and Out." A document-graph diagram linking Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E) through relationships such as "geo-located in," "printed on," "author of," "published article in journal," "publisher of," and problem, purpose, and solution links between topics. It also links JSON Customers, Orders, Products, and Vendors documents (customer order, product that was ordered, vendor who sold the product, customer liked product, vendor received RMA on product) to an XML hospital operation record and to documentation such as product manuals, customer order documents, shipping instructions, customer invoices, and annual reports.]
XML hospital operation record:
Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
Relational Modeling Tune4 Tunebull Tune the model to
meet the performance requirements of the application
bull Optimize sparse data out of a table and put it into one-to-one related tables
bull Denormalize data by copying commonly joined data into multiple tables to reduce the need for joins which slow performance
39
DocGraph Modeling Tune personId 11 schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ] triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111 triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
bull These examples are tuned for MarkLogic
bull In MarkLogic each property should have a unique name because MarkLogic automatically creates equality indexes on each property based on its name ndash regardless of where it is in the hierarchy
bull You manually create inequality indexes based on property name or hierarchical path
personId 11
schemas [ schema type person schemaVersion 10 schema type customer schemaVersion 10 schema type companyRelationships schemaVersion 10 ]
triples [ triple subject 11 predicate hasSpouse object 22 triple subject 11 predicate hasChild object 33 triple subject 11 predicate hasChild object 44 triple subject 11 predicate likesProduct object 1111
triple subject 11 predicate hasEmployer object 666 tripleStatus active tripleCreatedOn 2016-10-05 tripleEffectivityStartDate 1966-06-06 tripleEffectivityEndDate null ] person personStatus active personName Mike Bowers personOnlineName MTB1 personNameVariations personFirstName Mike personLastName Bowers middleName Thomas personMaidenName null personLegalName Michael Thomas Bowers personSortedName Bowers Mike personNickname Mikey personInformalLetterName Mike personFormalLetterName Mike Bowers personBirthDate 1981-01-01 personGender male personEthnicity caucasian personShippingPreference free
personPhones [ phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 phone phonePurpose [ work ] phoneNumber +1 (111) 111-1112 phone phonePurpose [ home ] phoneNumber +1 (111) 111-1113 ]
personAddresses [ address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ] addressLocality Clinton addressRegion UT addressPostalCode 84015 addressCountry United States ]
personEmails [ email emailPurpose [ work primary ] emailAddress mikesomeworkcom email emailPurpose [ personal ] emailAddress mikesomewherecom ]
personWebsites [ website websitePurpose [ work ] websiteUrl httpwwwmikecom website websitePurpose [ linkedin primary ] websiteUrl httpslinkedcommike ]
personPaymentMethods [ paymentMethod paymentMethodType Visa paymentMethodStatus Verified creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11 nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing paymentMethod paymentMethodType Checking paymentMethodStatus Verified bankRoutingNumber 11111111 bankAccountNumber 11111111 nameOnBankAccount Michael Bowers driversLicenseNumber 11111111 driversLicenseState UT ]
customer customerStatus active customerJoinDate 2001-01-01 customerReviewerRank 11111
companyRelationships [ company companyIdFk 555 companyName Nabisco personCompanyRelationship [ employeeOf accountRepFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null accountRepSupportedProducts [ product productIdFk 1111 productName Oreos ]
company companyIdFk 666 companyName Goliath Dairies personCompanyRelationship [ independentContractorFor deliveryPersonFor ] personCompanyStatus active personCompanyEffectiveness Effective personCompanyStartDate 2001-01-01 personCompanyEndDate null deliverySchedule 2 pm Mon-Fri performanceNotes He is always late ]
Agenda1 Database Modeling Paradigms2 From Relational to Document Graph3 Document Advantages
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111
orderStatus [ New, Invoiced, Shipped, Closed ]
productOrderStatus [ None, Allocated, Invoiced, Shipped, On Order, No Stock ]
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model.
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.
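The "more than 30 tables" comparison is easier to see with a minimal Python sketch that shreds a nested document the way a relational model would: each nested array becomes a child table whose rows point back to their parent row. It assumes a small, hand-typed excerpt of the person document with JSON punctuation restored. `shred` and `parentRow` are hypothetical names, not a MarkLogic API, and a real relational design might normalize differently (for example, by adding lookup tables for purposes).

```python
from collections import defaultdict
from typing import Any


def shred(doc: dict, root: str = "person") -> dict[str, list[dict]]:
    """Split one nested document into relational-style tables.

    Each nested array becomes its own child table whose rows point back to
    the parent row; nested single objects stay in the parent row as columns.
    """
    tables: dict[str, list[dict]] = defaultdict(list)

    def walk(obj: dict, table: str, parent: Any) -> None:
        row: dict[str, Any] = {} if parent is None else {"parentRow": parent}
        tables[table].append(row)  # reserve the row first so its index is stable
        row_ref = (table, len(tables[table]) - 1)

        def fill(node: dict, prefix: str) -> None:
            for key, value in node.items():
                if isinstance(value, dict):      # 1:1 nested object -> same row
                    fill(value, f"{prefix}{key}.")
                elif isinstance(value, list):    # repeating group -> child table
                    for item in value:
                        if isinstance(item, dict):
                            walk(item, key, row_ref)
                        else:
                            tables[key].append({"parentRow": row_ref, "value": item})
                else:                            # scalar -> column
                    row[prefix + key] = value

        fill(obj, "")

    walk(doc, root, None)
    return dict(tables)


# Excerpt of the slide's person document (punctuation restored by hand).
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}},
        ],
        "personAddresses": [
            {"address": {"addressPurpose": ["shipping", "mailing"],
                         "addressStreet": ["99 Smith Dr"],
                         "addressLocality": "Clinton"}},
        ],
    },
}

tables = shred(person_doc)
for name, rows in tables.items():
    print(name, len(rows))
# person 1, personPhones 1, phonePurpose 2,
# personAddresses 1, addressPurpose 2, addressStreet 1
```

Even this small excerpt of one person yields six tables. The full person document, with its emails, websites, and payment methods, grows the count further.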
Document PROs: Performance
One JSON document requires one IO.
• Relational requires dozens of IOs for the same data.
• For example, the person document is 45K and requires 13 tables to represent it in a relational database.
• You have to join 13 tables to retrieve all of a person's data.
• Each join requires 4-to-6 IOs: 3-to-4 IOs for indexes and 1-to-2 IOs for a row.
• 4-to-6 IOs × 13 joins = 52-to-78 IOs.
• MarkLogic indexes are in RAM, so getting a doc is 1 IO.
• MongoDB indexes are B-tree indexes and may require 4 IOs.
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables.
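The slide's IO arithmetic can be checked with a few lines of Python. This back-of-the-envelope sketch uses only the slide's assumed costs: 3-to-4 index IOs plus 1-to-2 row IOs, counted once for each of the 13 tables. Real costs depend on caching, index depth, and the query plan. `relational_read_ios` is a hypothetical helper.

```python
def relational_read_ios(tables: int,
                        index_ios: tuple[int, int] = (3, 4),
                        row_ios: tuple[int, int] = (1, 2)) -> tuple[int, int]:
    """(low, high) IO estimate for reading one entity spread over `tables` tables.

    Each table costs one index lookup plus one row fetch, mirroring the
    slide's per-join figures.
    """
    per_table_low = index_ios[0] + row_ios[0]    # 3 + 1 = 4 IOs
    per_table_high = index_ios[1] + row_ios[1]   # 4 + 2 = 6 IOs
    return tables * per_table_low, tables * per_table_high


DOCUMENT_READ_IOS = 1  # the slide's figure for one document read with indexes in RAM

low, high = relational_read_ios(13)
print(f"relational: {low}-{high} IOs, document: {DOCUMENT_READ_IOS} IO")
# relational: 52-78 IOs, document: 1 IO
```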
Document PROs: Multi-Valued and Multi-Typed Properties
• JSON documents can contain multi-valued and multi-typed properties.
• JSON databases can query multi-valued and multi-typed properties, for example with MarkLogic's new Optic API and Template Driven Extraction.
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued.
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton",
      "addressRegion": "UT",
      "addressPostalCode": "84015",
      "addressCountry": "United States"
  } }
]

The same data in a relational model takes three tables:
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
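Because the many-to-many link between persons and addresses lives inside the document, a question that needs a three-table join in the relational model becomes a scan of one nested array. Here is a minimal Python sketch, assuming documents shaped like the `personAddresses` example above. The second person and the `billing` purpose are hypothetical and exist only for contrast, and `persons_with_address_purpose` is not a MarkLogic function. In MarkLogic itself you would query through its indexes rather than iterate in application code.

```python
from typing import Iterable

persons = [
    {"personId": 11, "person": {"personAddresses": [
        {"address": {"addressPurpose": ["shipping", "mailing"],
                     "addressLocality": "Clinton"}}]}},
    # Hypothetical second person, added only for contrast
    {"personId": 22, "person": {"personAddresses": [
        {"address": {"addressPurpose": ["billing"],
                     "addressLocality": "Clinton"}}]}},
]


def persons_with_address_purpose(docs: Iterable[dict], purpose: str) -> list[int]:
    """IDs of persons having at least one address tagged with `purpose`.

    One pass over each document stands in for the Persons / Persons Addresses /
    Addresses join, because the link (and its purpose) is stored in the document.
    """
    return [
        doc["personId"]
        for doc in docs
        if any(purpose in entry["address"].get("addressPurpose", [])
               for entry in doc.get("person", {}).get("personAddresses", []))
    ]


print(persons_with_address_purpose(persons, "mailing"))  # [11]
```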
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed.
• Putting different types of records in the same array is natural in programming but very difficult in relational databases.

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing"
  } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT"
  } }
]
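Application code can handle a mixed-type array by checking which keys each element carries. This Python sketch assumes the two payment-method shapes from the slide, with card and account numbers stored as strings. `describe_payment_method` is a hypothetical helper. Keying on field presence rather than on `paymentMethodType` is a design choice made here, not something the slides prescribe.

```python
def describe_payment_method(entry: dict) -> str:
    """Summarize one element of a mixed-type personPaymentMethods array.

    Records are told apart by which keys they carry, not by a fixed schema.
    """
    pm = entry["paymentMethod"]
    if "creditCardNumber" in pm:       # card-shaped record
        return f"{pm['paymentMethodType']} card ending {pm['creditCardNumber'][-4:]}"
    if "bankAccountNumber" in pm:      # bank-account-shaped record
        return f"{pm['paymentMethodType']} account ending {pm['bankAccountNumber'][-4:]}"
    return f"unrecognized payment method: {pm.get('paymentMethodType', '?')}"


person_payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111",
                       "bankAccountNumber": "11111111"}},
]

for entry in person_payment_methods:
    print(describe_payment_method(entry))
# Visa card ending 1234
# Checking account ending 1111
```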
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query.
• Structure can be queried:
  • for presence of keys
  • for nested relationships
  • for any combination of nesting, keys, and values

{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224",
                   "hidePhoneNumber": true } }
    ]
  }
}
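Both ideas on this slide, querying structure and changing structure by rewriting data, fit in a short Python sketch. It assumes the slide's example document with punctuation restored (the long name is omitted). `visible_phone_numbers`, `rename_key`, and the new property name `phoneIsPrivate` are hypothetical. In a database, the rename would run as an update over stored documents rather than over an in-memory dict.

```python
from typing import Any

doc = {
    "personId": 11,
    "person": {
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def visible_phone_numbers(person_doc: dict) -> list[str]:
    """Query on structure: a phone is hidden only if the optional key is present and true."""
    return [p["phone"]["phoneNumber"]
            for p in person_doc["person"].get("personPhones", [])
            if not p["phone"].get("hidePhoneNumber", False)]


def rename_key(node: Any, old: str, new: str) -> Any:
    """Change structure by rewriting data: return a copy with `old` renamed everywhere."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(v, old, new) for v in node]
    return node


print(visible_phone_numbers(doc))  # ['+1 (222) 222-2222']

renamed = rename_key(doc, "hidePhoneNumber", "phoneIsPrivate")  # hypothetical new name
print(renamed["person"]["personPhones"][1]["phone"])
# {'phonePurpose': ['home'], 'phoneNumber': '+1 (222) 222-2224', 'phoneIsPrivate': True}
```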
Document PROs: Simple, Easy Data Types
• Attribute names have unlimited length and can contain any characters.
• Strings have unlimited size and can contain any internationalized UTF-8 characters.
• Numbers contain integers or decimals.
• Booleans are true or false.
• Arrays can have values of any type and can mix types.
• Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
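How these types surface in a program depends on the parser. As one example, Python's standard `json` module maps them as shown below; passing `parse_float=Decimal` keeps decimal values such as prices exact instead of converting them to binary floats. JSON also has `null`, which Python reads as `None`. The sample document is hypothetical, and the list price of 4.50 is an assumed reading of the `450` in this transcript, where decimal points appear to have been lost.

```python
import json
from decimal import Decimal

text = """{
  "productName": "Oreo Thins Sandwich Cookies",
  "productListPrice": 4.50,
  "productTags": ["cookie", "chocolate", 3, true, null, {"note": "arrays may mix types"}],
  "a property name with spaces and UTF-8: ünïcödé 値": "any UTF-8 string"
}"""

# parse_float=Decimal keeps decimal values exact (4.50 stays 4.50)
doc = json.loads(text, parse_float=Decimal)

print([type(v).__name__ for v in doc["productTags"]])
# ['str', 'str', 'int', 'bool', 'NoneType', 'dict']
print(doc["productListPrice"])  # 4.50
```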
Document PROs: Meaningful Data Modeling

A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970

"Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…"
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1 INTRODUCTION: This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of
1.2 DATA DEPENDENCIES IN PRESENT SYSTEMS: …Tables… represent a major advance toward the goal of data independence…
1.2.1 Ordering Dependence: …Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.
1.2.2 Indexing Dependence: …Can application programs… remain invariant as indices come and go?…
1.2.3 Access Path Dependence: Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary. The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure: document graph linking Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E) through named relationships such as "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "purpose", and "solution".]
[Figure: Customer Order document graph. JSON documents for Customers, Orders, Products, and Vendors are linked to each other and to XML Documentation by relationships such as "Product that was ordered", "Vendor who sold the product", "Customer liked product", "Vendor received RMA on product", "Shipping instructions, etc.", "Customer invoices, annual reports, etc.", "Customer order documents", and "Product manual". A sample XML document records a hospital operation (Hospital Name: John Hopkins; Operation Number: 13; Operation Type: Heart Transplant; Surgeon Name: Dorothy Oz) with drugs Minicillan (Drugs R Us, 200 mg), Maxicillan (Canada4Less Drugs, 400 mg), and Minicillan (Drug USA, 150 mg).]
Inside and Out
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledgeWhat can RDF Graph and JSON do for data
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense
- How does NoSQL support variety and variability
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling Normalize
- DocGraph Modeling Normalize
- DocGraph Modeling Denormalize
- Relational Modeling Orthogonalize
- Relational Modeling Generalize
- DocGraph Modeling Generalize
- Relational Modeling Tune
- DocGraph Modeling Tune
- Document PROs Simple Easy Data Types
| Drug Name | Drug Manufacturer | Dose Size | Dose UOM |
| Minicillan | Drugs R Us | 200 | mg |
| Maxicillan | Canada4Less Drugs | 400 | mg |
| Minicillan | Drug USA | 150 | mg |

| Hospital Name | John Hopkins |
| Operation Number | 13 |
| Operation Type | Heart Transplant |
| Surgeon Name | Dorothy Oz |
DocGraph Modeling: Tune
• These examples are tuned for MarkLogic.
• In MarkLogic, each property should have a unique name, because MarkLogic automatically creates equality indexes on each property based on its name, regardless of where it is in the hierarchy.
• You manually create inequality (range) indexes based on a property name or a hierarchical path.
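The unique-name rule can be checked mechanically. This Python sketch is a hypothetical lint helper (not a MarkLogic feature): for each property name, it records the parent paths where the name occurs and reports names reused under different parents. A query on the property name alone, for example `cts.jsonPropertyValueQuery("name", ...)` in MarkLogic, would match such properties wherever they appear. The two sample documents are illustrative.

```python
from collections import defaultdict


def parent_paths_by_name(node, path="", found=None):
    """Map each property name to the set of parent paths where it appears."""
    if found is None:
        found = defaultdict(set)
    if isinstance(node, dict):
        for key, value in node.items():
            found[key].add(path or "/")
            parent_paths_by_name(value, f"{path}/{key}", found)
    elif isinstance(node, list):
        for item in node:  # array items share the array's path
            parent_paths_by_name(item, path, found)
    return found


def ambiguous_names(doc):
    """Property names reused under different parents (they share one name-based index)."""
    return {name: sorted(paths)
            for name, paths in parent_paths_by_name(doc).items() if len(paths) > 1}


generic = {"person": {"name": "Mike Bowers"},
           "companyRelationships": [{"company": {"name": "Nabisco"}}]}
specific = {"person": {"personName": "Mike Bowers"},
            "companyRelationships": [{"company": {"companyName": "Nabisco"}}]}

print(ambiguous_names(generic))   # {'name': ['/companyRelationships/company', '/person']}
print(ambiguous_names(specific))  # {}
```

The `specific` document follows the naming style used throughout these examples (`personName`, `companyName`), so no property name collides.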
{
  "personId": 11,
  "schemas": [
    { "schema": { "type": "person", "schemaVersion": 1.0 } },
    { "schema": { "type": "customer", "schemaVersion": 1.0 } },
    { "schema": { "type": "companyRelationships", "schemaVersion": 1.0 } }
  ],
  "triples": [
    { "triple": { "subject": 11, "predicate": "hasSpouse", "object": 22 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 33 } },
    { "triple": { "subject": 11, "predicate": "hasChild", "object": 44 } },
    { "triple": { "subject": 11, "predicate": "likesProduct", "object": 1111 } },
    { "triple": { "subject": 11, "predicate": "hasEmployer", "object": 666 },
      "tripleStatus": "active", "tripleCreatedOn": "2016-10-05",
      "tripleEffectivityStartDate": "1966-06-06", "tripleEffectivityEndDate": null }
  ],
  "person": {
    "personStatus": "active",
    "personName": "Mike Bowers",
    "personOnlineName": "MTB1",
    "personNameVariations": {
      "personFirstName": "Mike", "personLastName": "Bowers", "middleName": "Thomas",
      "personMaidenName": null, "personLegalName": "Michael Thomas Bowers",
      "personSortedName": "Bowers, Mike", "personNickname": "Mikey",
      "personInformalLetterName": "Mike", "personFormalLetterName": "Mike Bowers"
    },
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } },
      { "phone": { "phonePurpose": [ "work" ], "phoneNumber": "+1 (111) 111-1112" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (111) 111-1113" } }
    ],
    "personAddresses": [
      { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
                     "addressLocality": "Clinton", "addressRegion": "UT",
                     "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } },
      { "email": { "emailPurpose": [ "personal" ], "emailAddress": "mike@somewhere.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } },
      { "website": { "websitePurpose": [ "linkedin", "primary" ], "websiteUrl": "https://linked.com/mike" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
      { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
          "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
          "nameOnBankAccount": "Michael Bowers",
          "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
    ]
  },
  "customer": { "customerStatus": "active", "customerJoinDate": "2001-01-01", "customerReviewerRank": 11111 },
  "companyRelationships": [
    { "company": {
        "companyIdFk": 555, "companyName": "Nabisco",
        "personCompanyRelationship": [ "employeeOf", "accountRepFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "accountRepSupportedProducts": [ { "product": { "productIdFk": 1111, "productName": "Oreos" } } ]
    } },
    { "company": {
        "companyIdFk": 666, "companyName": "Goliath Dairies",
        "personCompanyRelationship": [ "independentContractorFor", "deliveryPersonFor" ],
        "personCompanyStatus": "active", "personCompanyEffectiveness": "Effective",
        "personCompanyStartDate": "2001-01-01", "personCompanyEndDate": null,
        "deliverySchedule": "2 pm Mon-Fri", "performanceNotes": "He is always late"
    } }
  ]
}
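The embedded triples in the tuned person document can be read as graph edges. This plain-Python sketch assumes the `triples` array is shaped as in the example above, with the effectivity metadata stored beside the `triple` object. `related` is a hypothetical helper that filters by predicate and, optionally, by effectivity date. It only illustrates the data shape; in MarkLogic you would normally query embedded triples with SPARQL.

```python
from datetime import date
from typing import Optional

triples = [
    {"triple": {"subject": 11, "predicate": "hasSpouse", "object": 22}},
    {"triple": {"subject": 11, "predicate": "hasChild", "object": 33}},
    {"triple": {"subject": 11, "predicate": "hasChild", "object": 44}},
    {"triple": {"subject": 11, "predicate": "likesProduct", "object": 1111}},
    {"triple": {"subject": 11, "predicate": "hasEmployer", "object": 666},
     "tripleStatus": "active",
     "tripleEffectivityStartDate": "1966-06-06",
     "tripleEffectivityEndDate": None},
]


def related(triples: list[dict], subject: int, predicate: str,
            on: Optional[date] = None) -> list[int]:
    """Objects linked from `subject` by `predicate`.

    If `on` is given, skip triples whose optional effectivity window excludes that date.
    """
    result = []
    for entry in triples:
        t = entry["triple"]
        if t["subject"] != subject or t["predicate"] != predicate:
            continue
        if on is not None:
            start = entry.get("tripleEffectivityStartDate")
            end = entry.get("tripleEffectivityEndDate")
            if start is not None and date.fromisoformat(start) > on:
                continue
            if end is not None and date.fromisoformat(end) < on:
                continue
        result.append(t["object"])
    return result


print(related(triples, 11, "hasChild"))                          # [33, 44]
print(related(triples, 11, "hasEmployer", on=date(2017, 5, 1)))  # [666]
print(related(triples, 11, "hasEmployer", on=date(1960, 1, 1)))  # []
```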
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111orderStatus [NewInvoicedShippedClosed
]productOrderStatus [
None AllocatedInvoiced Shipped On Order No Stock
]
Document PROs Simplicity
Each Business Entity Is a JSON Documentbull These five JSON documents equal more than 30 tables in a
relational model
bull JSON modeling is mostly about orthogonalizing data into different business entities transactions and lookup lists
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111orderStatus [NewInvoicedShippedClosed
]productOrderStatus [
None AllocatedInvoiced Shipped On Order No Stock
]
Document PROs Performance
One JSON document requires one IObull Relational requires dozens of IOs for the same databull For example the person document is 45K and requires 13 tables
to represent it in a relational databasebull You have to join 13 tables to retrieve all of a persons databull Each join requires 4-to-6 IOs 3-to-4 IOs for indexes and 1-to-2 IOs for a rowbull 4 IOs x 13 joins = 52-to-78 IOs
bull MarkLogic indexes are in RAM so getting a doc is 1 IO
bull MongoDB indexes are B-tree indexes and may require 4 IOs
bull To further tune JSON we may optionally denormalize data by copying common data from other business entities mdash just like we do in relational tables
Sample JSON documents (person, order, product, company, and lookup lists):
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ],
    "customer": {
      "customerStatus": "active",
      "customerJoinDate": "2001-01-01",
      "customerReviewerRank": 11111 } } }

{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": … },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+0101",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", … } } ] } }

{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks", … ],
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…" } } ],
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    "productWebsites": [ … ] } }

{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", … ],
          "addressLocality": "East Hanove…",
          "addressPostalCode": "07936", … } } ],
    "companyEmails": [ { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [ { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": 555,
          "paymentMethodVerified": true } } ],
    "supplier": { "supplierJoinDate": "2005-05" },
    "seller": { "sellerJoinDate": "2005-05" } } }

{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties using MarkLogic's new Optic API and Template Driven Extraction (TDE).
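The Optic API and TDE themselves are not reproduced here. As a plain-Python sketch of the query semantics only, assuming the document shapes from the sample (a `phone` wrapper inside `personPhones` or `companyPhones`), a match on a multi-valued property succeeds when any value in the array matches, and a multi-typed property is handled by treating a lone string as a one-item array. In the sample, `phonePurpose` is an array in the person document but a single string in the company document. All function names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def as_values(value: Any) -> list[Any]:
    """Treat a scalar as a one-item list so single- and multi-valued properties match alike."""
    return value if isinstance(value, list) else [value]


def has_phone_purpose(doc: dict[str, Any], purpose: str) -> bool:
    """True if any phone of the document's person or company lists `purpose` in phonePurpose."""
    for entity_key in ("person", "company"):
        entity = doc.get(entity_key, {})
        for item in entity.get(f"{entity_key}Phones", []):
            phone = item.get("phone", {})
            if purpose in as_values(phone.get("phonePurpose", [])):
                return True
    return False


def ids_matching(docs: list[dict[str, Any]], purpose: str) -> list[int]:
    """Return the personId or companyId of every document with a matching phone."""
    return [d.get("personId", d.get("companyId")) for d in docs if has_phone_purpose(d, purpose)]


docs = [
    # personPhones: phonePurpose is an array (multi-valued)
    {"personId": 11, "person": {"personPhones": [
        {"phone": {"phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111"}}]}},
    # companyPhones: phonePurpose is a single string (a different type for the same property)
    {"companyId": 555, "company": {"companyPhones": [
        {"phone": {"phonePurpose": "sales"}}]}},
]

print(ids_matching(docs, "primary"))  # [11]
print(ids_matching(docs, "sales"))    # [555]
```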
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued
• This allows many-to-one and many-to-many relationships to be captured in one JSON document
"personAddresses": [
  { "address": { "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]
The same data in a relational model requires three tables:
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
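To make the comparison concrete, here is a minimal sketch that shreds the single `personAddresses` document into the three relational tables listed above. It keeps only a few columns (the Persons table is trimmed to ID and name, and the Is Primary column is omitted), and the address IDs are hypothetical surrogate keys numbered from 1, standing in for keys a database would generate.

```python
from __future__ import annotations

from typing import Any

person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {
                "addressPurpose": ["shipping", "mailing"],
                "addressStreet": ["99 Smith Dr"],
                "addressLocality": "Clinton", "addressRegion": "UT",
                "addressPostalCode": "84015", "addressCountry": "United States"}},
        ],
    },
}


def shred(doc: dict[str, Any]) -> dict[str, list[tuple]]:
    """Split one person document into Persons, Addresses, and Persons Addresses rows."""
    person_id = doc["personId"]
    person = doc["person"]
    tables: dict[str, list[tuple]] = {"Persons": [], "Addresses": [], "Persons Addresses": []}
    tables["Persons"].append((person_id, person["personName"]))
    for address_id, item in enumerate(person.get("personAddresses", []), start=1):
        a = item["address"]
        street = ", ".join(a["addressStreet"])  # multi-line street flattened into one column
        tables["Addresses"].append((address_id, street, a["addressLocality"], a["addressRegion"],
                                    a["addressPostalCode"], a["addressCountry"]))
        # Each value of the multi-valued addressPurpose becomes its own link-table row.
        for purpose in a["addressPurpose"]:
            tables["Persons Addresses"].append((person_id, address_id, purpose))
    return tables


for name, rows in shred(person_doc).items():
    print(name, rows)
```

One nested array in the document turns into rows spread across three tables, with the many-to-many link table carrying one row per purpose. Reading the person back requires joining all three tables again.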
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Putting different types of records in the same array is natural in programming but very difficult in relational databases
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
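Application code typically handles such an array by branching on a type field. The sketch below assumes `paymentMethodType` is that discriminator and uses a hypothetical `ACCOUNT_KEY` mapping to pick the field that identifies each record shape. It works over both records without the sparse nullable columns or per-type tables a relational design would need.

```python
from __future__ import annotations

from typing import Any

payment_methods = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers"}},
]

# Hypothetical rule: which key holds the account number for each record shape.
ACCOUNT_KEY = {"Visa": "creditCardNumber", "Checking": "bankAccountNumber"}


def describe(method: dict[str, Any]) -> str:
    """Render one payment method using only the fields that exist for its type."""
    kind = method["paymentMethodType"]
    key = ACCOUNT_KEY.get(kind)
    number = method.get(key, "") if key else ""
    return f"{kind} ending {number[-4:]} ({method['paymentMethodStatus']})"


# A query across both record types at once.
verified = [m["paymentMethod"]["paymentMethodType"] for m in payment_methods
            if m["paymentMethod"]["paymentMethodStatus"] == "Verified"]

for item in payment_methods:
    print(describe(item["paymentMethod"]))
print(verified)  # ['Visa', 'Checking']
```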
Document PROs: Everything is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query
• Structure can be queried
  – for the presence of keys
  – for nested relationships
  – for any combination of nesting, keys, and values
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ] } }
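Both halves of "structure is data" can be sketched generically. The code below is a plain-Python illustration, not a MarkLogic API: `paths_with_key` queries for the presence of a key at any nesting depth and reports where it was found, and `rename_key` changes the structure by rewriting data. The new key name `phoneHidden` is hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterator


def paths_with_key(node: Any, key: str, path: str = "$") -> Iterator[str]:
    """Yield a JSONPath-like path for every object that contains `key`, at any depth."""
    if isinstance(node, dict):
        if key in node:
            yield path
        for k, v in node.items():
            yield from paths_with_key(v, key, f"{path}.{k}")
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from paths_with_key(v, key, f"{path}[{i}]")


def rename_key(node: Any, old: str, new: str) -> Any:
    """Return a copy of `node` with every `old` key renamed to `new`."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(v, old, new) for v in node]
    return node


doc = {
    "personId": 11,
    "person": {
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}

print(list(paths_with_key(doc, "hidePhoneNumber")))  # ['$.person.personPhones[1].phone']
updated = rename_key(doc, "hidePhoneNumber", "phoneHidden")  # hypothetical new key name
print(list(paths_with_key(updated, "phoneHidden")))  # ['$.person.personPhones[1].phone']
```

Because only the second phone carries `hidePhoneNumber`, the key-presence query finds exactly one object. No schema migration is involved in the rename, because the key is ordinary data in the document.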
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers contain integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
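A quick way to see these types is to parse a sample with Python's standard `json` module and name the JSON type behind each parsed value. The sample document below is hypothetical, and `json_type` is an illustrative helper. JSON also has `null`, which the list above does not mention. Python's parser maps integers to `int` and decimals to `float`.

```python
import json

# Hypothetical sample: Unicode keys and values, integers and decimals, and a mixed-type array.
RAW = """
{
  "personName": "Zoë Ångström 中村",
  "名前 (display name)": "any string can be a key",
  "personHeightInFeet": 5.75,
  "customerReviewerRank": 11111,
  "hidePhoneNumber": true,
  "mixed": [1, "two", 3.5, false, null, {"nested": ["ok"]}, [1, 2]]
}
"""


def json_type(value: object) -> str:
    """Name the JSON type a parsed Python value came from."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


doc = json.loads(RAW)
print(sorted({json_type(v) for v in doc.values()}))  # ['array', 'boolean', 'number', 'string']
print([json_type(v) for v in doc["mixed"]])
print(type(doc["customerReviewerRank"]).__name__, type(doc["personHeightInFeet"]).__name__)  # int float
```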
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Communications of the ACM, Volume 13, Number 6, June 1970

Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

1. Relational Model and Normal Form
1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of…

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
…Tables… represent a major advance toward the goal of data independence…

1.2.1. Ordering Dependence
…Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence
…Can application programs… remain invariant as indices come and go?…

1.2.3. Access Path Dependence
Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary… The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths.
[Figure: a graph of typed nodes (T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event) connected by named relationships such as "geo-located in", "printed on", "author of", "published article in journal", and "publisher of", plus "problem", "solution", and "purpose" links between topics]
[Figure: Customers, Orders, Products, and Vendors stored as JSON documents and Documentation stored as XML (for example, a hospital operation record with a list of drugs), connected by relationships such as customer order, product that was ordered, vendor who sold the product, shipping instructions, customer invoices and annual reports, customer order documents, product manual, customer liked product, and vendor received RMA on product]
Inside and Out
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph?
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling: Normalize
- DocGraph Modeling: Normalize
- DocGraph Modeling: Denormalize
- Relational Modeling: Orthogonalize
- Relational Modeling: Generalize
- DocGraph Modeling: Generalize
- Relational Modeling: Tune
- DocGraph Modeling: Tune
- Document PROs: Simple, Easy Data Types
Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
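These two tables are the flattened content of the XML operation record in the Documentation example: a header plus a repeating group of drugs. The sketch below rebuilds that record as one hierarchical XML document with Python's standard `xml.etree.ElementTree`. The element and attribute names are hypothetical, derived from the table headers, and `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# Element names are hypothetical, derived from the table headers.
OPERATION = {
    "HospitalName": "John Hopkins",
    "OperationNumber": "13",
    "OperationType": "Heart Transplant",
    "SurgeonName": "Dorothy Oz",
}
DRUGS = [
    ("Minicillan", "Drugs R Us", "200", "mg"),
    ("Maxicillan", "Canada4Less Drugs", "400", "mg"),
    ("Minicillan", "Drug USA", "150", "mg"),
]


def build_operation_xml() -> ET.Element:
    """Build one Operation element whose Drug children form a repeating group."""
    root = ET.Element("Operation")
    for tag, text in OPERATION.items():
        ET.SubElement(root, tag).text = text
    drugs = ET.SubElement(root, "Drugs")
    for name, maker, size, uom in DRUGS:
        drug = ET.SubElement(drugs, "Drug")
        ET.SubElement(drug, "DrugName").text = name
        ET.SubElement(drug, "DrugManufacturer").text = maker
        ET.SubElement(drug, "DoseSize", uom=uom).text = size  # unit of measure as an attribute
    return root


root = build_operation_xml()
ET.indent(root)  # pretty-printing helper, Python 3.9+
print(ET.tostring(root, encoding="unicode"))

# Query the repeating group in place: who manufactures Minicillan?
makers = [d.findtext("DrugManufacturer") for d in root.iter("Drug")
          if d.findtext("DrugName") == "Minicillan"]
print(makers)  # ['Drugs R Us', 'Drug USA']
```

Keeping the drugs inside the operation record means the whole record is read, stored, and queried as one unit, with no join between a header table and a detail table.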
Agenda
1. Database Modeling Paradigms
2. From Relational to Document Graph
3. Document Advantages
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• The five sample JSON documents (person, order, product, company, and lookup lists) equal more than 30 tables in a relational model
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists
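Once each entity is its own document, relationships are carried by IDs such as `customerId`, `productIdFk`, and `sellerId`. Here is a minimal sketch, assuming an in-memory dict keyed by (entity type, id) as a stand-in for a real database, that follows an order's references to the product and company documents. It also checks the denormalized `customerName` copy against the person document it came from. The store layout and function names are hypothetical, and the documents are trimmed copies of the samples.

```python
from __future__ import annotations

from typing import Any

# Trimmed copies of the sample documents, stored by (entity type, id).
STORE: dict[tuple[str, int], dict[str, Any]] = {
    ("person", 11): {"personId": 11, "person": {"personName": "Mike Bowers"}},
    ("product", 1111): {"productId": 1111,
                        "product": {"productName": "Oreo Thins Sandwich Cookies"}},
    ("company", 555): {"companyId": 555, "company": {"companyName": "Nabisco"}},
    ("order", 1): {"orderId": 1, "order": {
        "orderNumber": "111-11-111",
        # customerName is a denormalized copy of the person's name
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111, "productOrderQty": 3,
                         "seller": {"sellerId": 555}}},
        ],
    }},
}


def order_lines(order_id: int) -> list[dict[str, Any]]:
    """Follow the order's foreign keys to the product and seller documents."""
    order = STORE[("order", order_id)]["order"]
    lines = []
    for item in order["orderedProducts"]:
        p = item["product"]
        product = STORE[("product", p["productIdFk"])]["product"]
        seller = STORE[("company", p["seller"]["sellerId"])]["company"]
        lines.append({"product": product["productName"],
                      "seller": seller["companyName"],
                      "qty": p["productOrderQty"]})
    return lines


def customer_name_is_current(order_id: int) -> bool:
    """Compare the order's denormalized customerName with the source person document."""
    order = STORE[("order", order_id)]["order"]
    person = STORE[("person", order["customer"]["customerId"])]["person"]
    return order["customer"]["customerName"] == person["personName"]


print(order_lines(1))
print(customer_name_is_current(1))  # True
```

The trade-off mirrors the denormalization point on the Performance slide: copying `customerName` into the order saves a lookup on every read, at the cost of keeping the copy consistent when the person document changes.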
Document PROs Performance
One JSON document requires one IObull Relational requires dozens of IOs for the same databull For example the person document is 45K and requires 13 tables
to represent it in a relational databasebull You have to join 13 tables to retrieve all of a persons databull Each join requires 4-to-6 IOs 3-to-4 IOs for indexes and 1-to-2 IOs for a rowbull 4 IOs x 13 joins = 52-to-78 IOs
bull MarkLogic indexes are in RAM so getting a doc is 1 IO
bull MongoDB indexes are B-tree indexes and may require 4 IOs
bull To further tune JSON we may optionally denormalize data by copying common data from other business entities mdash just like we do in relational tables
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111orderStatus [NewInvoicedShippedClosed
]productOrderStatus [
None AllocatedInvoiced Shipped On Order No Stock
]
Document PROs Multi-Valued and Multi-Typed Properties
JSON documents can containmulti-valued and multi-typed properties
JSON databases can query multi-valued and multi-typed properties
using MarkLogics new Optic API
and Templated Data Extraction
Document PROs Multi-Value Properties
Any JSON data can be multi-valuedbull This allows many-to-one and many-to-many relationships
to be captured in one JSON document
personAddresses [
address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]addressLocality Clinton addressRegion UTaddressPostalCode 84015 addressCountry United States
]
Address IDStreetLocalityRegionPostal CodeCountry
Addresses
Person IDAddress IDAddress PurposeIs Primary
Persons Addresses
Person IDStatusShipping PreferenceNameOnline NameBirth DateGenderEthnicityCustomer StatusCustomer Join DateCustomer Reviewer Rank
Persons
Document PROs Multi-Type Arrays
Any JSON data can be multi-typedbull Different types of records in the same array is natural to do in programming but very difficult to do in
relational databases
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus VerifiedcreditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
paymentMethod paymentMethodType Checking paymentMethodStatus VerifiedbankRoutingNumber 11111111 bankAccountNumber 11111111nameOnBankAccount Michael BowersdriversLicenseNumber 11111111 driversLicenseState UT
]
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111orderStatus [NewInvoicedShippedClosed
]productOrderStatus [
None AllocatedInvoiced Shipped On Order No Stock
]
Document PROs Everything is DataEverything is data in JSON
bull Structurebull Keysbull Valuesbull Arrays
Because structure is databull Structure can be changed by updating data
mdash just write a querybull Structure can be queried
bull For presence of keysbull For nested relationshipsbull For any combination of nesting keys values
personId 11
person personName Some very long name with many UTF-8 international charactershellippersonBirthDate 1982-02-02personHeightInFeet 575personPhones [ phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224hidePhoneNumber true
]
Document PROs Simple Easy Data Typesndash Attribute names have unlimited length and can contain any charactersndash Strings have unlimited size and can contain any internationalized UTF-8 charactersndash Numbers contain integers or decimalsndash Booleans are true or falsendash Arrays can have values of any type and can mix typesndash Objects have unlimited number of attributes can be nested and can be sparsely populated
Document PROs Meaningful Data Modeling
49
A Relational Model of Data for Large Shared Data Banks
E F CODD
IBM Research Laboratory San Jose California
Information Retrieval Volume 13 Number 6
June 1970Programs should remain unaffected when the internal representation of data is changed hellipTree-structuredhellipinadequacieshellipare discussed hellipRelationshellipare discussed and applied to the problems of redundancy and consistencyhellip KEY WORDS AND PHRASES data base data structure data organization hierarchies of data networks of data relationsCR CATEGORIES 370 373 375 420 422
1 Relational Model and Normal Form 11 INTRODUCTION This paper is concerned with the application of elementary relation theoryhelliptohellipformatted data hellipThe problemshellipare those of data independencehellipandhellipdata inconsistencyhellipThe relational viewhellipappears to be superior in several respects to the graphor network modelhellip hellipRelational viewhellipforms a sound basis for treating derivability redundancy and consistencyhellip [and] a clearer evaluationhellipof
12 DATA DEPENDENCIES IN PRESENT SYSTEMS hellipTableshelliprepresent a major advance toward the goal of data independencehellip
121 Ordering Dependence hellipPrograms which take advantage of the stored ordering of a file are likely to failhellipifhellipit becomes necessary to replace that ordering by a different one
122 Indexing Dependence hellipCan application programshellipremain invariant as indices come and go hellip
123 Access Path Dependence Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data hellipThese programs fail when a change in structure becomes necessary Thehellipprogramhellipis required to exploithellippaths to the data hellipPrograms become dependent on the continued existence of thehellippaths
[Figure: graph of entities and relationships drawn from the Codd excerpt. Node types: T = Topic, P = Person, L = Location, P = Publication, R = Reference, O = Organization, E = Event. Edge labels include "geo-located in", "printed on", "author of", "published article in journal", "publisher of", "problem", "purpose", and "solution".]
[Figure: "Customer Order" document graph, inside and out. JSON documents for Customers, Orders, Products, and Vendors, plus XML Documentation (for example, a surgical record with Hospital Name, Operation Number, Operation Type, Surgeon Name, and a drug table), connected by relationships such as "product that was ordered", "vendor who sold the product", "shipping instructions, etc.", "customer invoices, annual reports, etc.", "customer order documents", "product manual", "customer liked product", and "vendor received RMA on product".]
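The edges in the Customer Order figure relate whole documents to each other, which is what RDF triples do. A minimal sketch, assuming plain Python tuples in place of an RDF triple store such as MarkLogic's; the document URIs and predicate names are hypothetical, loosely modeled on the figure's edge labels:

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples; URIs and predicates are illustrative only.
TRIPLES = [
    ("/orders/1.json", "orderedProduct", "/products/1111.json"),
    ("/orders/1.json", "soldBy", "/vendors/555.json"),
    ("/customers/11.json", "liked", "/products/1111.json"),
    ("/vendors/555.json", "receivedRmaOn", "/products/1111.json"),
    ("/products/1111.json", "hasManual", "/docs/oreo-111-manual.xml"),
]


def build_index(triples):
    """Index triples by (subject, predicate) and by (predicate, object) so edges can be followed both ways."""
    by_sp = defaultdict(list)
    by_po = defaultdict(list)
    for s, p, o in triples:
        by_sp[(s, p)].append(o)
        by_po[(p, o)].append(s)
    return by_sp, by_po


def objects(index, subject, predicate):
    """Follow an edge forward: what does `subject` point to via `predicate`?"""
    by_sp, _ = index
    return by_sp.get((subject, predicate), [])


def subjects(index, predicate, obj):
    """Follow an edge backward: which documents point to `obj` via `predicate`?"""
    _, by_po = index
    return by_po.get((predicate, obj), [])


if __name__ == "__main__":
    idx = build_index(TRIPLES)
    print(subjects(idx, "liked", "/products/1111.json"))
    print(objects(idx, "/orders/1.json", "soldBy"))
```

Because each predicate names its meaning explicitly, any document can be linked to any other without adding foreign-key columns to either one.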
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
Person document:
```json
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile", "primary"], "phoneNumber": "+1 (111) 111-1111" } }
    ],
    "personAddresses": [
      { "address": {
          "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } }
    ],
    "personEmails": [
      { "email": { "emailPurpose": ["work", "primary"], "emailAddress": "mike@somework.com" } }
    ],
    "personWebsites": [
      { "website": { "websitePurpose": ["work"], "websiteUrl": "http://www.mike.com" } }
    ],
    "personPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } }
    ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
```
Order document (the second ordered product is cut off on the slide after productCondition):
```json
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": ""
    },
    "orderShippingAddress": {
      "addressStreet": ["99 Smith Dr"],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States"
    },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New" } }
    ]
  }
}
```
Product document (excerpt; text cut off on the slide is marked …):
```
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": ["cookies", "chocolate cookies", "desserts", "snacks" …],
    "productTags": ["cookie", "chocolate", "snack", "treat"],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": {
          "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu… } }
    ],
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } }
    ],
    …
  }
}
```
Company document (excerpt; text cut off on the slide is marked …):
```
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } }
    ],
    "companyAddresses": [
      { "address": { "addressPurpose": ["shipping" …], … "addressLocality": "East Hanove…",
                     "addressPostalCode": "07936", … } }
    ],
    "companyEmails": [ { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [ { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": 555,
          "paymentMethodVerified": true } }
    ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
```
Lookup document:
```json
{
  "orderLookupId": 111111111,
  "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
  "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"]
}
```
Document PROs: Simplicity
Each Business Entity Is a JSON Document
• These five JSON documents equal more than 30 tables in a relational model.
• JSON modeling is mostly about orthogonalizing data into different business entities, transactions, and lookup lists.
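Orthogonalizing this way means a transaction (the order) refers to business entities (products) by ID and draws its status codes from lookup lists. A minimal sketch in plain Python, not MarkLogic's API: the dictionaries are trimmed copies of the sample order, product, and lookup documents, and the helper functions are hypothetical:

```python
# Trimmed, hypothetical in-memory copies of three of the sample documents.
ORDER = {
    "orderId": 1,
    "order": {
        "orderStatus": "Shipped",
        "customer": {"customerId": 11, "customerName": "Mike Bowers"},
        "orderedProducts": [
            {"product": {"productIdFk": 1111,
                         "productName": "Oreo Thins Sandwich Cookies",  # denormalized copy
                         "productOrderQty": 3, "productUnitPrice": 3.50,
                         "productOrderStatus": "Shipped"}},
        ],
    },
}
PRODUCTS = {1111: {"productId": 1111,
                   "product": {"productCode": "oreo-111", "productListPrice": 4.50}}}
LOOKUP = {
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}


def validate_order(order_doc, lookup):
    """Return a list of statuses in the order that are not in the lookup lists."""
    problems = []
    order = order_doc["order"]
    if order["orderStatus"] not in lookup["orderStatus"]:
        problems.append(f"bad orderStatus {order['orderStatus']!r}")
    for line in order["orderedProducts"]:
        status = line["product"]["productOrderStatus"]
        if status not in lookup["productOrderStatus"]:
            problems.append(f"bad productOrderStatus {status!r}")
    return problems


def resolve_products(order_doc, products):
    """Follow each productIdFk to its product entity document (an application-side join)."""
    return [products[line["product"]["productIdFk"]]
            for line in order_doc["order"]["orderedProducts"]]


print(validate_order(ORDER, LOOKUP))
print(resolve_products(ORDER, PRODUCTS)[0]["product"]["productCode"])
```

Because productName is copied into the order, displaying an order needs no lookup at all; resolve_products is only needed when fields that were not copied, such as the list price, are required.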
Document PROs: Performance
One JSON document requires one I/O.
• Relational requires dozens of I/Os for the same data.
• For example, the person document is 45K and requires 13 tables to represent it in a relational database.
• You have to join 13 tables to retrieve all of a person's data.
• Each join requires 4-to-6 I/Os: 3-to-4 I/Os for indexes and 1-to-2 I/Os for a row.
• 4-to-6 I/Os × 13 joins = 52-to-78 I/Os.
• MarkLogic indexes are in RAM, so getting a doc is 1 I/O.
• MongoDB indexes are B-tree indexes and may require 4 I/Os.
• To further tune JSON, we may optionally denormalize data by copying common data from other business entities, just as we do in relational tables.
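The I/O estimate above is simple multiplication, sketched here so the per-join assumption can be varied. Strictly, the cost is per table accessed (joining 13 tables takes 12 join operations but 13 table reads); the slide's 4-to-6 I/Os per table are taken as given, and the function name is hypothetical:

```python
def relational_io_range(tables, ios_per_table=(4, 6)):
    """Estimated (low, high) I/Os to reassemble one entity spread over `tables` tables.

    Uses the slide's assumption of 4-to-6 I/Os per table access:
    3-to-4 for index pages plus 1-to-2 for the row itself.
    """
    low, high = ios_per_table
    return tables * low, tables * high


DOCUMENT_IOS = 1  # the slide's figure for fetching one document when indexes are in RAM

low, high = relational_io_range(13)
print(f"relational: {low}-{high} I/Os, document: {DOCUMENT_IOS} I/O")
```

Passing a different ios_per_table tuple shows how sensitive the comparison is to index depth and caching; with hot caches the relational figure shrinks, but it still scales with the number of tables.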
Document PROs: Multi-Valued and Multi-Typed Properties
JSON documents can contain multi-valued and multi-typed properties.
JSON databases can query multi-valued and multi-typed properties; MarkLogic does so with its new Optic API and Template Driven Extraction (TDE).
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued.
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.
```json
{
  "personAddresses": [
    { "address": {
        "addressPurpose": ["shipping", "mailing"], "addressStreet": ["99 Smith Dr"],
        "addressLocality": "Clinton", "addressRegion": "UT",
        "addressPostalCode": "84015", "addressCountry": "United States" } }
  ]
}
```
Relational equivalent (three tables):
Addresses: Address ID, Street, Locality, Region, Postal Code, Country
Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
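To see why one multi-valued address turns into three tables, it helps to shred the document by hand. A minimal sketch, assuming sequential address IDs (a hypothetical scheme), keeping only a few Persons columns, and omitting the Is Primary column:

```python
PERSON = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personAddresses": [
            {"address": {"addressPurpose": ["shipping", "mailing"],
                         "addressStreet": ["99 Smith Dr"],
                         "addressLocality": "Clinton", "addressRegion": "UT",
                         "addressPostalCode": "84015", "addressCountry": "United States"}},
        ],
    },
}


def shred_addresses(doc):
    """Split one person document into rows for Persons, Persons Addresses, and Addresses."""
    person_id = doc["personId"]
    persons = [{"Person ID": person_id, "Name": doc["person"]["personName"]}]
    persons_addresses, addresses = [], []
    # Hypothetical ID scheme: number addresses 1, 2, 3... within this person.
    for address_id, entry in enumerate(doc["person"]["personAddresses"], start=1):
        a = entry["address"]
        addresses.append({
            "Address ID": address_id,
            "Street": "\n".join(a["addressStreet"]),  # multi-line street collapsed to one column
            "Locality": a["addressLocality"],
            "Region": a["addressRegion"],
            "Postal Code": a["addressPostalCode"],
            "Country": a["addressCountry"],
        })
        # The multi-valued addressPurpose array becomes one link row per purpose.
        for purpose in a["addressPurpose"]:
            persons_addresses.append({"Person ID": person_id,
                                      "Address ID": address_id,
                                      "Address Purpose": purpose})
    return persons, persons_addresses, addresses


persons, links, addresses = shred_addresses(PERSON)
print(len(persons), len(links), len(addresses))
```

One address with two purposes already yields rows in three tables, and reading the person back requires joining all three.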
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed.
• Putting different types of records in the same array is natural in programming but very difficult in relational databases.
```json
{
  "personPaymentMethods": [
    { "paymentMethod": {
        "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
    { "paymentMethod": {
        "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
  ]
}
```
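In application code, a multi-typed array is usually handled by dispatching on a type field. A minimal sketch, assuming paymentMethodType is the discriminator and that "Visa" is the only card type present (the CARD_TYPES set and describe function are hypothetical):

```python
PAYMENT_METHODS = [
    {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                       "creditCardNumber": "1234123412341234",
                       "creditCardExpirationDate": "2011-11-11",
                       "nameOnCreditCard": "MIKE BOWERS",
                       "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                       "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                       "nameOnBankAccount": "Michael Bowers",
                       "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]

CARD_TYPES = {"Visa"}  # hypothetical: which paymentMethodType values are credit cards


def describe(method):
    """Summarize one payment method; which fields exist depends on its type."""
    pm = method["paymentMethod"]
    kind = pm["paymentMethodType"]
    if kind in CARD_TYPES:
        return f"card ending {pm['creditCardNumber'][-4:]}"
    if kind == "Checking":
        return f"checking account ending {pm['bankAccountNumber'][-4:]}"
    return f"unknown type {kind}"


for m in PAYMENT_METHODS:
    print(describe(m))
```

A relational design would need either one sparse table with every column for every type or one table per type plus a union to list them together; the array simply holds both shapes.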
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data; just write a query.
• Structure can be queried:
  – for the presence of keys
  – for nested relationships
  – for any combination of nesting, keys, and values
```json
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                   "hidePhoneNumber": true } }
    ]
  }
}
```
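Querying structure, rather than only values, can be sketched in a few lines over parsed JSON. This is plain Python over the sample document, not a MarkLogic query; the function names are hypothetical. It shows a key-presence search at any depth, a nested query that depends on an optional key, and a structural change made as an ordinary data update:

```python
from typing import Any, Iterator

DOC = {
    "personId": 11,
    "person": {
        "personName": "Some very long name with many UTF-8 international characters…",
        "personBirthDate": "1982-02-02",
        "personHeightInFeet": 5.75,
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def find_key(node: Any, key: str, path: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (path, value) for every occurrence of `key` at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}/{k}"
            if k == key:
                yield child, v
            yield from find_key(v, key, child)
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from find_key(v, key, f"{path}[{i}]")


def visible_numbers(doc: dict) -> list[str]:
    """Nested query: phone numbers whose phone object lacks a true hidePhoneNumber key."""
    return [p["phone"]["phoneNumber"] for p in doc["person"]["personPhones"]
            if not p["phone"].get("hidePhoneNumber", False)]


def rename_key(obj: dict, old: str, new: str) -> None:
    """Structure is data: renaming a key is just an update to the object."""
    if old in obj:
        obj[new] = obj.pop(old)


print(list(find_key(DOC, "hidePhoneNumber")))
print(visible_numbers(DOC))
```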
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters.
– Strings have unlimited size and can contain any internationalized UTF-8 characters.
– Numbers contain integers or decimals.
– Booleans are true or false.
– Arrays can have values of any type and can mix types.
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
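These types map directly onto the data structures of most programming languages. A short sketch using Python's standard json module. Note that the JSON grammar itself sets no size limits, but RFC 8259 allows implementations to limit text size, nesting depth, number range and precision, and string length, so "unlimited" is in practice bounded by the parser or database in use:

```python
import json
from decimal import Decimal

doc = {
    "a key with spaces, punctuation & ünïcödé": "keys are plain strings",
    "name": "Zoë Ångström 日本語",  # any Unicode text
    "count": 3,                     # integer
    "price": 3.50,                  # decimal (parsed as a binary float by default)
    "active": True,                 # boolean
    "mixed": [1, "two", 3.0, False, None, {"nested": {"deep": [1, 2]}}],  # mixed types, nesting
}

text = json.dumps(doc, ensure_ascii=False)  # keep non-ASCII characters readable
assert json.loads(text) == doc              # round-trips unchanged

# parse_float keeps decimals exact instead of converting them to binary floats.
exact = json.loads('{"productUnitPrice": 3.50}', parse_float=Decimal)
print(exact["productUnitPrice"])
```

Using parse_float=Decimal is worth considering for prices such as productUnitPrice, where binary floating point can introduce rounding error.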
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E F CODD
IBM Research Laboratory San Jose California
Information Retrieval Volume 13 Number 6
June 1970Programs should remain unaffected when the internal representation of data is changed hellipTree-structuredhellipinadequacieshellipare discussed hellipRelationshellipare discussed and applied to the problems of redundancy and consistencyhellip KEY WORDS AND PHRASES data base data structure data organization hierarchies of data networks of data relationsCR CATEGORIES 370 373 375 420 422
1 Relational Model and Normal Form 11 INTRODUCTION This paper is concerned with the application of elementary relation theoryhelliptohellipformatted data hellipThe problemshellipare those of data independencehellipandhellipdata inconsistencyhellipThe relational viewhellipappears to be superior in several respects to the graphor network modelhellip hellipRelational viewhellipforms a sound basis for treating derivability redundancy and consistencyhellip [and] a clearer evaluationhellipof
12 DATA DEPENDENCIES IN PRESENT SYSTEMS hellipTableshelliprepresent a major advance toward the goal of data independencehellip
121 Ordering Dependence hellipPrograms which take advantage of the stored ordering of a file are likely to failhellipifhellipit becomes necessary to replace that ordering by a different one
122 Indexing Dependence hellipCan application programshellipremain invariant as indices come and go hellip
123 Access Path Dependence Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data hellipThese programs fail when a change in structure becomes necessary Thehellipprogramhellipis required to exploithellippaths to the data hellipPrograms become dependent on the continued existence of thehellippaths
T
PO L L
EP
T
TT
TR R R R R
T Topic
P Person
L Location
P Publication
R Reference
O Organization
E Event
geo-located in geo-located in
printed on
author of
published article in journal
publisher of
problem
problemT
T
purpose
T
T
Tsolution
solution
problem
solutionT
T
Tproblem
T
solution
problem
Customer Order
JSON
Customers
JSON
Orders
JSON
Products
XML Hospital Name John Hopkins Operation Number 13 Operation Type Heart Transplant Surgeon Name Dorothy Oz Drug Name
Drug Manufacturer
Dose Size
Dose UOM
Minicillan Drugs R Us 200 mg Maxicillan Canada4Less
Drugs 400 mg
Minicillan Drug USA 150 mg
Documentation
JSON
Vendors
Product that was ordered
Vendor who sold the product
Shipping instructions etc
Customer invoices annual reports etc
Customer order documentsProduct manual
Customer liked product
Vendor received RMA on product
Inside and Out
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Slide Number 7
- Slide Number 8
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models
- Multi-Model Enterprise NoSQL Databases
- Slide Number 13
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledgeWhat can RDF Graph and JSON do for data
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense
- How does NoSQL support variety and variability
- Choosing Between XML and JSON
- Slide Number 23
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Slide Number 29
- Same Realistic Customer Order Model
- Slide Number 31
- Relational Modeling Normalize
- DocGraph Modeling Normalize
- DocGraph Modeling Denormalize
- Relational Modeling Orthogonalize
- Slide Number 36
- Relational Modeling Generalize
- DocGraph Modeling Generalize
- Relational Modeling Tune
- DocGraph Modeling Tune
- Slide Number 41
- Slide Number 42
- Slide Number 43
- Slide Number 44
- Slide Number 45
- Slide Number 46
- Slide Number 47
- Document PROs Simple Easy Data Types
- Slide Number 49
-
Drug Name | Drug Manufacturer | Dose Size | Dose UOM | ||||||
Minicillan | Drugs R Us | 200 | mg | ||||||
Maxicillan | Canada4Less Drugs | 400 | mg | ||||||
Minicillan | Drug USA | 150 | mg |
Hospital Name | John Hopkins | ||
Operation Number | 13 | ||
Operation Type | Heart Transplant | ||
Surgeon Name | Dorothy Oz |
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111orderStatus [NewInvoicedShippedClosed
]productOrderStatus [
None AllocatedInvoiced Shipped On Order No Stock
]
Document PROs Performance
One JSON document requires one IObull Relational requires dozens of IOs for the same databull For example the person document is 45K and requires 13 tables
to represent it in a relational databasebull You have to join 13 tables to retrieve all of a persons databull Each join requires 4-to-6 IOs 3-to-4 IOs for indexes and 1-to-2 IOs for a rowbull 4 IOs x 13 joins = 52-to-78 IOs
bull MarkLogic indexes are in RAM so getting a doc is 1 IO
bull MongoDB indexes are B-tree indexes and may require 4 IOs
bull To further tune JSON we may optionally denormalize data by copying common data from other business entities mdash just like we do in relational tables
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111orderStatus [NewInvoicedShippedClosed
]productOrderStatus [
None AllocatedInvoiced Shipped On Order No Stock
]
Document PROs Multi-Valued and Multi-Typed Properties
JSON documents can containmulti-valued and multi-typed properties
JSON databases can query multi-valued and multi-typed properties
using MarkLogics new Optic API
and Templated Data Extraction
Document PROs: Multi-Value Properties
Any JSON data can be multi-valued.
• This allows many-to-one and many-to-many relationships to be captured in one JSON document.

"personAddresses": [
  { "address": {
      "addressPurpose": [ "shipping", "mailing" ],
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" } }
]

The same data in the relational model takes three tables:
• Persons: Person ID, Status, Shipping Preference, Name, Online Name, Birth Date, Gender, Ethnicity, Customer Status, Customer Join Date, Customer Reviewer Rank
• Persons Addresses: Person ID, Address ID, Address Purpose, Is Primary
• Addresses: Address ID, Street, Locality, Region, Postal Code, Country
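To make the comparison concrete, here is a minimal sketch, assuming SQLite and a subset of the slide's columns (the SQL table and column names are illustrative), that shreds the one JSON address into the three relational tables and then needs a join to read it back.

```python
import sqlite3

# The slide's sample person, as one JSON-style document.
person = {
    "personId": 11,
    "personName": "Mike Bowers",
    "personAddresses": [{
        "addressPurpose": ["shipping", "mailing"],
        "addressStreet": ["99 Smith Dr"],
        "addressLocality": "Clinton",
        "addressRegion": "UT",
        "addressPostalCode": "84015",
        "addressCountry": "United States",
    }],
}

# The three tables from the slide (column subset; SQL names are assumptions).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, street TEXT, locality TEXT,
                        region TEXT, postal_code TEXT, country TEXT);
CREATE TABLE persons_addresses (person_id INTEGER, address_id INTEGER,
                                address_purpose TEXT, is_primary INTEGER);
""")

db.execute("INSERT INTO persons VALUES (?, ?)", (person["personId"], person["personName"]))
for addr in person["personAddresses"]:
    cur = db.execute(
        "INSERT INTO addresses (street, locality, region, postal_code, country) "
        "VALUES (?, ?, ?, ?, ?)",
        ("\n".join(addr["addressStreet"]), addr["addressLocality"], addr["addressRegion"],
         addr["addressPostalCode"], addr["addressCountry"]),
    )
    # Each value of the multi-valued addressPurpose becomes its own junction row.
    # Marking the first purpose as primary is an arbitrary choice for illustration.
    for i, purpose in enumerate(addr["addressPurpose"]):
        db.execute("INSERT INTO persons_addresses VALUES (?, ?, ?, ?)",
                   (person["personId"], cur.lastrowid, purpose, int(i == 0)))

# Reading back the shipping address now takes a three-table join.
shipping = db.execute("""
    SELECT p.name, a.street, a.locality, a.region
    FROM persons p
    JOIN persons_addresses pa ON pa.person_id = p.person_id
    JOIN addresses a ON a.address_id = pa.address_id
    WHERE pa.address_purpose = 'shipping'
""").fetchall()
print(shipping)  # [('Mike Bowers', '99 Smith Dr', 'Clinton', 'UT')]
```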
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed.
• Putting different types of records in the same array is natural in programming but very difficult in relational databases.

"personPaymentMethods": [
  { "paymentMethod": {
      "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": {
      "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
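A small plain-Python sketch of working with a multi-typed array: each element is dispatched on paymentMethodType and checked for the fields its kind requires. The KIND_BY_TYPE and REQUIRED_FIELDS mappings are hypothetical business rules, not part of the slide's model.

```python
from typing import Any, Dict, List

# The two differently shaped records from the slide's personPaymentMethods array.
payment_methods: List[Dict[str, Any]] = [
    {"paymentMethod": {
        "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
        "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
        "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing"}},
    {"paymentMethod": {
        "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
        "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
        "nameOnBankAccount": "Michael Bowers",
        "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}},
]

# Hypothetical mapping from paymentMethodType to a record kind and its required fields.
KIND_BY_TYPE = {"Visa": "card", "Checking": "bank"}
REQUIRED_FIELDS = {
    "card": {"creditCardNumber", "creditCardExpirationDate", "nameOnCreditCard"},
    "bank": {"bankRoutingNumber", "bankAccountNumber", "nameOnBankAccount"},
}


def describe(entry: Dict[str, Any]) -> str:
    """Dispatch on paymentMethodType; each kind is checked for its own fields."""
    pm = entry["paymentMethod"]
    kind = KIND_BY_TYPE[pm["paymentMethodType"]]  # KeyError for unknown types
    missing = REQUIRED_FIELDS[kind] - set(pm)
    if missing:
        raise ValueError(f"{pm['paymentMethodType']} record is missing {sorted(missing)}")
    if kind == "card":
        return f"{pm['paymentMethodType']} card ending {pm['creditCardNumber'][-4:]}"
    return f"{pm['paymentMethodType']} account ending {pm['bankAccountNumber'][-4:]}"


summaries = [describe(e) for e in payment_methods]
print(summaries)  # ['Visa card ending 1234', 'Checking account ending 1111']
```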
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": {
      "customerId": 11,
      "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+0101",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New" } }
    ]
  }
}
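The order document copies customerName next to the customerId reference, a denormalized copy of the person's name. One cost of that choice is keeping copies in step, unless the copy is deliberately a snapshot taken at order time. The plain-Python sketch below (the helper name stale_customer_names is invented for illustration) only detects when the copy and its source have diverged.

```python
from typing import Any, Dict, List, Tuple

# Values from the slide: the order copies customerName from person 11.
persons: Dict[int, Dict[str, Any]] = {
    11: {"personId": 11, "person": {"personName": "Mike Bowers"}},
}
orders: List[Dict[str, Any]] = [
    {"orderId": 1, "order": {"orderNumber": "111-11-111",
                             "customer": {"customerId": 11, "customerName": "Mike Bowers"}}},
]


def stale_customer_names(orders, persons) -> List[Tuple[str, str, str]]:
    """List orders whose denormalized customerName differs from the person's current name."""
    stale = []
    for doc in orders:
        customer = doc["order"]["customer"]
        current = persons[customer["customerId"]]["person"]["personName"]
        if customer["customerName"] != current:
            stale.append((doc["order"]["orderNumber"], customer["customerName"], current))
    return stale


before = stale_customer_names(orders, persons)
persons[11]["person"]["personName"] = "Michael Bowers"  # hypothetical later rename
after = stale_customer_names(orders, persons)
print(before)  # []
print(after)   # [('111-11-111', 'Mike Bowers', 'Michael Bowers')]
```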
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks" ],
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu…" } } ],
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 101,
          "productWidth": 45, "productHeight": 17, "productLength": 67 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ]
  }
}
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", … ],
          "addressLocality": "East Hanove…",
          "addressPostalCode": "07936", … } } ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": 555,
          "paymentMethodVerified": true } } ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
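The documents above reference each other through key fields: productIdFk in the order points at productId in the product document, and productVendor.vendorId points at companyId in the company document. A minimal plain-Python sketch of following those references, assuming the documents are already loaded into dictionaries keyed by ID (in a database this would be a query or join, not shown here); vendor_names is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# Minimal stand-ins for the slide's order, product, and company documents.
order = {"orderId": 1, "order": {"orderedProducts": [
    {"product": {"productIdFk": 1111, "productName": "Oreo Thins Sandwich Cookies"}},
    {"product": {"productIdFk": 2222, "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz"}},
]}}
products = {1111: {"productId": 1111, "product": {"productVendor": {"vendorId": 555}}}}
companies = {555: {"companyId": 555, "company": {"companyName": "Nabisco"}}}


def vendor_names(order_doc) -> List[Tuple[int, Optional[str]]]:
    """Follow productIdFk -> productId, then productVendor.vendorId -> companyId."""
    result: List[Tuple[int, Optional[str]]] = []
    for line in order_doc["order"]["orderedProducts"]:
        fk = line["product"]["productIdFk"]
        product = products.get(fk)
        if product is None:
            # No product document for this key in the sample (2222 is not shown).
            result.append((fk, None))
            continue
        vendor_id = product["product"]["productVendor"]["vendorId"]
        company = companies.get(vendor_id)
        result.append((fk, company["company"]["companyName"] if company else None))
    return result


print(vendor_names(order))  # [(1111, 'Nabisco'), (2222, None)]
```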
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
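The lookup document lists the allowed orderStatus and productOrderStatus values. One way to use it, sketched here in plain Python with a hypothetical invalid_statuses helper, is to check an order's status fields against those lists before accepting the document.

```python
from typing import Any, Dict, List, Tuple

# The lookup document's allowed values, as listed on the slide.
lookup = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}

order = {
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
        {"product": {"productIdFk": 2222}},  # no status shown on the slide
    ],
}


def invalid_statuses(order: Dict[str, Any], lookup: Dict[str, Any]) -> List[Tuple[Any, str]]:
    """Return (where, value) for each status not found in the lookup document."""
    problems: List[Tuple[Any, str]] = []
    if order["orderStatus"] not in lookup["orderStatus"]:
        problems.append(("order", order["orderStatus"]))
    for line in order["orderedProducts"]:
        product = line["product"]
        # Assumption: a missing productOrderStatus is treated as the lookup value "None".
        status = product.get("productOrderStatus", "None")
        if status not in lookup["productOrderStatus"]:
            problems.append((product["productIdFk"], status))
    return problems


print(invalid_statuses(order, lookup))  # []
print(invalid_statuses({**order, "orderStatus": "Lost"}, lookup))  # [('order', 'Lost')]
```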
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data. Just write a query.
• Structure can be queried:
  • for presence of keys
  • for nested relationships
  • for any combination of nesting, keys, and values

{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224",
          "hidePhoneNumber": true } }
    ]
  }
}
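Because keys and nesting are ordinary data, generic code can query and reshape structure. The plain-Python sketch below (helper names walk and rename_key are hypothetical; in MarkLogic this would be done with its own query and update APIs) finds a key wherever it occurs, filters on a nested array value, and renames a key across the document.

```python
import json
from typing import Any, Iterator, Tuple

# The slide's second person document (long name shortened here).
doc = json.loads("""
{"personId": 11,
 "person": {"personName": "Some very long name ...",
            "personBirthDate": "1982-02-02",
            "personHeightInFeet": 5.75,
            "personPhones": [
              {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
              {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                         "hidePhoneNumber": true}}]}}
""")


def walk(node: Any, path: Tuple = ()) -> Iterator[Tuple[Tuple, str, Any]]:
    """Yield (path to parent object, key, value) for every key in the JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield path, key, value
            yield from walk(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from walk(item, path + (index,))


def rename_key(node: Any, old: str, new: str) -> Any:
    """Return a copy of the tree with every `old` key renamed to `new`."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


# Structure queried for the presence of a key, wherever it is nested.
hidden = [path for path, key, value in walk(doc) if key == "hidePhoneNumber" and value is True]
# Structure queried through a nested relationship: phones whose purpose includes "mobile".
mobile = [p["phone"]["phoneNumber"] for p in doc["person"]["personPhones"]
          if "mobile" in p["phone"]["phonePurpose"]]
# Structure changed by updating data (the new key name is hypothetical).
renamed = rename_key(doc, "hidePhoneNumber", "phoneIsPrivate")

print(hidden)  # [('person', 'personPhones', 1, 'phone')]
print(mobile)  # ['+1 (222) 222-2222']
```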
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters.
– Strings have unlimited size and can contain any internationalized UTF-8 characters.
– Numbers contain integers or decimals.
– Booleans are true or false.
– Arrays can have values of any type and can mix types.
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated.
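For reference, this is how Python's standard json module maps each of these JSON types when a document is parsed; the sample document and its attribute names are made up for illustration.

```python
import json

# A hypothetical document exercising each type from the slide.
sample = json.loads("""
{
  "attribute name with spaces, punctuation & accents: é": "Zürich, 日本",
  "anInteger": 11,
  "aDecimal": 5.75,
  "aBoolean": true,
  "aMixedArray": [1, "two", 3.5, false, {"nested": {"deeper": []}}],
  "aSparseObject": {"onlyThisAttribute": "other attributes are simply absent"}
}
""")

# How Python's json module maps each JSON type.
types = {key: type(value).__name__ for key, value in sample.items()}
mixed = [type(item).__name__ for item in sample["aMixedArray"]]

for key, name in types.items():
    print(f"{key!r}: {name}")
print(mixed)  # ['int', 'str', 'float', 'bool', 'dict']
```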
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. Codd
IBM Research Laboratory, San Jose, California
Communications of the ACM, Volume 13, Number 6 (Information Retrieval section), June 1970
"…Programs should remain unaffected when the internal representation of data is changed… Tree-structured… inadequacies… are discussed… Relations… are discussed and applied to the problems of redundancy and consistency…"
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
1. Relational Model and Normal Form
1.1. INTRODUCTION: "This paper is concerned with the application of elementary relation theory… to… formatted data… The problems… are those of data independence… and… data inconsistency… The relational view… appears to be superior in several respects to the graph or network model… Relational view… forms a sound basis for treating derivability, redundancy, and consistency… [and] a clearer evaluation… of"
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS: "…Tables… represent a major advance toward the goal of data independence…"
1.2.1. Ordering Dependence: "…Programs which take advantage of the stored ordering of a file are likely to fail… if… it becomes necessary to replace that ordering by a different one."
1.2.2. Indexing Dependence: "…Can application programs… remain invariant as indices come and go?…"
1.2.3. Access Path Dependence: "Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data… These programs fail when a change in structure becomes necessary… The… program… is required to exploit… paths to the data… Programs become dependent on the continued existence of the… paths."
[Figure: a document graph of Topics (T), Persons (P), Locations (L), Publications (P), References (R), Organizations (O), and Events (E), linked by relationships such as "geo-located in", "printed on", "author of", "published article in journal", "publisher of", and problem, purpose, and solution links between topics.]
[Figure: customer order document graph. JSON Customers, Orders, Products, and Vendors documents, an XML hospital operation record, and Documentation, linked by relationships such as "product that was ordered", "vendor who sold the product", "shipping instructions, etc.", "customer invoices, annual reports, etc.", "customer order documents", "product manual", "customer liked product", and "vendor received RMA on product".]
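In MarkLogic such relationships would typically be stored as RDF triples and queried with SPARQL. As an illustration only, the plain-Python sketch below stores a few of the figure's relationships as subject, predicate, object triples between hypothetical document URIs and answers two simple questions about a product. The URIs and predicate names are assumptions.

```python
from typing import List, NamedTuple, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical document URIs; predicates paraphrase the figure's edge labels.
graph: List[Triple] = [
    Triple("/orders/1.json", "orderedProduct", "/products/1111.json"),
    Triple("/vendors/555.json", "soldProduct", "/products/1111.json"),
    Triple("/customers/11.json", "likedProduct", "/products/1111.json"),
    Triple("/vendors/555.json", "receivedRmaOn", "/products/1111.json"),
]


def subjects(predicate: str, obj: str) -> List[str]:
    """Documents linked to `obj` by `predicate`."""
    return [t.subject for t in graph if t.predicate == predicate and t.obj == obj]


def predicates_into(obj: str) -> Set[str]:
    """Every kind of relationship that points at `obj`."""
    return {t.predicate for t in graph if t.obj == obj}


print(subjects("likedProduct", "/products/1111.json"))  # ['/customers/11.json']
print(sorted(predicates_into("/products/1111.json")))
# ['likedProduct', 'orderedProduct', 'receivedRmaOn', 'soldProduct']
```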
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph?
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling: Normalize
- DocGraph Modeling: Normalize
- DocGraph Modeling: Denormalize
- Relational Modeling: Orthogonalize
- Relational Modeling: Generalize
- DocGraph Modeling: Generalize
- Relational Modeling: Tune
- DocGraph Modeling: Tune
- Document PROs: Simple, Easy Data Types
Sample XML operation record:
Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM (unit of measure)
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg
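The operation record above is shown as XML in the original figure. A minimal sketch, using Python's xml.etree.ElementTree with an assumed element layout (the slide does not show the actual tags), that reads the record and re-expresses it as a JSON-style object with an array of drugs:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the operation record; element names are assumptions.
OPERATION_XML = """
<operation>
  <hospitalName>John Hopkins</hospitalName>
  <operationNumber>13</operationNumber>
  <operationType>Heart Transplant</operationType>
  <surgeonName>Dorothy Oz</surgeonName>
  <drugs>
    <drug><drugName>Minicillan</drugName><drugManufacturer>Drugs R Us</drugManufacturer>
          <doseSize>200</doseSize><doseUom>mg</doseUom></drug>
    <drug><drugName>Maxicillan</drugName><drugManufacturer>Canada4Less Drugs</drugManufacturer>
          <doseSize>400</doseSize><doseUom>mg</doseUom></drug>
    <drug><drugName>Minicillan</drugName><drugManufacturer>Drug USA</drugManufacturer>
          <doseSize>150</doseSize><doseUom>mg</doseUom></drug>
  </drugs>
</operation>
"""

root = ET.fromstring(OPERATION_XML.strip())

# The same record as a JSON-style object: scalar fields plus an array of drug objects.
operation = {
    "hospitalName": root.findtext("hospitalName"),
    "operationNumber": int(root.findtext("operationNumber")),
    "operationType": root.findtext("operationType"),
    "surgeonName": root.findtext("surgeonName"),
    "drugs": [
        {
            "drugName": drug.findtext("drugName"),
            "drugManufacturer": drug.findtext("drugManufacturer"),
            "doseSize": int(drug.findtext("doseSize")),
            "doseUom": drug.findtext("doseUom"),
        }
        for drug in root.iter("drug")
    ],
}
print(json.dumps(operation, indent=2))
```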
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111orderStatus [NewInvoicedShippedClosed
]productOrderStatus [
None AllocatedInvoiced Shipped On Order No Stock
]
Document PROs Multi-Valued and Multi-Typed Properties
JSON documents can containmulti-valued and multi-typed properties
JSON databases can query multi-valued and multi-typed properties
using MarkLogics new Optic API
and Templated Data Extraction
Document PROs Multi-Value Properties
Any JSON data can be multi-valuedbull This allows many-to-one and many-to-many relationships
to be captured in one JSON document
personAddresses [
address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]addressLocality Clinton addressRegion UTaddressPostalCode 84015 addressCountry United States
]
Address IDStreetLocalityRegionPostal CodeCountry
Addresses
Person IDAddress IDAddress PurposeIs Primary
Persons Addresses
Person IDStatusShipping PreferenceNameOnline NameBirth DateGenderEthnicityCustomer StatusCustomer Join DateCustomer Reviewer Rank
Persons
Document PROs Multi-Type Arrays
Any JSON data can be multi-typedbull Different types of records in the same array is natural to do in programming but very difficult to do in
relational databases
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus VerifiedcreditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
paymentMethod paymentMethodType Checking paymentMethodStatus VerifiedbankRoutingNumber 11111111 bankAccountNumber 11111111nameOnBankAccount Michael BowersdriversLicenseNumber 11111111 driversLicenseState UT
]
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111orderStatus [NewInvoicedShippedClosed
]productOrderStatus [
None AllocatedInvoiced Shipped On Order No Stock
]
Document PROs Everything is DataEverything is data in JSON
bull Structurebull Keysbull Valuesbull Arrays
Because structure is databull Structure can be changed by updating data
mdash just write a querybull Structure can be queried
bull For presence of keysbull For nested relationshipsbull For any combination of nesting keys values
personId 11
person personName Some very long name with many UTF-8 international charactershellippersonBirthDate 1982-02-02personHeightInFeet 575personPhones [ phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224hidePhoneNumber true
]
Document PROs Simple Easy Data Typesndash Attribute names have unlimited length and can contain any charactersndash Strings have unlimited size and can contain any internationalized UTF-8 charactersndash Numbers contain integers or decimalsndash Booleans are true or falsendash Arrays can have values of any type and can mix typesndash Objects have unlimited number of attributes can be nested and can be sparsely populated
Document PROs Meaningful Data Modeling
49
A Relational Model of Data for Large Shared Data Banks
E F CODD
IBM Research Laboratory San Jose California
Information Retrieval Volume 13 Number 6
June 1970Programs should remain unaffected when the internal representation of data is changed hellipTree-structuredhellipinadequacieshellipare discussed hellipRelationshellipare discussed and applied to the problems of redundancy and consistencyhellip KEY WORDS AND PHRASES data base data structure data organization hierarchies of data networks of data relationsCR CATEGORIES 370 373 375 420 422
1 Relational Model and Normal Form 11 INTRODUCTION This paper is concerned with the application of elementary relation theoryhelliptohellipformatted data hellipThe problemshellipare those of data independencehellipandhellipdata inconsistencyhellipThe relational viewhellipappears to be superior in several respects to the graphor network modelhellip hellipRelational viewhellipforms a sound basis for treating derivability redundancy and consistencyhellip [and] a clearer evaluationhellipof
12 DATA DEPENDENCIES IN PRESENT SYSTEMS hellipTableshelliprepresent a major advance toward the goal of data independencehellip
121 Ordering Dependence hellipPrograms which take advantage of the stored ordering of a file are likely to failhellipifhellipit becomes necessary to replace that ordering by a different one
122 Indexing Dependence hellipCan application programshellipremain invariant as indices come and go hellip
123 Access Path Dependence Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data hellipThese programs fail when a change in structure becomes necessary Thehellipprogramhellipis required to exploithellippaths to the data hellipPrograms become dependent on the continued existence of thehellippaths
T
PO L L
EP
T
TT
TR R R R R
T Topic
P Person
L Location
P Publication
R Reference
O Organization
E Event
geo-located in geo-located in
printed on
author of
published article in journal
publisher of
problem
problemT
T
purpose
T
T
Tsolution
solution
problem
solutionT
T
Tproblem
T
solution
problem
Customer Order
JSON
Customers
JSON
Orders
JSON
Products
XML Hospital Name John Hopkins Operation Number 13 Operation Type Heart Transplant Surgeon Name Dorothy Oz Drug Name
Drug Manufacturer
Dose Size
Dose UOM
Minicillan Drugs R Us 200 mg Maxicillan Canada4Less
Drugs 400 mg
Minicillan Drug USA 150 mg
Documentation
JSON
Vendors
Product that was ordered
Vendor who sold the product
Shipping instructions etc
Customer invoices annual reports etc
Customer order documentsProduct manual
Customer liked product
Vendor received RMA on product
Inside and Out
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Slide Number 7
- Slide Number 8
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models
- Multi-Model Enterprise NoSQL Databases
- Slide Number 13
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledgeWhat can RDF Graph and JSON do for data
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense
- How does NoSQL support variety and variability
- Choosing Between XML and JSON
- Slide Number 23
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Slide Number 29
- Same Realistic Customer Order Model
- Slide Number 31
- Relational Modeling Normalize
- DocGraph Modeling Normalize
- DocGraph Modeling Denormalize
- Relational Modeling Orthogonalize
- Slide Number 36
- Relational Modeling Generalize
- DocGraph Modeling Generalize
- Relational Modeling Tune
- DocGraph Modeling Tune
- Slide Number 41
- Slide Number 42
- Slide Number 43
- Slide Number 44
- Slide Number 45
- Slide Number 46
- Slide Number 47
- Document PROs Simple Easy Data Types
- Slide Number 49
-
Drug Name | Drug Manufacturer | Dose Size | Dose UOM | ||||||
Minicillan | Drugs R Us | 200 | mg | ||||||
Maxicillan | Canada4Less Drugs | 400 | mg | ||||||
Minicillan | Drug USA | 150 | mg |
Hospital Name | John Hopkins | ||
Operation Number | 13 | ||
Operation Type | Heart Transplant | ||
Surgeon Name | Dorothy Oz |
Document PROs Multi-Value Properties
Any JSON data can be multi-valuedbull This allows many-to-one and many-to-many relationships
to be captured in one JSON document
personAddresses [
address addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]addressLocality Clinton addressRegion UTaddressPostalCode 84015 addressCountry United States
]
Address IDStreetLocalityRegionPostal CodeCountry
Addresses
Person IDAddress IDAddress PurposeIs Primary
Persons Addresses
Person IDStatusShipping PreferenceNameOnline NameBirth DateGenderEthnicityCustomer StatusCustomer Join DateCustomer Reviewer Rank
Persons
Document PROs Multi-Type Arrays
Any JSON data can be multi-typedbull Different types of records in the same array is natural to do in programming but very difficult to do in
relational databases
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus VerifiedcreditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing
paymentMethod paymentMethodType Checking paymentMethodStatus VerifiedbankRoutingNumber 11111111 bankAccountNumber 11111111nameOnBankAccount Michael BowersdriversLicenseNumber 11111111 driversLicenseState UT
]
personId 11
person
personName Mike Bowers
personBirthDate 1981-01-01
personGender male
personEthnicity caucasian
personShippingPreference free
personPhones [
phone phonePurpose [ mobile primary ] phoneNumber +1 (111) 111-1111 ]
personAddresses [
address
addressPurpose [ shipping mailing ] addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States ]
personEmails [
email emailPurpose [ work primary ] emailAddress mikesomeworkcom ]
personWebsites [
website websitePurpose [ work ] websiteUrl httpwwwmikecom ]
personPaymentMethods [
paymentMethod paymentMethodType Visa paymentMethodStatus Verified
creditCardNumber 1234123412341234 creditCardExpirationDate 2011-11-11
nameOnCreditCard MIKE BOWERS creditCardValidationAddress mailing ]
customer
customerStatus active
customerJoinDate 2001-01-01
customerReviewerRank 11111
orderId 1
order
orderNumber 111-11-111
orderDate 2001-01-01T090000Z
orderStatus Shipped
customer
customerId 11
customerName Mike Bowers
orderShipment
shipper companyIdFk 777 companyName FedEx
shipmentMethod 2nd Day Air shipmentCost 587 shipmentNotes
orderShippingAddress
addressStreet [ 99 Smith Dr ]
addressLocality Clinton addressRegion UT
addressPostalCode 84015 addressCountry United States
orderedProducts [
product
productIdFk 1111
productName Oreo Thins Sandwich Cookies
productOrderQty 3 productUnitPrice 350 productCondition New
productWeightOz 11
productOrderStatus Shipped
productShippedDate 2001-01-01+0101
seller sellerId 555 sellerName Nabisco
supplier supplierId 555 supplierName Nabisco
product
productIdFk 2222
productName Rice Dream Rice Drink - Vanilla 64 Fl Oz
productOrderQty 1 productUnitPrice 2 50 productCondition New
productId 1111
product
productCode oreo-111
productStatus active
productName Oreo Thins Sandwich Cookies
productDescription YUMMY
productCategories [ cookies chocolate cookies desserts snacks
productTags [ cookie chocolate snack treat ]
productDetailDescriptionURL httpmysalessitecomcdndetailsoreo-111
productAvailabilityDate 2001-01-01
productListPrice 450
productDiscountPrice 350
productStandardCost 250
productInventoryReorderLevel 100
productInventoryTargetLevel 1000
productVendor vendorId 555 companyName Nabisco
productSuppliers [
productSupplier supplierId 555 companyName Nabisco
standardSupplierProductPrice 250 currentSupplierProductPrice 2
supplierProductQuantityPerUnit 1 box of 35 cookies supplierProdu
productMeasures [
productMeasure
productMeasurePurpose productPackage productWeight 101
productWidth 45 productHeight 17 productLength 67
productMeasure
productMeasurePurpose shippingPackage productWeight 1
productWidth 5 productHeight 2 productLength 8
d tW b it [
companyId 555
company
companyName Nabisco
companyStatus active
companyPhones [
phone phonePurpose sales
phone phonePurpose support
phone phonePurpose shipping
companyAddresses [
address
addressPurpose [ shipping
addressLocality East Hanove
addressPostalCode 07936
companyEmails [
email emailPurpose sales
companyWebsites [
website websitePurpose home
companyPaymentMethods [
paymentMethod
paymentMethodType Che
paymentMethodAccountNumber 555
paymentMethodVerified true
supplier supplierJoinDate 2005-05
seller sellerJoinDate 2005-05
orderLookupId 111111111orderStatus [NewInvoicedShippedClosed
]productOrderStatus [
None AllocatedInvoiced Shipped On Order No Stock
]
Document PROs Everything is DataEverything is data in JSON
bull Structurebull Keysbull Valuesbull Arrays
Because structure is databull Structure can be changed by updating data
mdash just write a querybull Structure can be queried
bull For presence of keysbull For nested relationshipsbull For any combination of nesting keys values
personId 11
person personName Some very long name with many UTF-8 international charactershellippersonBirthDate 1982-02-02personHeightInFeet 575personPhones [ phone phonePurpose [ mobile ] phoneNumber +1 (222) 222-2222
phone phonePurpose [ home ] phoneNumber +1 (222) 222-2224hidePhoneNumber true
]
Document PROs Simple Easy Data Typesndash Attribute names have unlimited length and can contain any charactersndash Strings have unlimited size and can contain any internationalized UTF-8 charactersndash Numbers contain integers or decimalsndash Booleans are true or falsendash Arrays can have values of any type and can mix typesndash Objects have unlimited number of attributes can be nested and can be sparsely populated
Document PROs Meaningful Data Modeling
49
A Relational Model of Data for Large Shared Data Banks
E F CODD
IBM Research Laboratory San Jose California
Information Retrieval Volume 13 Number 6
June 1970Programs should remain unaffected when the internal representation of data is changed hellipTree-structuredhellipinadequacieshellipare discussed hellipRelationshellipare discussed and applied to the problems of redundancy and consistencyhellip KEY WORDS AND PHRASES data base data structure data organization hierarchies of data networks of data relationsCR CATEGORIES 370 373 375 420 422
1 Relational Model and Normal Form 11 INTRODUCTION This paper is concerned with the application of elementary relation theoryhelliptohellipformatted data hellipThe problemshellipare those of data independencehellipandhellipdata inconsistencyhellipThe relational viewhellipappears to be superior in several respects to the graphor network modelhellip hellipRelational viewhellipforms a sound basis for treating derivability redundancy and consistencyhellip [and] a clearer evaluationhellipof
12 DATA DEPENDENCIES IN PRESENT SYSTEMS hellipTableshelliprepresent a major advance toward the goal of data independencehellip
121 Ordering Dependence hellipPrograms which take advantage of the stored ordering of a file are likely to failhellipifhellipit becomes necessary to replace that ordering by a different one
122 Indexing Dependence hellipCan application programshellipremain invariant as indices come and go hellip
123 Access Path Dependence Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data hellipThese programs fail when a change in structure becomes necessary Thehellipprogramhellipis required to exploithellippaths to the data hellipPrograms become dependent on the continued existence of thehellippaths
T
PO L L
EP
T
TT
TR R R R R
T Topic
P Person
L Location
P Publication
R Reference
O Organization
E Event
geo-located in geo-located in
printed on
author of
published article in journal
publisher of
problem
problemT
T
purpose
T
T
Tsolution
solution
problem
solutionT
T
Tproblem
T
solution
problem
[Figure: Customer Order document graph. JSON documents for Customers, Orders, Products, and Vendors, plus XML documentation (shipping instructions, customer invoices, annual reports, customer order documents, product manuals; the sample XML shows the hospital operation record tabulated below), connected by relationships such as "product that was ordered", "vendor who sold the product", "customer liked product", and "vendor received RMA on product".]
Inside and Out
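The idea in these figures is that documents are linked by named relationships. It can be sketched with subject-predicate-object triples. This is a minimal in-memory illustration, not MarkLogic's triple store. The document URIs and predicate names are hypothetical paraphrases of the figure's labels. `match` treats `None` as a wildcard, much as a variable works in a triple pattern.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical document URIs; predicates paraphrase the figure's relationship labels.
TRIPLES = [
    Triple("/customers/11.json", "placedOrder", "/orders/1.json"),
    Triple("/orders/1.json", "productThatWasOrdered", "/products/1111.json"),
    Triple("/orders/1.json", "vendorWhoSoldTheProduct", "/vendors/555.json"),
    Triple("/customers/11.json", "likedProduct", "/products/1111.json"),
    Triple("/vendors/555.json", "receivedRmaOn", "/products/1111.json"),
    Triple("/products/1111.json", "hasProductManual", "/docs/oreo-111-manual.xml"),
]


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


# Everything that points at product 1111, and through which relationship.
for t in match(o="/products/1111.json"):
    print(t.subject, t.predicate)

# Two-hop traversal: customer -> orders -> ordered products.
order_uris = [t.obj for t in match(s="/customers/11.json", p="placedOrder")]
ordered = [t.obj for uri in order_uris for t in match(s=uri, p="productThatWasOrdered")]
print(ordered)
```

In a multi-model database such as MarkLogic, links like these would typically be stored as RDF triples and queried with SPARQL. The list comprehension only shows the shape of the idea.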
- Becoming a Document Modeling Guru
- Becoming a Data Modeling Guru
- Abstract
- Why Document Graph?
- About the Author
- Church of Jesus Christ of Latter-day Saints
- Six Data Paradigms
- Top Ten Databases Overall
- Do we combine multiple models?
- Multi-Model Enterprise NoSQL Databases
- WAIT A MINUTE
- RDF Graph and XML enable us to turn content into meaningful knowledge. What can RDF Graph and JSON do for data?
- Documents + RDF Graphs = NextGen Relational
- RDF = Standard Meaningful Graphs
- New Data Structures for Variety and Variability
- New Data Structures for Variety and Variability
- Why is relational fixed and dense?
- How does NoSQL support variety and variability?
- Choosing Between XML and JSON
- Simple Customer Order Relational Model
- What would a real Customer Data Model look like?
- Customer Model
- Person ERD
- One Person JSON Doc = 13 Tables
- Same Realistic Customer Order Model
- Relational Modeling: Normalize
- DocGraph Modeling: Normalize
- DocGraph Modeling: Denormalize
- Relational Modeling: Orthogonalize
- Relational Modeling: Generalize
- DocGraph Modeling: Generalize
- Relational Modeling: Tune
- DocGraph Modeling: Tune
- Document PROs: Simple, Easy Data Types
Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

Hospital Name | John Hopkins
Operation Number | 13
Operation Type | Heart Transplant
Surgeon Name | Dorothy Oz
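The two tables flatten what is naturally one hierarchical record: an operation with a repeating list of drugs. The minimal sketch below uses Python's standard `xml.etree.ElementTree` to rebuild them as a single XML document. The element names, and the choice to store the unit of measure as a `uom` attribute, are hypothetical and only mirror the column headers.

```python
import xml.etree.ElementTree as ET

operation = {
    "Hospital Name": "John Hopkins",
    "Operation Number": "13",
    "Operation Type": "Heart Transplant",
    "Surgeon Name": "Dorothy Oz",
}
drugs = [
    ("Minicillan", "Drugs R Us", "200", "mg"),
    ("Maxicillan", "Canada4Less Drugs", "400", "mg"),
    ("Minicillan", "Drug USA", "150", "mg"),
]


def tag(label: str) -> str:
    """Hypothetical naming rule: "Operation Number" -> "operationNumber"."""
    first, *rest = label.split()
    return first.lower() + "".join(word.capitalize() for word in rest)


root = ET.Element("operation")
for label, value in operation.items():
    ET.SubElement(root, tag(label)).text = value

# The repeating drug rows become children of one operation element, so no
# foreign key or second table is needed to keep them together.
drug_list = ET.SubElement(root, "drugs")
for name, maker, size, uom in drugs:
    drug = ET.SubElement(drug_list, "drug")
    ET.SubElement(drug, "drugName").text = name
    ET.SubElement(drug, "drugManufacturer").text = maker
    ET.SubElement(drug, "doseSize", uom=uom).text = size

print(ET.tostring(root, encoding="unicode"))
```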
Document PROs: Multi-Type Arrays
Any JSON data can be multi-typed
• Storing different types of records in the same array is natural in programming but very difficult in relational databases
"personPaymentMethods": [
  { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
      "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
      "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } },
  { "paymentMethod": { "paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
      "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
      "nameOnBankAccount": "Michael Bowers",
      "driversLicenseNumber": "11111111", "driversLicenseState": "UT" } }
]
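Application code usually processes a mixed array like this by dispatching on a discriminator field. The minimal sketch below assumes `paymentMethodType` is that discriminator. The field names come from the slide, with JSON punctuation restored; the `describe` function is hypothetical.

```python
import json

# The slide's two payment methods, with JSON punctuation restored (an assumption).
payment_methods = json.loads("""
[
  {"paymentMethod": {"paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
                     "creditCardNumber": "1234123412341234",
                     "creditCardExpirationDate": "2011-11-11",
                     "nameOnCreditCard": "MIKE BOWERS",
                     "creditCardValidationAddress": "mailing"}},
  {"paymentMethod": {"paymentMethodType": "Checking", "paymentMethodStatus": "Verified",
                     "bankRoutingNumber": "11111111", "bankAccountNumber": "11111111",
                     "nameOnBankAccount": "Michael Bowers",
                     "driversLicenseNumber": "11111111", "driversLicenseState": "UT"}}
]
""")


def describe(entry: dict) -> str:
    """Summarize one payment method; each record type carries different fields."""
    pm = entry["paymentMethod"]
    kind = pm["paymentMethodType"]
    if kind == "Checking":
        return f"Checking account ending {pm['bankAccountNumber'][-4:]}"
    # Assumption: every other type is a card such as Visa.
    return f"{kind} card ending {pm['creditCardNumber'][-4:]}, expires {pm['creditCardExpirationDate']}"


for entry in payment_methods:
    print(describe(entry))
```

A relational schema would typically need either one table per payment type or one wide table whose card columns are NULL for bank accounts (and vice versa). The array holds both shapes side by side.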
{
  "personId": 11,
  "person": {
    "personName": "Mike Bowers",
    "personBirthDate": "1981-01-01",
    "personGender": "male",
    "personEthnicity": "caucasian",
    "personShippingPreference": "free",
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile", "primary" ], "phoneNumber": "+1 (111) 111-1111" } } ],
    "personAddresses": [
      { "address": {
          "addressPurpose": [ "shipping", "mailing" ], "addressStreet": [ "99 Smith Dr" ],
          "addressLocality": "Clinton", "addressRegion": "UT",
          "addressPostalCode": "84015", "addressCountry": "United States" } } ],
    "personEmails": [
      { "email": { "emailPurpose": [ "work", "primary" ], "emailAddress": "mike@somework.com" } } ],
    "personWebsites": [
      { "website": { "websitePurpose": [ "work" ], "websiteUrl": "http://www.mike.com" } } ],
    "personPaymentMethods": [
      { "paymentMethod": { "paymentMethodType": "Visa", "paymentMethodStatus": "Verified",
          "creditCardNumber": "1234123412341234", "creditCardExpirationDate": "2011-11-11",
          "nameOnCreditCard": "MIKE BOWERS", "creditCardValidationAddress": "mailing" } } ]
  },
  "customer": {
    "customerStatus": "active",
    "customerJoinDate": "2001-01-01",
    "customerReviewerRank": 11111
  }
}
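To see why one person document corresponds to many relational tables, the sketch below shreds a trimmed copy of the document into flat rows. Each array becomes its own "table", and each child row carries a foreign key back to its parent row. The `shred` function, the `<table>#<n>` row ids, and the `<parent>Fk` column names are hypothetical. This does not describe how MarkLogic stores documents.

```python
from collections import defaultdict

# A trimmed copy of the person document (slide values, punctuation restored).
person_doc = {
    "personId": 11,
    "person": {
        "personName": "Mike Bowers",
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile", "primary"],
                       "phoneNumber": "+1 (111) 111-1111"}}],
        "personAddresses": [
            {"address": {"addressPurpose": ["shipping", "mailing"],
                         "addressStreet": ["99 Smith Dr"],
                         "addressLocality": "Clinton", "addressRegion": "UT"}}],
        "personEmails": [
            {"email": {"emailPurpose": ["work", "primary"],
                       "emailAddress": "mike@somework.com"}}],
    },
    "customer": {"customerStatus": "active", "customerJoinDate": "2001-01-01"},
}


def shred(obj: dict, table: str, fk: dict, tables: defaultdict) -> None:
    """Flatten `obj` into one row of `table`; every array becomes a child table.

    Nested non-array objects (wrappers such as {"phone": {...}}) are merged
    into the current row. Child rows carry a foreign key to the parent row.
    """
    row = dict(fk)
    children = []
    pending = [obj]
    while pending:
        for key, value in pending.pop().items():
            if isinstance(value, dict):
                pending.append(value)
            elif isinstance(value, list):
                children.append((key, value))
            else:
                row[key] = value
    row_id = f"{table}#{len(tables[table])}"
    tables[table].append(row)
    for key, items in children:
        for item in items:
            if isinstance(item, dict):
                shred(item, key, {f"{table}Fk": row_id}, tables)
            else:  # arrays of scalars, e.g. phonePurpose, need a table too
                tables[key].append({f"{table}Fk": row_id, key: item})


tables = defaultdict(list)
shred(person_doc, "person", {}, tables)
for name in sorted(tables):
    print(f"{name}: {tables[name]}")
```

Even this trimmed document produces eight tables. The outline's "One Person JSON Doc = 13 Tables" refers to the full person model.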
{
  "orderId": 1,
  "order": {
    "orderNumber": "111-11-111",
    "orderDate": "2001-01-01T09:00:00Z",
    "orderStatus": "Shipped",
    "customer": { "customerId": 11, "customerName": "Mike Bowers" },
    "orderShipment": {
      "shipper": { "companyIdFk": 777, "companyName": "FedEx" },
      "shipmentMethod": "2nd Day Air", "shipmentCost": 5.87, "shipmentNotes": "" },
    "orderShippingAddress": {
      "addressStreet": [ "99 Smith Dr" ],
      "addressLocality": "Clinton", "addressRegion": "UT",
      "addressPostalCode": "84015", "addressCountry": "United States" },
    "orderedProducts": [
      { "product": {
          "productIdFk": 1111,
          "productName": "Oreo Thins Sandwich Cookies",
          "productOrderQty": 3, "productUnitPrice": 3.50, "productCondition": "New",
          "productWeightOz": 11,
          "productOrderStatus": "Shipped",
          "productShippedDate": "2001-01-01+01:01",
          "seller": { "sellerId": 555, "sellerName": "Nabisco" },
          "supplier": { "supplierId": 555, "supplierName": "Nabisco" } } },
      { "product": {
          "productIdFk": 2222,
          "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz",
          "productOrderQty": 1, "productUnitPrice": 2.50, "productCondition": "New", … } } ] } }
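The order document embeds its line items, so computing an order total is a single-document operation with no join. The minimal sketch below assumes the extraction dropped decimal points, so the unit prices are 3.50 and 2.50 and the shipment cost is 5.87. The `order_total` function is hypothetical. It uses `Decimal` to avoid binary floating-point rounding on money.

```python
from decimal import Decimal

# Trimmed order document; prices are strings so Decimal receives exact values.
order = {
    "orderShipment": {"shipmentCost": "5.87"},
    "orderedProducts": [
        {"product": {"productIdFk": 1111, "productOrderQty": 3, "productUnitPrice": "3.50"}},
        {"product": {"productIdFk": 2222, "productOrderQty": 1, "productUnitPrice": "2.50"}},
    ],
}


def order_total(order: dict) -> Decimal:
    """Sum qty * unit price over the embedded line items, then add shipping."""
    subtotal = sum(
        (p["product"]["productOrderQty"] * Decimal(p["product"]["productUnitPrice"])
         for p in order["orderedProducts"]),
        Decimal("0"),
    )
    return subtotal + Decimal(order["orderShipment"]["shipmentCost"])


print(order_total(order))  # 3 x 3.50 + 1 x 2.50 + 5.87 shipping = 18.87
```

When the prices arrive as JSON text rather than strings, `json.loads(text, parse_float=Decimal)` gives the same exactness.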
{
  "productId": 1111,
  "product": {
    "productCode": "oreo-111",
    "productStatus": "active",
    "productName": "Oreo Thins Sandwich Cookies",
    "productDescription": "YUMMY",
    "productCategories": [ "cookies", "chocolate cookies", "desserts", "snacks", … ],
    "productTags": [ "cookie", "chocolate", "snack", "treat" ],
    "productDetailDescriptionURL": "http://mysalessite.com/cdn/details/oreo-111",
    "productAvailabilityDate": "2001-01-01",
    "productListPrice": 4.50,
    "productDiscountPrice": 3.50,
    "productStandardCost": 2.50,
    "productInventoryReorderLevel": 100,
    "productInventoryTargetLevel": 1000,
    "productVendor": { "vendorId": 555, "companyName": "Nabisco" },
    "productSuppliers": [
      { "productSupplier": { "supplierId": 555, "companyName": "Nabisco",
          "standardSupplierProductPrice": 2.50, "currentSupplierProductPrice": 2…,
          "supplierProductQuantityPerUnit": "1 box of 35 cookies", "supplierProdu… } } ],
    "productMeasures": [
      { "productMeasure": {
          "productMeasurePurpose": "productPackage", "productWeight": 10.1,
          "productWidth": 4.5, "productHeight": 1.7, "productLength": 6.7 } },
      { "productMeasure": {
          "productMeasurePurpose": "shippingPackage", "productWeight": 1,
          "productWidth": 5, "productHeight": 2, "productLength": 8 } } ],
    "productWebsites": [ …
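The order line stores `productIdFk` plus a denormalized copy of `productName`, while the product document holds the master copy. The hypothetical sketch below resolves that reference in application code, with an in-memory dict standing in for the database. It also reports when the copied name has drifted from the master.

```python
# In-memory stand-in for the product collection, keyed by productId.
products = {
    1111: {"productId": 1111,
           "product": {"productCode": "oreo-111",
                       "productName": "Oreo Thins Sandwich Cookies"}},
}

# Order lines as embedded in the order document (trimmed).
order_lines = [
    {"product": {"productIdFk": 1111, "productName": "Oreo Thins Sandwich Cookies"}},
    {"product": {"productIdFk": 2222, "productName": "Rice Dream Rice Drink - Vanilla 64 Fl Oz"}},
]


def resolve(line: dict, catalog: dict) -> dict:
    """Join one order line to its product document by foreign key (hypothetical)."""
    ref = line["product"]
    master = catalog.get(ref["productIdFk"])
    if master is None:
        return {"productIdFk": ref["productIdFk"], "status": "missing"}
    stale = master["product"]["productName"] != ref["productName"]
    return {"productIdFk": ref["productIdFk"],
            "productCode": master["product"]["productCode"],
            "status": "stale copy" if stale else "ok"}


for line in order_lines:
    print(resolve(line, products))
```

The trade-off is the usual one for denormalizing. Reading an order needs only one document, but copied fields such as `productName` must be refreshed, or tolerated as historical values, when the master changes.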
{
  "companyId": 555,
  "company": {
    "companyName": "Nabisco",
    "companyStatus": "active",
    "companyPhones": [
      { "phone": { "phonePurpose": "sales", … } },
      { "phone": { "phonePurpose": "support", … } },
      { "phone": { "phonePurpose": "shipping", … } } ],
    "companyAddresses": [
      { "address": { "addressPurpose": [ "shipping", … ],
          "addressLocality": "East Hanove…",
          "addressPostalCode": "07936", … } } ],
    "companyEmails": [
      { "email": { "emailPurpose": "sales", … } } ],
    "companyWebsites": [
      { "website": { "websitePurpose": "home", … } } ],
    "companyPaymentMethods": [
      { "paymentMethod": {
          "paymentMethodType": "Che…",
          "paymentMethodAccountNumber": "555",
          "paymentMethodVerified": true } } ]
  },
  "supplier": { "supplierJoinDate": "2005-05…" },
  "seller": { "sellerJoinDate": "2005-05…" }
}
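The company document appears to record roles (`supplier`, `seller`) as optional sibling objects. That lets one Nabisco document play several roles without a separate table per role. The hypothetical sketch below selects companies by role presence. The FedEx entry reuses the shipper from the order document, and the `with_role` function is illustrative.

```python
companies = [
    # Join dates are cut off in the slide; the visible prefix is kept as-is.
    {"companyId": 555, "company": {"companyName": "Nabisco"},
     "supplier": {"supplierJoinDate": "2005-05"},
     "seller": {"sellerJoinDate": "2005-05"}},
    # The shipper from the order document; it plays neither role here.
    {"companyId": 777, "company": {"companyName": "FedEx"}},
]


def with_role(docs: list[dict], role: str) -> list[str]:
    """Names of companies whose document contains the given role object."""
    return [d["company"]["companyName"] for d in docs if role in d]


print(with_role(companies, "supplier"))
print(with_role(companies, "seller"))
```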
{
  "orderLookupId": 111111111,
  "orderStatus": [ "New", "Invoiced", "Shipped", "Closed" ],
  "productOrderStatus": [ "None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock" ]
}
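This lookup document holds the allowed status values as ordinary data. The hypothetical sketch below checks an order against it before saving. The `invalid_statuses` function and the "Backordered" value are invented for illustration and are not part of any MarkLogic API.

```python
lookup = {
    "orderLookupId": 111111111,
    "orderStatus": ["New", "Invoiced", "Shipped", "Closed"],
    "productOrderStatus": ["None", "Allocated", "Invoiced", "Shipped", "On Order", "No Stock"],
}


def invalid_statuses(order: dict, lookup: dict) -> list[str]:
    """Return a message for every status value not listed in the lookup document."""
    problems = []
    if order["orderStatus"] not in lookup["orderStatus"]:
        problems.append(f"orderStatus {order['orderStatus']!r}")
    for line in order["orderedProducts"]:
        status = line["product"].get("productOrderStatus")
        if status is not None and status not in lookup["productOrderStatus"]:
            problems.append(f"productOrderStatus {status!r}")
    return problems


order = {
    "orderStatus": "Shipped",
    "orderedProducts": [
        {"product": {"productIdFk": 1111, "productOrderStatus": "Shipped"}},
        {"product": {"productIdFk": 2222, "productOrderStatus": "Backordered"}},  # hypothetical bad value
    ],
}
print(invalid_statuses(order, lookup))
```

Because the allowed values live in a document rather than in a schema, adding a status is a data update rather than a schema migration.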
Document PROs: Everything Is Data
Everything is data in JSON:
• Structure
• Keys
• Values
• Arrays
Because structure is data:
• Structure can be changed by updating data: just write a query
• Structure can be queried
  • For presence of keys
  • For nested relationships
  • For any combination of nesting, keys, and values
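Because keys are data, a program can ask structural questions directly. The minimal sketch below uses plain Python in place of a database query on a small person document modeled on the slide's phone example. The helper names are hypothetical. The code lists the paths where a key occurs, finds hidden phones at any depth, and changes structure by rewriting data (renaming a key everywhere).

```python
from typing import Any, Iterator

doc = {
    "personId": 11,
    "person": {
        "personPhones": [
            {"phone": {"phonePurpose": ["mobile"], "phoneNumber": "+1 (222) 222-2222"}},
            {"phone": {"phonePurpose": ["home"], "phoneNumber": "+1 (222) 222-2224",
                       "hidePhoneNumber": True}},
        ],
    },
}


def key_paths(node: Any, prefix: str = "") -> Iterator[str]:
    """Yield a slash-separated path for every key (array positions omitted)."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}/{key}"
            yield path
            yield from key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            yield from key_paths(item, prefix)


def objects_with(node: Any, key: str, value: Any) -> Iterator[dict]:
    """Yield every object, at any depth, whose `key` equals `value`."""
    if isinstance(node, dict):
        if key in node and node[key] == value:
            yield node
        for child in node.values():
            yield from objects_with(child, key, value)
    elif isinstance(node, list):
        for item in node:
            yield from objects_with(item, key, value)


def rename_key(node: Any, old: str, new: str) -> Any:
    """Return a copy of `node` with every `old` key renamed to `new`."""
    if isinstance(node, dict):
        return {(new if k == old else k): rename_key(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node


# 1. Presence of a key: where is hidePhoneNumber used?
print(sorted({p for p in key_paths(doc) if p.endswith("/hidePhoneNumber")}))
# 2. Nested relationship: phone objects, anywhere, that are hidden.
print([o["phoneNumber"] for o in objects_with(doc, "hidePhoneNumber", True)])
# 3. Structure changed by updating data: rename the key everywhere.
renamed = rename_key(doc, "hidePhoneNumber", "phoneIsPrivate")
print(sorted({p for p in key_paths(renamed) if p.endswith("/phoneIsPrivate")}))
```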
{
  "personId": 11,
  "person": {
    "personName": "Some very long name with many UTF-8 international characters…",
    "personBirthDate": "1982-02-02",
    "personHeightInFeet": 5.75,
    "personPhones": [
      { "phone": { "phonePurpose": [ "mobile" ], "phoneNumber": "+1 (222) 222-2222" } },
      { "phone": { "phonePurpose": [ "home" ], "phoneNumber": "+1 (222) 222-2224", "hidePhoneNumber": true } }
    ]
  }
}
Document PROs: Simple, Easy Data Types
– Attribute names have unlimited length and can contain any characters
– Strings have unlimited size and can contain any internationalized UTF-8 characters
– Numbers can be integers or decimals
– Booleans are true or false
– Arrays can have values of any type and can mix types
– Objects have an unlimited number of attributes, can be nested, and can be sparsely populated
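Python's standard `json` module maps these types directly, which makes the list easy to check. The small sketch below parses a hypothetical record with a non-ASCII attribute name containing spaces, integer and decimal numbers, a boolean, a mixed-type array, and an empty (sparse) object, and prints the Python type of each value.

```python
import json

# Hypothetical record exercising each JSON type listed on the slide.
text = """
{
  "personName": "Zoë Ångström-Müller",
  "größe in Metern": 1.75,
  "personId": 11,
  "hidePhoneNumber": true,
  "mixed": [1, "two", 3.5, false, null, {"nested": ["ok"]}],
  "sparse": {}
}
"""
doc = json.loads(text)

for key, value in doc.items():
    print(f"{key!r:20} -> {type(value).__name__}")

print([type(v).__name__ for v in doc["mixed"]])
```

RFC 8259 allows implementations to limit string length, nesting depth, and numeric range and precision. "Unlimited" therefore describes the JSON format rather than any particular parser or database.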
Document PROs: Meaningful Data Modeling
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13, Number 6, June 1970
Programs should remain unaffected when the internal representation of data is changed. … Tree-structured … inadequacies … are discussed. … Relations … are discussed and applied to the problems of redundancy and consistency. …
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22