Managing "Big Data" Application Complexity with CloudGraph
DESCRIPTION
Analysis and solutions for problems faced by HBase™ and other columnar data store client applications under the ever increasing demand for domain model complexity
TRANSCRIPT
![Page 1: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/1.jpg)
Managing “Big Data” Application Complexity using CloudGraph®
Scott Cinnamond, TerraMeta Software Inc.
http://cloudgraph.org
-Analysis and solutions for problems faced by HBase™ and other columnar data store client applications under the ever increasing demand for domain model complexity-
![Page 2: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/2.jpg)
Complexity Increases With Added Data Model Entities
[Chart: Complexity (for columnar data store client applications) plotted against # Model Entities / Classes]
![Page 3: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/3.jpg)
Why More App Complexity? (with Added Data Model Entities)
1. Column Mapping Difficult
2. Composite Row Key Mapping, Hashing, Salting and Formatting
3. Persistence Code Development, Refactoring and Maintenance
![Page 4: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/4.jpg)
Typical Column Mapping Strategies
• Hard Coded Names Embedded in Source Code
– Not good
• Column Names in Java Constants File(s)
– Better, but still really hard coded
– Feasible with 5-10 entities, 50 attributes
– With 500-1000 entities and 5000+ attributes? Not maintainable
• Custom XML Configuration
– Create a “meta model” using, say, XML Schema and JAXB
– Construct unique names and refer to them in source
– Better, but an application specific “one off”
– Does not solve “state” management challenges
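The difference between hard coded names and a metadata driven mapping can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not CloudGraph code: `ColumnNameRegistry`, its methods, the `c0`, `c1`, ... alias scheme and the `Molecule` entity are all hypothetical names chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not part of CloudGraph or HBase: maps readable
// logical names to terse physical column qualifiers in one place.
final class ColumnNameRegistry {
    // logical "Entity.attribute" -> terse physical qualifier
    private final Map<String, String> physicalByLogical = new LinkedHashMap<>();

    // Registers a logical name and returns its physical alias.
    // Registering the same name twice returns the same alias.
    String register(String entity, String attribute) {
        String logical = entity + "." + attribute;
        String existing = physicalByLogical.get(logical);
        if (existing != null) {
            return existing;
        }
        // Assumed alias scheme: "c" plus a base-36 sequence number.
        String physical = "c" + Integer.toString(physicalByLogical.size(), 36);
        physicalByLogical.put(logical, physical);
        return physical;
    }

    // Returns the qualifier bytes a client would hand to the data store.
    byte[] qualifier(String entity, String attribute) {
        String physical = physicalByLogical.get(entity + "." + attribute);
        if (physical == null) {
            throw new IllegalArgumentException(
                "unmapped attribute: " + entity + "." + attribute);
        }
        return physical.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ColumnNameRegistry registry = new ColumnNameRegistry();
        registry.register("Molecule", "formula");
        registry.register("Molecule", "title");
        byte[] q = registry.qualifier("Molecule", "title");
        System.out.println(new String(q, StandardCharsets.UTF_8));
    }
}
```

With a registry like this, application code refers only to logical names, so a physical name can change without touching CRUD source. It still does nothing for the “state” management challenges named in the last bullet.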
![Page 5: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/5.jpg)
CloudGraph Column Mapping: A Standards Based Approach Using SDO and UML
[Diagram: CloudGraph Stateful Column Key Factories. Labels: UML Name “Aliases”, SDO Metadata “Repository”, Data Graph “State”, Logical Names (readable), Physical Names (terse), Business Names, Java byte[] accessors, Caching, Object Pooling, Sequence Management, Entity ID Mapping, Row Key Mapping, Marshalling]
![Page 6: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/6.jpg)
Great, But How Do We Keep Column Names Entirely Out Of CRUD Source Code?
• Create | Update | Delete: CloudGraph SDO API (Service Data Objects)
• Read (Query): CloudGraph Query DSL (Domain Specific Language)
![Page 7: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/7.jpg)
CloudGraph SDO
Your complex domain model as a (create | update | delete) API
• Drives all Column Mapping Transparently
• Granular Control over Data Graph Edits
• Convenient “Create Entity” Factory Methods
• Change Tracking Including History
• Rich Built In Data Types
• 100% Compile Time Checking
• Supports Multiple Inheritance Models
• Currently Uses PlasmaSDO™
– See http://plasma-sdo.org
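To make the “Change Tracking” bullet concrete, here is a minimal sketch of the idea: remember the value of a property as it was loaded, so that only properties that really changed need to be written back as column updates. It does not use the SDO or PlasmaSDO APIs; `TrackedEntity` and its methods are hypothetical names, and the property names are illustrative.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of change tracking; not the SDO or PlasmaSDO API.
final class TrackedEntity {
    private final Map<String, Object> current = new LinkedHashMap<>();
    private final Map<String, Object> oldValues = new LinkedHashMap<>();

    TrackedEntity(Map<String, Object> loaded) {
        current.putAll(loaded);
    }

    void set(String property, Object value) {
        // Remember the value as loaded, only on the first modification.
        if (!oldValues.containsKey(property)) {
            oldValues.put(property, current.get(property));
        }
        current.put(property, value);
    }

    Object get(String property) {
        return current.get(property);
    }

    Object oldValue(String property) {
        return oldValues.get(property);
    }

    // Properties whose current value differs from the value as loaded.
    Set<String> changed() {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Object> e : oldValues.entrySet()) {
            if (!Objects.equals(e.getValue(), current.get(e.getKey()))) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> loaded = new HashMap<>();
        loaded.put("title", "water");
        loaded.put("formula", "H2O");
        TrackedEntity entity = new TrackedEntity(loaded);
        entity.set("title", "Water");
        entity.set("formula", "H2O"); // same value, so not reported as changed
        System.out.println(entity.changed());
    }
}
```

A persistence layer built on this idea would write only the columns reported by `changed()`, which is one reason a stateful data graph helps where plain name mapping does not.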
![Page 8: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/8.jpg)
CloudGraph SDO API Example
Uses Chemical Markup Language (CML) 2.4
https://github.com/cloudgraph/cml
![Page 9: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/9.jpg)
CloudGraph Query DSL
Your complex domain model as a query API
• Drives all Column Mapping Transparently
• Intuitive Almost “Fluent” English Appearance
• Logical Entity, Attribute Names Generated into API
• 100% Compile Time Checking
• Currently Uses PlasmaQuery®
– See http://plasma-query.org
![Page 10: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/10.jpg)
CloudGraph Query DSL Example
Uses Chemical Markup Language (CML) 2.4
https://github.com/cloudgraph/cml
![Page 11: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/11.jpg)
Why More Complexity? 2.) Composite Row Key Mapping, Hashing and Formatting
• More Model Entities:
– Larger data graphs
– More composite row key fields, so graphs can be found
– How to reliably map “deep” into graphs
• Row Key Field Hashing and Formatting
– Critical for HBase partial-key scan API
– Many data type specific idiosyncrasies
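The formatting issues above come from the fact that HBase orders row keys as raw bytes, so each field has to be encoded so that byte order matches the intended order. The sketch below shows three common field treatments with the Java standard library only. It is an illustration, not CloudGraph's implementation: `RowKeySketch`, the `|` delimiter, the use of `String.hashCode()` as the hash and the field widths are all assumptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of composite row key formatting; stdlib only.
final class RowKeySketch {
    static final String DELIM = "|";

    // Fixed-width hex hash: turns a variable-length string into 8 characters.
    // String.hashCode() is used only because it is deterministic and built in.
    static String hashed(String value) {
        return String.format("%08x", value.hashCode());
    }

    // Zero padding makes lexicographic order agree with numeric order
    // for non-negative values ("009" sorts before "010", "9" does not sort before "10").
    static String padded(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        return String.format("%0" + width + "d", value);
    }

    // Salt prefix: spreads otherwise sequential keys over a fixed number of buckets.
    static String salt(String value, int buckets) {
        return String.format("%02d", Math.floorMod(value.hashCode(), buckets));
    }

    // Joins already formatted fields, in order, into the row key bytes.
    static byte[] rowKey(String... fields) {
        return String.join(DELIM, fields).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = rowKey(salt("abc", 16), hashed("abc"), padded(42, 6));
        System.out.println(new String(key, StandardCharsets.UTF_8));
    }
}
```

Each treatment has a cost: a hashed field can no longer be range scanned by value, and a salted key needs one scan per bucket. That is part of why the choice is better made per field in configuration than scattered through application code.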
![Page 12: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/12.jpg)
CloudGraph HBase Composite Row Keys: A Configuration Driven Approach using SDO XPath
[Diagram: CloudGraph Composite Row Keys. Labels: Configuration, SDO XPath, Scan Support, Hashing, Formatting, Delimiters, Expressions, Field Mapping, Deep Graph Traversal, Partial Key Assembly, Fuzzy Row Filter, Hierarchical Row Filters, Field Ordering]
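“Partial Key Assembly” and “Field Ordering” matter because a scan can only use the leading fields of a composite key. A minimal sketch of the general technique follows: build a start row from the known leading fields, then derive an exclusive stop row by incrementing the last byte of that prefix. It does not call the HBase client API; `PartialKeyScanSketch`, the `|` delimiter and the sample field values are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of a partial-key (prefix) range; no HBase dependency.
final class PartialKeyScanSketch {
    // Start row: the known leading fields, each followed by the delimiter.
    static byte[] startRow(String... leadingFields) {
        return (String.join("|", leadingFields) + "|").getBytes(StandardCharsets.UTF_8);
    }

    // Exclusive stop row: the smallest key that sorts after every key with this prefix.
    // An empty result means "no upper bound" (the prefix was empty or all 0xFF bytes).
    static byte[] stopRow(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--; // a 0xFF byte cannot be incremented, so drop it
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    // Unsigned lexicographic byte comparison, the order row keys are sorted in.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    static boolean inRange(byte[] row, byte[] start, byte[] stop) {
        return compare(row, start) >= 0 && (stop.length == 0 || compare(row, stop) < 0);
    }

    public static void main(String[] args) {
        byte[] start = startRow("cml", "000042");
        byte[] stop = stopRow(start);
        byte[] row = "cml|000042|abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(inRange(row, start, stop));
    }
}
```

The range only works when the queried fields come first in the key, which is why field ordering is a configuration decision. Fields known only in the middle of a key need a different mechanism, such as the fuzzy row filter named in the diagram.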
![Page 13: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/13.jpg)
Why More Complexity? 3.) Persistence Code Development, Refactoring and Maintenance
Small Domain Model (e.g. CML 164 Entities): 95,000 Lines
“Average” Custom Domain Model (e.g. 300 Entities): 174,000 Lines
* Example from UML conversion from XML Schema of BIOXSD – see http://bioxsd.org/
** Example from UML adaptation of HL7 POCD/HD000040 Clinical Document
*** Example from UML conversion from XML Schema of Chemical Markup Language 2.4 – see http://xml-cml.org
![Page 14: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/14.jpg)
CloudGraph Code Generation
A contract-first approach in 4 steps
1. Leverage Existing or Create UML Model(s)
– Can be automatically reverse engineered from existing RDBMS Schema
2. Map Repository Namespaces to Service Configurations
3. Define and Map Row Keys To Data Graphs
4. Add CloudGraph and Plasma Maven Artifacts and Generate Code
![Page 15: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/15.jpg)
Resources
• Exchange Model Examples
– https://github.com/cloudgraph/cml
– https://github.com/cloudgraph/bioxsd
– https://github.com/cloudgraph/hl7
• End To End Examples
– https://github.com/cloudgraph/wordnet
– http://wordnet.cloudgraph.org
![Page 16: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/16.jpg)
Status/Legal
• Project Status
– CloudGraph® is currently in private beta testing
– Other services for Cassandra, MongoDB and others are under analysis
– See http://cloudgraph.org for contact info and other details
• Licensing
– CloudGraph® 0.5.5 Community Edition (CE) is open source, licensed under version 2 of the GNU General Public License
• Trademarks
– CloudGraph® is a registered trademark of TerraMeta Software LLC
– Java™ is a trademark of Oracle Corporation
– HBase™ is a trademark of Apache Software Foundation
Copyright © TerraMeta Software, Inc – 2012,2013 – All Rights Reserved
![Page 17: Managing "Big Data" Application Complexity with CloudGraph](https://reader033.vdocument.in/reader033/viewer/2022061223/54c674ec4a795913618b4709/html5/thumbnails/17.jpg)
References
• BIOXSD – http://bioxsd.org
• Chemical Markup Language (CML) – http://xml-cml.org
• Health Level 7 (HL7) – http://hl7.org
• Apache HBase™ – http://hbase.apache.org
• Apache Cassandra – http://cassandra.apache.org
• MongoDB – http://www.mongodb.org
• PlasmaSDO™ – http://plasma-sdo.org, http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22plasma-sdo%22