Learning Lessons: Building a CMS on top of NoSQL technologies
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
Learning Lessons: Building a content repository on top of NoSQL Technologies
hello, I’m @stevenn from @outerthought
This story is about
Complexity
[Figure: chart labelled “complexity” and “age”, with version markers 1.0, 2.0 and 3.0 and the label “software architecture”]
Complexity
[Figure: chart labelled “complexity” and “age”, with version markers 1.0, 2.0 and 3.0 and the label “user interest”]
We Prefer Sophistication
» the challenge for us was to scale ... without dropping features
The typical CMS ‘architecture’
database (+ opt. filesystem) (+ opt. full-text indexes)
The typical CMS ‘architecture’
application
database (+ opt. filesystem) (+ opt. full-text indexes)
cache
The typical CMS ‘architecture’
application cache
more cache
database (+ opt. filesystem) (+ opt. full-text indexes)
The typical CMS ‘architecture’
application cache
even more cache
more cache
database (+ opt. filesystem) (+ opt. full-text indexes)
The typical CMS ‘architecture’
application cache
client
even more cache
more cache
database (+ opt. filesystem) (+ opt. full-text indexes)
The typical CMS ‘architecture’
application cache
client (+cache)
even more cache
more cache
database (+ opt. filesystem) (+ opt. full-text indexes)
What we found hard to scale
» access control
» facet browsing
» all the nifty stuff people were using our software for
» ... anything that required random access to in-memory-cache data for computations
Beyond the ‘scaling’ problem
» three-prong data layer
» result set merging (between MySQL & Lucene)
  » happened in appcode/memory
» ‘transactions’, set operations = hard
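A minimal sketch of what result set merging in application memory looks like. The function name, the document ids and the two input lists are hypothetical stand-ins for a MySQL result and a Lucene hit list; this is an illustration, not the original CMS code.

```python
def merge_result_sets(sql_ids, lucene_hits):
    """Intersect a structured-query result with ranked full-text hits.

    sql_ids: iterable of document ids matching the structured query
    lucene_hits: list of (doc_id, score) pairs, best score first
    Both inputs must be fully materialised in application memory.
    """
    allowed = set(sql_ids)  # memory grows with the size of the SQL result
    # Keep the full-text ranking, drop hits the structured query rejected.
    return [(doc_id, score) for doc_id, score in lucene_hits if doc_id in allowed]


sql_ids = [1, 2, 3, 5, 8]
lucene_hits = [(8, 0.9), (4, 0.7), (2, 0.4), (9, 0.1)]
merged = merge_result_sets(sql_ids, lucene_hits)
```

Both result sets have to be pulled into one process before any filtering happens, so the work grows with the size of the inputs rather than with the size of the answer.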
Beyond the three-prong problem
» errrr..... “Failover” ..... ?
If we were able to add more nodes ...
» True Distribution
scalability
availability
performance
... in the line of fire
Solution 1
» do MORE inside the database
Functional
Infrastructural
more database !
even more database !
let’s add message busses !
RMI! JMS over JDBC! stuff!
w00t !
http://bigdatamatters.com/bigdatamatters/2010/04/high-availability-with-oracle.html
Business Development 101
[Figure: chart with the labels “budget” and “user interest”]
Solution II
[Figure: chart labelled “sophistication” and “ability to cope”, with version markers 1.0, 2.0 and 3.0 and the data-store labels “mysql” and “nosql?”]
Enter The Cambrian Explosion
NoSQL
Cassandra
neo4j
Requirements, phase I
» automatic scaling to large data sets
» fault-tolerance: replication, automatic handling of failing nodes
» a flexible data model supporting sparse data
» runs on commodity hardware
» efficient random access to data
» open source, with the ability to participate in development and thus help drive the direction of the project
» some preference for a Java-based solution
Requirements, phase II
» After careful consideration, we realized these were important choices too:
  » consistency: no chance of having two conflicting versions of a row
  » atomic updates of a single row, single-row transactions
  » bonus points for MapReduce integration
    » e.g. full-text index rebuilding
That brought us to HBase, which bought us:
» a data model in which some column families keep all versions and others do not, which fits our CMS document model very well
» ordered tables with range scans, which let us build scalable indexes on top of them
» HDFS, a convenient place to store large blobs
» Apache license and community, a familiar environment for us
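Why ordered tables matter can be sketched in a few lines. The `OrderedTable` class and the row keys below are hypothetical and purely in-memory; they only mimic the idea of rows kept sorted by key with a start-inclusive, stop-exclusive range scan, and are not the HBase client API.

```python
import bisect


class OrderedTable:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # sorted row keys
        self._rows = {}   # row key -> row value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._rows[key] = value

    def scan(self, start, stop):
        """Yield (key, value) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


table = OrderedTable()
for key in ["doc-003", "doc-001", "img-001", "doc-002"]:
    table.put(key, {"id": key})

# "." sorts directly after "-", so this range covers every key starting with "doc-".
doc_keys = [key for key, _ in table.scan("doc-", "doc.")]
```

Because the keys are sorted, a scan only touches the rows inside the requested range, which is the property an index built on top of such a table relies on.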
» OK, so now we had a data store !
» However, content repository = store + search
ouch!
That was easy !
(however ...)
Search ponderings
» CMS = two types of search
  » structured search
    » numbers, strings
    » based on logic (SQL, anyone?)
  » information retrieval (or: full-text search)
    » text
    » based on statistics
» All of that, at scale
Structured Search
» HBase Indexing Library
  » idea from Google App Engine datastore indexes
  » http://code.google.com/appengine/articles/index_building.html
content table
rowkey   col    col
A        val3   foo6
B        val2   foo7

index table
rowkey   col
val2-B
val3-A
(order: sorted by rowkey)
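The index table above can be sketched as follows. This is a toy, in-memory illustration and not the HBase Indexing Library itself: the column names `col1` and `col2`, the helper names and the plain `-` separator are assumptions made for the example.

```python
import bisect

# Content table from the slide: rowkey -> columns (column names are hypothetical).
content = {
    "A": {"col1": "val3", "col2": "foo6"},
    "B": {"col1": "val2", "col2": "foo7"},
}


def build_index(table, column):
    """Return sorted index row keys of the form '<value>-<rowkey>'."""
    return sorted(f"{row[column]}-{rowkey}" for rowkey, row in table.items())


def lookup(index, value):
    """Find content row keys whose indexed column equals `value`."""
    prefix = value + "-"
    start = bisect.bisect_left(index, prefix)  # jump to the first candidate
    matches = []
    for entry in index[start:]:
        if not entry.startswith(prefix):
            break  # sorted order: no later entry can match
        matches.append(entry[len(prefix):])
    return matches


index = build_index(content, "col1")
rows_with_val3 = lookup(index, "val3")
```

Putting the indexed value at the front of the index row key turns an equality or range query into a scan over a contiguous slice of the sorted index. A real index would also need a value encoding that preserves sort order and avoids separator ambiguity, which this sketch ignores.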
Full-text / IR search
» Lucene?
» no sharding (for scale)
» no replication (for availability)
» batched index updates (not real-time)
Beyond Lucene
» Katta
  » scalable architecture, however only search, no indexing
» Elastic Search
  » very young (sorry)
» hbasene et al.
  » stores inverted index in HBase, might not scale all features
» SOLR
  » widely used, schema, facets, query syntax, cloud branch
More info: http://lilycms.org/lily/prerelease/technology.html
+ ? = Easy ! Or ?
Remember distribution ?
Remember secondary indexes ?
➙ Need for reliable queuing
Connecting things
» we needed a reliable bridge between our main storage (HBase) and our index/search server(s) (SOLR)
  » indexing, reindexing, mass reindexing (M/R)
» we needed a reliable method of updating HBase secondary indexes
  » all of that eventually to run distributed
  » distribution means coping with failure
Solution
» ACMEMessageQueue ? Bzzzzzt. We wanted fault-safe HBase persistence for the queues. Also for ease of administration.
» ➙ WAL & Queue implemented on top of HBase tables
WAL / Queue
» WAL
  » guaranteed execution of synchronous actions
  » call doesn’t return before secondary action finishes
  » e.g. updating secondary indexes
  » if all goes well, size = #concurrent ops
  » will be useful/made available outside of Lily context as well!
» Queue
  » triggering of async actions
  » e.g. (re)index (updated) record with SOLR back-end
  » size depends on speed of back-end process
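The split between the two mechanisms can be sketched roughly like this. It is a single-process toy under stated assumptions, not Lily's implementation: the `Repository` class, its method names and the in-memory dicts (standing in for HBase tables) are all hypothetical.

```python
from collections import deque


class Repository:
    """Toy model: a WAL for synchronous secondary actions, a queue for async ones."""

    def __init__(self):
        self.records = {}      # primary store
        self.wal = {}          # pending synchronous actions, keyed by op id
        self.queue = deque()   # pending asynchronous actions
        self.index = {}        # secondary index: title -> record id
        self._next_op = 0

    def put(self, record_id, title):
        op_id = self._next_op
        self._next_op += 1
        # 1. Log the intent before touching anything else.
        self.wal[op_id] = (record_id, title)
        # 2. Primary write.
        self.records[record_id] = title
        # 3. Synchronous secondary action: put() does not return before it finishes.
        self.index[title] = record_id
        # 4. Hand the slow work (e.g. full-text indexing) to the queue.
        self.queue.append(record_id)
        # 5. Everything synchronous succeeded, so the WAL entry can go.
        del self.wal[op_id]

    def recover(self):
        """Replay WAL entries left behind by a crash between steps 1 and 5."""
        for op_id, (record_id, title) in list(self.wal.items()):
            self.records[record_id] = title
            self.index[title] = record_id
            self.queue.append(record_id)  # may enqueue twice: at-least-once
            del self.wal[op_id]

    def drain(self, indexer):
        """Async consumer: feed queued record ids to a back-end such as a search server."""
        while self.queue:
            indexer(self.queue.popleft())


repo = Repository()
repo.put("rec-1", "hello")
indexed = []
repo.drain(indexed.append)
```

In this sketch the WAL only holds operations that are still in flight, so it stays empty when all goes well, while the queue length depends on how quickly `drain` is called.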
The Sum
» Lily model (records & fields)
» mapped onto HBase (= storage)
» indexed and searchable through SOLR
» using a WAL/Queue mechanism implemented in HBase
» runtime based on Kauri
» with client/server comms via Avro
Architecture
Roadmap
» Today = release of learning material (architecture, model, API, Javadoc)
  ➥ www.lilycms.org
  ➥ bit.ly/lilyprerelease
» Mid July = ‘proof of architecture’ release
  » from there on, ca. 3-monthly releases leading up to Lily 1.0
Nearly there!
bit.ly/lilyprerelease
License
» Apache
Business model
» Consulting, mentoring, turn-key projects
  » Strong focus on partner relations
  » targeting vertical markets
  » geographic coverage
  » SaaS offerings
» Markets: media, finance, insurance, govt, heritage ... LOTS of semi-structured data
» Not: OLAP
More ?
» @outerthought
» www.lilycms.org/lily/prerelease.html