Importing Wikipedia in Plone
TRANSCRIPT
Importing Wikipedia in Plone
Eric BREHAULT – Plone Conference 2013
ZODB is good at storing objects
● Plone contents are objects,
● we store them in the ZODB,
● everything is fine, end of the story.
But what if ...
... we want to store non-contentish records?
Like polls, statistics, mailing-list subscribers, etc., or any business-specific structured data.
Store them as contents anyway
That is a powerful solution.
But there are two major problems...
Problem 1: You need to manage a secondary system
● you need to deploy it,
● you need to back it up,
● you need to secure it,
● etc.
Problem 2: I hate SQL
No explanation here.
I think I just cannot digest it...
How to store many records in the ZODB?
● Is the ZODB strong enough?
● Is the ZCatalog strong enough?
My grandmother often told me
"If you want to become stronger, you have to eat your soup."
Where do we find a good soup for Plone?
In a super souper!!!
souper.plone and souper
● It provides both storage and indexing.
● A record can store any persistent, pickleable data.
● Created by BlueDynamics.
● Based on ZODB BTrees, node.ext.zodb, and repoze.catalog.
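To make "storage plus indexing" concrete, here is a minimal stdlib-only sketch of the idea behind a soup: records held in a store, with a per-field index consulted at query time. This is a conceptual illustration, not souper's actual API (souper uses ZODB BTrees and repoze.catalog indexes instead of the plain dicts below).

```python
from collections import defaultdict
from itertools import count

class MiniSoup:
    """A record store plus a per-field index, in the spirit of a soup."""
    def __init__(self):
        self._records = {}              # record_id -> record (a plain dict)
        self._index = defaultdict(set)  # (field, value) -> set of record_ids
        self._ids = count()

    def add(self, record):
        rid = next(self._ids)
        self._records[rid] = record
        for field, value in record.items():
            self._index[(field, value)].add(rid)
        return rid

    def get(self, rid):
        return self._records[rid]

    def query(self, **criteria):
        # Intersect the id sets of all criteria, like Eq(...) & Eq(...).
        sets = [self._index[(f, v)] for f, v in criteria.items()]
        hits = set.intersection(*sets) if sets else set()
        return [self._records[r] for r in sorted(hits)]

soup = MiniSoup()
record_id = soup.add({'user': 'user1', 'text': 'foo bar baz'})
soup.add({'user': 'user2', 'text': 'foo'})
print(soup.query(user='user1'))  # -> [{'user': 'user1', 'text': 'foo bar baz'}]
```

The point of the indexes is that a query never scans all records; it only intersects precomputed id sets, which is what keeps soups fast at large record counts.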
Add a record
>>> from souper.soup import get_soup, Record
>>> soup = get_soup('mysoup', context)
>>> record = Record()
>>> record.attrs['user'] = 'user1'
>>> record.attrs['text'] = u'foo bar baz'
>>> record.attrs['keywords'] = [u'1', u'2', u'ü']
>>> record_id = soup.add(record)
Record in record
>>> record.attrs['homeaddress'] = Record()
>>> record.attrs['homeaddress'].attrs['zip'] = '6020'
>>> record.attrs['homeaddress'].attrs['town'] = 'Innsbruck'
>>> record.attrs['homeaddress'].attrs['country'] = 'Austria'
Access record
>>> from souper.soup import get_soup
>>> soup = get_soup('mysoup', context)
>>> record = soup.get(record_id)
Query
>>> from repoze.catalog.query import Eq, Contains
>>> [r for r in soup.query(Eq('user', 'user1') & Contains('text', 'foo'))]
[<Record object 'None' at ...>]

or using the CQE format:

>>> [r for r in soup.query("user == 'user1' and 'foo' in text")]
[<Record object 'None' at ...>]
souper
● a soup container can be moved to a specific ZODB mount point,
● it can be shared across multiple independent Plone instances,
● souper works on both Plone and Pyramid.
Plomino & souper
● we use Plomino to easily build non-content-oriented apps,
● we use souper to store huge amounts of application data.
Plomino data storage
Originally, documents (i.e. records) were ATFolders.
Capacity: about 30,000.
Plomino data storage
Since 1.14, documents are pure CMF objects.
Capacity: about 100,000.
Usually the Plomino ZCatalog contains a lot of indexes.
Plomino & souper
With souper, documents are just soup records.
Capacity: several million.
Typical use case
● store 500,000 addresses,
● be able to query them in full text and display the results on a map.
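The full-text half of this use case boils down to an inverted index: map each word to the set of records containing it, so a query touches only the matching entries instead of all 500,000 rows. A minimal stdlib-only sketch of that mechanism (the addresses below are made-up sample data; souper delegates this work to a repoze.catalog text index):

```python
from collections import defaultdict

# Made-up sample data standing in for the 500,000 real addresses.
addresses = [
    {'town': 'Toulouse', 'street': 'Rue du Taur'},
    {'town': 'Toulouse', 'street': 'Allee Jean Jaures'},
    {'town': 'Innsbruck', 'street': 'Maria-Theresien-Strasse'},
]

# Build an inverted index: word -> positions of the records containing it.
index = defaultdict(set)
for pos, rec in enumerate(addresses):
    for word in (rec['town'] + ' ' + rec['street']).lower().split():
        index[word].add(pos)

def fulltext(word):
    """Return every address whose town or street contains `word`."""
    return [addresses[p] for p in sorted(index[word.lower()])]

print(len(fulltext('Toulouse')))  # -> 2
```

Because the index is built once at import time, each query is a single dictionary lookup, which is what makes full-text search over hundreds of thousands of records practical.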
Demo
What is the limit?
Can we import Wikipedia in souper?
Demo with 400,000 records
Demo with 5.5 million records
Conclusion
● usage performance is good,
● Plone performance is not impacted.
Use it!
Thoughts
● What about a REST API on top of it?
● Massive import is slow and difficult; could it be improved?
Makina Corpus
For all questions related to this talk,
please contact Éric Bréhault
Tel: +33 534 566 958
www.makina-corpus.com