Importing Wikipedia in Plone
Éric Bréhault – Plone Conference 2013
Slide 2
ZODB is good at storing objects
● Plone contents are objects,
● we store them in the ZODB,
● everything is fine, end of story.
Slide 3
But what if ...
... we want to store non-contentish records?
Like polls, statistics, mailing-list subscribers, etc., or any business-specific structured data.
Slide 4
Store them as contents anyway
That is a powerful solution, but there are two major problems...
Slide 5
Problem 1: You need to manage a secondary system
● you need to deploy it,
● you need to back it up,
● you need to secure it,
● etc.
Slide 6
Problem 2: I hate SQL
No explanation here.
Slide 7
I think I just cannot digest it...
Slide 8
How to store many records in the ZODB?
● Is the ZODB strong enough?
● Is the ZCatalog strong enough?
Slide 9
My grandmother often told me
"If you want to become stronger, you have to eat your soup."
Slide 10
Where do we find a good soup for Plone?
In a super souper!!!
Slide 11
souper.plone and souper
● It provides both storage and indexing.
● Records can store any persistent, pickleable data.
● Created by BlueDynamics.
● Based on ZODB BTrees, node.ext.zodb, and repoze.catalog.
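Queries (slide 15) only match indexed attributes, so before adding records the soup needs a catalog factory registered under the soup name. A minimal sketch following the souper documentation, indexing the 'user', 'text', and 'keywords' attributes used on the next slides; treat the exact wiring as an assumption:

from zope.interface import implementer
from zope.component import provideUtility
from repoze.catalog.catalog import Catalog
from repoze.catalog.indexes.field import CatalogFieldIndex
from repoze.catalog.indexes.keyword import CatalogKeywordIndex
from repoze.catalog.indexes.text import CatalogTextIndex
from souper.interfaces import ICatalogFactory
from souper.soup import NodeAttributeIndexer

@implementer(ICatalogFactory)
class MySoupCatalogFactory(object):
    # builds the catalog that indexes the records of 'mysoup'
    def __call__(self, context=None):
        catalog = Catalog()
        catalog[u'user'] = CatalogFieldIndex(NodeAttributeIndexer('user'))
        catalog[u'text'] = CatalogTextIndex(NodeAttributeIndexer('text'))
        catalog[u'keywords'] = CatalogKeywordIndex(
            NodeAttributeIndexer('keywords'))
        return catalog

# the utility name must match the name passed to get_soup
provideUtility(MySoupCatalogFactory(), name='mysoup')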
Slide 12
Add a record
>>> from souper.soup import get_soup
>>> from souper.soup import Record
>>> soup = get_soup('mysoup', context)
>>> record = Record()
>>> record.attrs['user'] = 'user1'
>>> record.attrs['text'] = u'foo bar baz'
>>> record.attrs['keywords'] = [u'1', u'2', u'ü']
>>> record_id = soup.add(record)
Slide 13
Record in record
>>> record.attrs['homeaddress'] = Record()
>>> record.attrs['homeaddress'].attrs['zip'] = '6020'
>>> record.attrs['homeaddress'].attrs['town'] = 'Innsbruck'
>>> record.attrs['homeaddress'].attrs['country'] = 'Austria'
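After modifying a record that is already stored, it must be reindexed for queries to see the change; souper's reindex supports this (the records argument restricts reindexing to the given records instead of the whole soup):

>>> soup.reindex(records=[record])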
Slide 14
Access record
>>> from souper.soup import get_soup
>>> soup = get_soup('mysoup', context)
>>> record = soup.get(record_id)
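A record can also be removed; per the souper documentation, deletion uses plain del on the soup:

>>> del soup[record]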
Slide 15
Query
>>> from repoze.catalog.query import Eq, Contains
>>> [r for r in soup.query(Eq('user', 'user1')
...                        & Contains('text', 'foo'))]
[<Record object 'None' at ...>]
or using the CQE format:
>>> [r for r in soup.query("user == 'user1' and 'foo' in text")]
[<Record object 'None' at ...>]
Slide 16
souper
● A soup container can be moved to a specific ZODB mount point (see the zope.conf sketch below),
● it can be shared across multiple independent Plone instances,
● souper works on Plone and Pyramid.
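The mount point itself is plain ZODB configuration in zope.conf; a sketch where the database name, path, and mount path are illustrative:

<zodb_db soups>
    <filestorage>
        path /path/to/var/filestorage/soups.fs
    </filestorage>
    mount-point /plone/soups
</zodb_db>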
Slide 17
Plomino & souper
● We use Plomino to build non-content-oriented apps easily,
● we use souper to store huge amounts of application data.
Slide 18
Plomino data storage
Originally, documents (i.e. records) were ATFolders.
Capacity: about 30,000.
Slide 19
Plomino data storage
Since 1.14, documents are pure CMF.
Capacity: about 100,000.
Usually the Plomino ZCatalog contains a lot of indexes.
Slide 20
Plomino & souper
With souper, documents are just soup records.
Capacity: several million.
Slide 21
Typical use case
● Store 500,000 addresses,
● be able to query them in full text and display the results on a map (see the sketch below).
Demo
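A rough sketch of what the demo does, assuming an 'addresses' soup whose catalog has a text index on 'address' (the soup name, attribute names, and sample values are all illustrative):

from repoze.catalog.query import Contains
from souper.soup import get_soup, Record

soup = get_soup('addresses', context)

# store one address record, with coordinates for the map display
record = Record()
record.attrs['address'] = u'12 rue des Plantes, Toulouse'
record.attrs['lat'] = 43.6045
record.attrs['lon'] = 1.4440
soup.add(record)

# full-text query, then collect the coordinates to plot
points = [(r.attrs['lat'], r.attrs['lon'])
          for r in soup.query(Contains('address', 'Toulouse'))]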
Slide 22
What is the limit?
Can we import Wikipedia into souper?
Demo with 400,000 records
Demo with 5.5 million records
Slide 23
Conclusion
● Usage performance is good,
● Plone performance is not impacted.
Use it!
Slide 24
Thoughts
● What about a REST API on top of it?
● Massive imports are slow and difficult; could they be improved? (One option is sketched below.)
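One way to tame a massive import is to commit in batches, so no single huge transaction builds up in memory; a sketch where the helper name and batch size are illustrative:

import transaction
from souper.soup import get_soup, Record

def bulk_import(context, rows, batch_size=10000):
    # rows: an iterable of dicts mapping attribute names to values
    soup = get_soup('mysoup', context)
    for i, row in enumerate(rows, 1):
        record = Record()
        for key, value in row.items():
            record.attrs[key] = value
        soup.add(record)
        if i % batch_size == 0:
            transaction.commit()  # flush the current batch to disk
    transaction.commit()  # commit the final partial batch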
Slide 25
Makina Corpus
For all questions related to this talk,
please contact Éric Bréhault
[email protected]
Tel: +33 534 566 958
www.makina-corpus.com