Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Importing Wikipedia in Plone
Search
Makina Corpus
October 02, 2013
Technology
1
71
Importing Wikipedia in Plone
Eric BREHAULT – Plone Conference 2013
Makina Corpus
October 02, 2013
Tweet
Share
More Decks by Makina Corpus
See All by Makina Corpus
Publier vos données sur le Web - Forum TIC de l'ATEN 2014
makinacorpus
0
690
Créez votre propre fond de plan à partir de données OSM en utilisant TileMill
makinacorpus
0
110
Team up Django and Web mapping - DjangoCon Europe 2014
makinacorpus
3
820
Petit déjeuner "Les bases de la cartographie sur le Web"
makinacorpus
0
410
Petit déjeuner "Développer sur le cloud, ou comment tout construire à partir de rien" le 11 février - Toulouse
makinacorpus
0
260
CoDe, le programme de développement d'applications mobiles de Makina Corpus
makinacorpus
0
93
Petit déjeuner "Alternatives libres à GoogleMaps" du 11 février 2014 - Nantes - Sylvain Beorchia
makinacorpus
0
640
Petit déjeuner "Les nouveautés de la cartographie en ligne" du 12 décembre
makinacorpus
1
370
Tests carto avec Mocha
makinacorpus
0
800
Other Decks in Technology
See All in Technology
Why does continuous profiling matter to developers? #appdevelopercon
salaboy
0
190
Adopting Jetpack Compose in Your Existing Project - GDG DevFest Bangkok 2024
akexorcist
0
110
10XにおけるData Contractの導入について: Data Contract事例共有会
10xinc
6
620
初心者向けAWS Securityの勉強会mini Security-JAWSを9ヶ月ぐらい実施してきての近況
cmusudakeisuke
0
120
【令和最新版】AWS Direct Connectと愉快なGWたちのおさらい
minorun365
PRO
5
750
The Role of Developer Relations in AI Product Success.
giftojabu1
0
120
安心してください、日本語使えますよ―Ubuntu日本語Remix提供休止に寄せて― 2024-11-17
nobutomurata
1
990
DMARC 対応の話 - MIXI CTO オフィスアワー #04
bbqallstars
1
160
Why App Signing Matters for Your Android Apps - Android Bangkok Conference 2024
akexorcist
0
130
CysharpのOSS群から見るModern C#の現在地
neuecc
2
3.2k
EventHub Startup CTO of the year 2024 ピッチ資料
eventhub
0
110
リンクアンドモチベーション ソフトウェアエンジニア向け紹介資料 / Introduction to Link and Motivation for Software Engineers
lmi
4
300k
Featured
See All Featured
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
93
16k
Building Your Own Lightsaber
phodgson
103
6.1k
Music & Morning Musume
bryan
46
6.2k
Ruby is Unlike a Banana
tanoku
97
11k
YesSQL, Process and Tooling at Scale
rocio
169
14k
4 Signs Your Business is Dying
shpigford
180
21k
Build your cross-platform service in a week with App Engine
jlugia
229
18k
How to Think Like a Performance Engineer
csswizardry
20
1.1k
Code Review Best Practice
trishagee
64
17k
10 Git Anti Patterns You Should be Aware of
lemiorhan
654
59k
Designing for humans not robots
tammielis
250
25k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
506
140k
Transcript
Importing Wikipedia in Plone Eric BREHAULT – Plone Conference 2013
ZODB is good at storing objects • Plone contents are
objects, • we store them in the ZODB, • everything is fine, end of the story.
But what if ... ... we want to store non-contentish
records? Like polls, statistics, mail-list subscribers, etc., or any business-specific structured data.
Store them as contents anyway That is a powerfull solution.
But there are 2 major problems...
Problem 1: You need to manage a secondary system •
you need to deploy it, • you need to backup it, • you need to secure it, • etc.
Problem 2: I hate SQL No explanation here.
I think I just cannot digest it...
How to store many records in the ZODB? • Is
the ZODB strong enough? • Is the ZCatalog strong enough?
My grandmother often told me "If you want to become
stronger, you have to eat your soup."
Where do we find a good soup for Plone? In
a super souper!!!
souper.plone and souper • It provides both storage and indexing.
• Record can store any persistent pickable data. • Created by BlueDynamics. • Based on ZODB BTrees, node.ext.zodb, and repoze.catalog.
Add a record >>> soup = get_soup('mysoup', context) >>> record
= Record() >>> record.attrs['user'] = 'user1' >>> record.attrs['text'] = u'foo bar baz' >>> record.attrs['keywords'] = [u'1', u'2', u'ü'] >>> record_id = soup.add(record)
Record in record >>> record['homeaddress'] = Record() >>> record['homeaddress'].attrs['zip'] =
'6020' >>> record['homeaddress'].attrs['town'] = 'Innsbruck' >>> record['homeaddress'].attrs['country'] = 'Austria'
Access record >>> from souper.soup import get_soup >>> soup =
get_soup('mysoup', context) >>> record = soup.get(record_id)
Query >>> from repoze.catalog.query import Eq, Contains >>> [r for
r in soup.query(Eq('user', 'user1') & Contains('text', 'foo'))] [<Record object 'None' at ...>] or using CQE format >>> [r for r in soup.query("user == 'user1' and 'foo' in text")] [<Record object 'None' at ...>]
souper • a Soup-container can be moved to a specific
ZODB mount- point, • it can be shared across multiple independent Plone instances, • souper works on Plone and Pyramid.
Plomino & souper • we use Plomino to build non-content
oriented apps easily, • we use souper to store huge amount of application data.
Plomino data storage Originally, documents (=record) were ATFolder. Capacity about
30 000.
Plomino data storage Since 1.14, documents are pure CMF. Capacity
about 100 000. Usally the Plomino ZCatalog contains a lot of indexes.
Plomino & souper With souper, documents are just soup records.
Capacity: several millions.
Typical use case • Store 500 000 addresses, • Be
able to query them in full text and display the result on a map. Demo
What is the limit? Can we import Wikipedia in souper?
Demo with 400 000 records Demo with 5,5 millions of records
Conclusion • Usage performances are good, • Plone performances are
not impacted. Use it!
Thoughts • What about a REST API on top of
it? • Massive import is long and difficult, could it be improved?
Makina Corpus For all questions related to this talk, please
contact Éric Bréhault
[email protected]
Tel : +33 534 566 958 www.makina-corpus.com