Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Importing Wikipedia in Plone
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Makina Corpus
October 02, 2013
Technology
91
1
Share
Importing Wikipedia in Plone
Eric BREHAULT – Plone Conference 2013
Makina Corpus
October 02, 2013
More Decks by Makina Corpus
See All by Makina Corpus
Publier vos données sur le Web - Forum TIC de l'ATEN 2014
makinacorpus
0
810
Créez votre propre fond de plan à partir de données OSM en utilisant TileMill
makinacorpus
0
150
Team up Django and Web mapping - DjangoCon Europe 2014
makinacorpus
3
900
Petit déjeuner "Les bases de la cartographie sur le Web"
makinacorpus
0
440
Petit déjeuner "Développer sur le cloud, ou comment tout construire à partir de rien" le 11 février - Toulouse
makinacorpus
0
290
CoDe, le programme de développement d'applications mobiles de Makina Corpus
makinacorpus
0
130
Petit déjeuner "Alternatives libres à GoogleMaps" du 11 février 2014 - Nantes - Sylvain Beorchia
makinacorpus
0
680
Petit déjeuner "Les nouveautés de la cartographie en ligne" du 12 décembre
makinacorpus
1
410
Tests carto avec Mocha
makinacorpus
0
830
Other Decks in Technology
See All in Technology
来期の評価で変えようと思っていること 〜AI時代に変わること・変わらないこと〜
estie
0
130
BFCacheを活用して無限スクロールのUX を改善した話
apple_yagi
0
140
FlutterでPiP再生を実装した話
s9a17
0
240
AI時代のシステム開発者の仕事_20260328
sengtor
0
320
「活動」は激変する。「ベース」は変わらない ~ 4つの軸で捉える_AI時代ソフトウェア開発マネジメント
sentokun
0
140
昔話で振り返るAWSの歩み ~S3誕生から20年、クラウドはどう進化したのか~
nrinetcom
PRO
0
120
QA組織のAI戦略とAIテスト設計システムAITASの実践
sansantech
PRO
1
300
Bill One 開発エンジニア 紹介資料
sansan33
PRO
5
18k
ThetaOS - A Mythical Machine comes Alive
aslander
0
230
開発チームとQAエンジニアの新しい協業モデル -年末調整開発チームで実践する【QAリード施策】-
kaomi_wombat
0
280
Amazon Qはアマコネで頑張っています〜 Amazon Q in Connectについて〜
yama3133
1
170
Databricks Appsで実現する社内向けAIアプリ開発の効率化
r_miura
0
220
Featured
See All Featured
Git: the NoSQL Database
bkeepers
PRO
432
67k
Building an army of robots
kneath
306
46k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
21
1.4k
Keith and Marios Guide to Fast Websites
keithpitt
413
23k
Noah Learner - AI + Me: how we built a GSC Bulk Export data pipeline
techseoconnect
PRO
0
150
Data-driven link building: lessons from a $708K investment (BrightonSEO talk)
szymonslowik
1
990
The B2B funnel & how to create a winning content strategy
katarinadahlin
PRO
1
320
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
25
1.8k
Claude Code どこまでも/ Claude Code Everywhere
nwiizo
64
54k
Chasing Engaging Ingredients in Design
codingconduct
0
160
Measuring & Analyzing Core Web Vitals
bluesmoon
9
800
Reflections from 52 weeks, 52 projects
jeffersonlam
356
21k
Transcript
Importing Wikipedia in Plone Eric BREHAULT – Plone Conference 2013
ZODB is good at storing objects • Plone contents are
objects, • we store them in the ZODB, • everything is fine, end of the story.
But what if ... ... we want to store non-contentish
records? Like polls, statistics, mail-list subscribers, etc., or any business-specific structured data.
Store them as contents anyway That is a powerfull solution.
But there are 2 major problems...
Problem 1: You need to manage a secondary system •
you need to deploy it, • you need to backup it, • you need to secure it, • etc.
Problem 2: I hate SQL No explanation here.
I think I just cannot digest it...
How to store many records in the ZODB? • Is
the ZODB strong enough? • Is the ZCatalog strong enough?
My grandmother often told me "If you want to become
stronger, you have to eat your soup."
Where do we find a good soup for Plone? In
a super souper!!!
souper.plone and souper • It provides both storage and indexing.
• Record can store any persistent pickable data. • Created by BlueDynamics. • Based on ZODB BTrees, node.ext.zodb, and repoze.catalog.
Add a record >>> soup = get_soup('mysoup', context) >>> record
= Record() >>> record.attrs['user'] = 'user1' >>> record.attrs['text'] = u'foo bar baz' >>> record.attrs['keywords'] = [u'1', u'2', u'ü'] >>> record_id = soup.add(record)
Record in record >>> record['homeaddress'] = Record() >>> record['homeaddress'].attrs['zip'] =
'6020' >>> record['homeaddress'].attrs['town'] = 'Innsbruck' >>> record['homeaddress'].attrs['country'] = 'Austria'
Access record >>> from souper.soup import get_soup >>> soup =
get_soup('mysoup', context) >>> record = soup.get(record_id)
Query >>> from repoze.catalog.query import Eq, Contains >>> [r for
r in soup.query(Eq('user', 'user1') & Contains('text', 'foo'))] [<Record object 'None' at ...>] or using CQE format >>> [r for r in soup.query("user == 'user1' and 'foo' in text")] [<Record object 'None' at ...>]
souper • a Soup-container can be moved to a specific
ZODB mount- point, • it can be shared across multiple independent Plone instances, • souper works on Plone and Pyramid.
Plomino & souper • we use Plomino to build non-content
oriented apps easily, • we use souper to store huge amount of application data.
Plomino data storage Originally, documents (=record) were ATFolder. Capacity about
30 000.
Plomino data storage Since 1.14, documents are pure CMF. Capacity
about 100 000. Usally the Plomino ZCatalog contains a lot of indexes.
Plomino & souper With souper, documents are just soup records.
Capacity: several millions.
Typical use case • Store 500 000 addresses, • Be
able to query them in full text and display the result on a map. Demo
What is the limit? Can we import Wikipedia in souper?
Demo with 400 000 records Demo with 5,5 millions of records
Conclusion • Usage performances are good, • Plone performances are
not impacted. Use it!
Thoughts • What about a REST API on top of
it? • Massive import is long and difficult, could it be improved?
Makina Corpus For all questions related to this talk, please
contact Éric Bréhault
[email protected]
Tel : +33 534 566 958 www.makina-corpus.com